Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
| From | Masahiko Sawada |
|---|---|
| Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Date | |
| Msg-id | CAD21AoDXxhMYWQw0FpJ4fmowanAk1vv-5aTUN7NvBaG5AV_6fA@mail.gmail.com |
| In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (shveta malik <shveta.malik@gmail.com>) |
| List | pgsql-hackers |
On Mon, Dec 15, 2025 at 1:11 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Sat, Dec 13, 2025 at 2:27 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Dec 10, 2025 at 10:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Dec 10, 2025 at 3:32 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > +1. This can be reproduced as well. When the logical-decoding state is
> > > > cached, we may fail to log logical-info (unassigned XID case), causing
> > > > certain rows not to be replicated to subscribers. The steps below
> > > > demonstrate this.
> > > >
> > > > Backend1 of pub:
> > > > -------------
> > > > create table tab1(i int);
> > > > create publication pub1 for table tab1;
> > > >
> > > > BEGIN;
> > > > SELECT txid_current_if_assigned(); --xid not assigned yet.
> > > > SHOW wal_level; SHOW effective_wal_level; --replica
> > > >
> > > > --pause here and do 'Step1' mentioned below on backend2.
> > > > --logical decoding is now enabled everywhere except in this backend.
> > > > --now continue with backend1:
> > > >
> > > > insert into tab1 values(20);
> > > > insert into tab1 values(30);
> > > >
> > > > --pause here and do 'Step2' mentioned below on backend2.
> > > > --now continue with backend1:
> > > >
> > > > SELECT txid_current_if_assigned(); --xid gets assigned before above insert.
> > > > SHOW wal_level; SHOW effective_wal_level; --it is still 'replica' in this txn.
> > > > COMMIT;
> > > >
> > > > Step1 (it will enable logical decoding):
> > > > ----------------------------
> > > > Backend2 of pub:
> > > > SELECT pg_create_logical_replication_slot('slot', 'pgoutput', false,
> > > > false, false);
> > > > show wal_level; show effective_wal_level; --logical now.
> > > >
> > > > Subscriber:
> > > > create table tab1(i int);
> > > > create subscription sub1 connection '...' publication pub1;
> > > >
> > > > Backend2 of pub: insert into tab1 values(10);
> > > > ----------------------------
> > > >
> > > >
> > > > Step2:
> > > > --------------------------------
> > > > Backend2 of pub: insert into tab1 values(40);
> > > > --------------------------------
> > > >
> > > > At the end after backend1 commits:
> > > > On pub, we have 4 rows in tab1:
> > > > {10}, {20}, {30}, {40}
> > > >
> > > > On sub, we have 2 rows in tab1:
> > > > {10}, {40}
> > > >
> > > > ~~
> > > >
> > > > If we stop caching the logical-decoding state within a transaction, we
> > > > may still encounter issues, because the backend could observe logical
> > > > decoding as disabled at one point and enabled at another.
> > > >
> > >
> > > I think such a problem won't happen at transaction-level if we ensure
> > > that transaction-level cache is initialized at the time of
> > > transaction-id assignment.
> >
> > Right. It seems reasonable to me. Transactional operations can
> > consistently write logical-level or replica-level WAL records, whereas
> > non-transactional operations (such as VACUUM) immediately change their
> > effective_wal_level.
> >
> > The transaction-level cache aims to prevent issues like the one we had
> > in ExecuteTruncate() and ExecuteTruncateGuts(). I guess it's quite
> > confusing if XLogStandbyInfoActive() and XLogLogicalInfoActive()
> > behave differently, so I think we need the transaction-level cache.
> >
> > > However, if we want to wait for all
> > > backends that have any open transaction during first logical
> > > slot-creation then this should be addressed automatically.
> >
> > Right.
> >
> > > And, we
> > > don't need to worry about the theoretical scenario where half the WAL
> > > info is constructed before transaction_id assignment and the other
> > > half after assignment. I feel the idea of waiting for all open
> > > transactions goes too far without a real need.
> > >
> > > Having said that, if we still want to go with the idea of waiting
> > > for all open transactions, then let's document it in the logical slot
> > > creation documentation.
> >
> > With the above transaction-level cache, logical decoding ends up
> > processing non-logical-level WAL when:
> >
> > (1) operations decide not to include logical-info in WAL records
> > before getting an XID.
> >
> > (2) operations that don't require XID assignment started when
> > effective_wal_level was 'replica'.
> >
> > For (1), I imagine the following scenario for example:
> >
> > xl_xxx xlrec;
> >
> > if (XLogLogicalInfoActive())
> >     xlrec.flags |= LOGICAL_INFO_1;
> >
> > GetTopTransactionId();
> >
> > if (XLogLogicalInfoActive())
> >     xlrec.flags |= LOGICAL_INFO_2;
> >
> > XLogBeginInsert();
> > XLogRegisterData(&xlrec, sizeof(xlrec));
> > XLogInsert(XXX, YYY);
> >
> > If logical decoding starts before the XID assignment, it would
> > decode the WAL record, but which logical-info is included in the WAL
> > record depends on when the process absorbs the cache-update signal
> > barrier. If the signal is processed between setting LOGICAL_INFO_1 and
> > the XID assignment, the WAL record would have only LOGICAL_INFO_2. I
> > guess such a coding practice is uncommon, and I don't see it in the
> > existing code.
> >
> > For (2), operations within a transaction that don't require XID
> > assignment would write WAL records at 'replica' level or a mix of
> > 'replica' and 'logical' levels, depending on when the backend processes
> > the cache-update signal. I searched[1] for the kinds of rmgr records
> > that don't require XID assignment (info is a hex number returned by
> > XLogRecGetInfo()):
> >
> > BRIN info: 10, 20, 30, 40, 90, A0
> > BTREE info: 80, 90, B0, C0, E0
> > DBASE info: 00, 20
> > GIN info: 10, 20, 30, 60, 80, 90
> > GIST info: 00
> > HASH info: 70, 80, 90, B0
> > HEAP info: 70
> > HEAP2 info: 10, 20, 30, 40
> > LOGICALMSG info: 00
> > REPLORIGIN info: 10
> > SMGR info: 10
> > SPG info: 60, 80
> > STANDBY info: 00, 10, 20
> > XACT info: 00, 10, 20, 30, 40, 60
> > XLOG info: 00, 10, 30, 40, 50, 60, 70, 90, A0, B0, D0, E0, F0
> >
> > As far as I can tell from my research, there is no problem in terms of
> > logical decoding even if we process these WAL records generated at
> > non-logical level during logical decoding.
> >
> > I think we can go without waiting. It would be great if we could have
> > checks or assertions to detect such scenarios.
> >
>
> I tried to come up with suitable Assert()s, but couldn’t identify any.
> Adding assertions for these scenarios (where the logical decoding
> value is read without an XID assigned) could lead to false failures,
> even in cases that aren’t suitable for assertion. Or let me know if
> you have better ideas here?
>
> IMO, at best we can document these scenarios as comments in
> logicalctl.c’s header, so that new readers are aware of them.
I don't have better ideas either.
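
For readers following along, scenario (1) quoted above can be modeled with a small standalone sketch. This is only a toy model under stated assumptions, not code from the patch: shared_logical_enabled, xid_assigned, xact_cached_value, logical_info_active(), assign_xid() and build_record_flags() are hypothetical names, and the signal barrier is reduced to a plain flag flip.

```c
#include <stdbool.h>

#define LOGICAL_INFO_1 0x01
#define LOGICAL_INFO_2 0x02

/* Hypothetical stand-in for the shared logical-decoding state. */
static bool shared_logical_enabled = false;

/* Hypothetical per-transaction cache, filled at XID assignment. */
static bool xid_assigned = false;
static bool xact_cached_value = false;

/* Toy counterpart of XLogLogicalInfoActive(): cached once an XID exists. */
static bool
logical_info_active(void)
{
    if (xid_assigned)
        return xact_cached_value;
    return shared_logical_enabled;
}

/* Toy counterpart of XID assignment: snapshot the shared state once. */
static void
assign_xid(void)
{
    if (!xid_assigned)
    {
        xact_cached_value = shared_logical_enabled;
        xid_assigned = true;
    }
}

/*
 * Build the record flags the way scenario (1) does.  When enable_between
 * is true, logical decoding gets enabled after the first check but before
 * the XID assignment.
 */
static unsigned
build_record_flags(bool enable_between)
{
    unsigned flags = 0;

    /* Start a fresh "transaction" with logical decoding disabled. */
    shared_logical_enabled = false;
    xid_assigned = false;
    xact_cached_value = false;

    if (logical_info_active())
        flags |= LOGICAL_INFO_1;

    if (enable_between)
        shared_logical_enabled = true;

    assign_xid();

    if (logical_info_active())
        flags |= LOGICAL_INFO_2;

    return flags;
}
```

In this model, build_record_flags(true) yields only LOGICAL_INFO_2, which is the half-populated record described above, while any change to the shared flag after assign_xid() is no longer visible to the transaction.
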
>
> > I've updated the patch accordingly.
> >
>
> Thanks. Verified the scenarios around changed logic, they work well.
> Few trivial comments:
>
> 1)
> DisableLogicalDecoding():
>
> + * Skip CheckLogicalSlotExists() check during recovery because the
> + * existing slots will be invalidated after disabling logical decoding.
> + */
> + if (!in_recovery &&
> + (!LogicalDecodingCtl->logical_decoding_enabled || CheckLogicalSlotExists()))
> + {
> + LogicalDecodingCtl->pending_disable = false;
> + LWLockRelease(LogicalDecodingControlLock);
> + return;
> + }
>
> Shall the condition be:
> !LogicalDecodingCtl->logical_decoding_enabled || (!in_recovery &&
> CheckLogicalSlotExists())
Agreed.
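
To see where the two conditions differ, here is a small standalone sketch that enumerates all input combinations. It is not code from the patch: the function names are made up for illustration, and the three booleans stand in for in_recovery, LogicalDecodingCtl->logical_decoding_enabled and the result of CheckLogicalSlotExists().

```c
#include <stdbool.h>

/* Early-return condition as written in the reviewed hunk. */
static bool
skip_disable_patch(bool in_recovery, bool enabled, bool slot_exists)
{
    return !in_recovery && (!enabled || slot_exists);
}

/* Early-return condition as suggested in the review. */
static bool
skip_disable_suggested(bool in_recovery, bool enabled, bool slot_exists)
{
    return !enabled || (!in_recovery && slot_exists);
}

/* Count the input combinations on which the two conditions disagree. */
static int
count_differences(void)
{
    int n = 0;

    for (int r = 0; r < 2; r++)
        for (int e = 0; e < 2; e++)
            for (int s = 0; s < 2; s++)
                if (skip_disable_patch(r, e, s) !=
                    skip_disable_suggested(r, e, s))
                    n++;

    return n;
}
```

The two conditions disagree only when in_recovery is true and logical decoding is already disabled: the suggested form returns early in that case, whereas the original form falls through.
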
>
> 2)
> a) Earlier when startup was enabling or disabling logical-decoding on
> standby during STATUS_CHANGE, it was putting a log:
> elog(DEBUG1, "update logical decoding status to %d", new_status);
>
> Now there is no such log. Shall we have it in both
> EnableLogicalDecoding() and DisableLogicalDecoding() for
> 'in_recovery'? I feel it might be helpful in diagnosis of standby
> cases.
>
> b) Same is true for UpdateLogicalDecodingStatusEndOfRecovery() which
> was earlier using UpdateLogicalDecodingStatus() which had a log.
>
Okay. I've added debug logs to these places.

Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com