Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
| From | shveta malik |
|---|---|
| Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Date | |
| Msg-id | CAJpy0uCn6A4X4Us5QrPXbugezLf5ZzoanusxufOsEnVMC0YT2Q@mail.gmail.com |
| In reply to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Masahiko Sawada <sawada.mshk@gmail.com>) |
| Responses | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| List | pgsql-hackers |

On Sat, Dec 13, 2025 at 2:27 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Dec 10, 2025 at 10:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 10, 2025 at 3:32 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > +1. This can be reproduced as well. When the logical-decoding state is
> > > cached, we may fail to log logical-info (unassigned XID case), causing
> > > certain rows not to be replicated to subscribers. The steps below
> > > demonstrate this.
> > >
> > > Backend1 of pub:
> > > -------------
> > > create table tab1(i int);
> > > create publication pub1 for table tab1;
> > >
> > > BEGIN;
> > > SELECT txid_current_if_assigned(); --xid not assigned yet.
> > > SHOW wal_level; SHOW effective_wal_level; --replica
> > >
> > > --pause here and do 'Step1' mentioned below on backend2.
> > > --logical decoding is now enabled except this backend.
> > > --now continue with backend1:
> > >
> > > insert into tab1 values(20);
> > > insert into tab1 values(30);
> > >
> > > --pause here and do 'Step2' mentioned below on backend2.
> > > --now continue with backend1:
> > >
> > > SELECT txid_current_if_assigned(); --xid gets assigned before above insert.
> > > SHOW wal_level; SHOW effective_wal_level; --it is still 'replica' in this txn.
> > > COMMIT;
> > >
> > > Step1 (it will enable logical decoding):
> > > ----------------------------
> > > Backend2 of pub:
> > > SELECT pg_create_logical_replication_slot('slot', 'pgoutput', false,
> > > false, false);
> > > show wal_level; show effective_wal_level; --logical now.
> > >
> > > Subscriber:
> > > create table tab1(i int);
> > > create subscription sub1 connection '...' publication pub1;
> > >
> > > Backend2 of pub: insert into tab1 values(10);
> > > ----------------------------
> > >
> > >
> > > Step2:
> > > --------------------------------
> > > Backend2 of pub: insert into tab1 values(40);
> > > --------------------------------
> > >
> > > At the end after backend1 commits:
> > > On pub, we have 4 rows in tab1:
> > > {10}, {20}, {30}, {40}
> > >
> > > On sub, we have 2 rows in tab1:
> > > {10}, {40}
> > >
> > > ~~
> > >
> > > If we stop caching the logical-decoding state within a transaction, we
> > > may still encounter issues, because the backend could observe logical
> > > decoding as disabled at one point and enabled at another.
> > >
> >
> > I think such a problem won't happen at transaction-level if we ensure
> > that transaction-level cache is initialized at the time of
> > transaction-id assignment.
>
> Right. It seems reasonable to me. Transactional operations can
> consistently write logical-level or replica-level WAL records whereas
> non-transactional operations (such as VACUUM) immediately change its
> effective_wal_level.
>
> The transaction-level cache is aimed to prevent the issue like we had
> in ExecuteTruncate() and ExecuteTruncateGuts(). I guess it's quite
> confusing if XLogStandbyInfoActive() and XLogLogicalInfoActive()
> behave differently, so I think we need the transaction-level cache.
>
> > However, if we want to wait for all
> > backends that have any open transaction during first logical
> > slot-creation then this should be addressed automatically.
>
> Right.
>
> > And, we
> > don't need to worry about the theoretical scenario where half the WAL
> > info is constructed before transaction_id assignment and the other
> > half after assignment. I feel waiting for all open transactions idea
> > sounds like we are going too far without the real need.
> >
> > Having said that, if we still want to go with waiting for all open
> > transactions idea then let's document it along with logical slot
> > creation documentation.
>
> With the above transaction-level cache, logical decoding ends up
> processing non-logical-level WAL when:
>
> (1) operations decide not to include logical-info to WAL records
> before getting an XID.
>
> (2) operations that don't require XID assignment started when
> effective_wal_level was 'replica'.
>
> For (1), I imagine the following scenario for example:
>
> xl_xxx xlrec;
>
> if (XLogLogicalInfoActive())
>     xlrec.flags |= LOGICAL_INFO_1;
>
> GetTopTransactionId();
>
> if (XLogLogicalInfoActive())
>     xlrec.flags |= LOGICAL_INFO_2;
>
> XLogBeginInsert();
> XLogRegisterData(&xlrec, sizeof(xlrec));
> XLogInsert(XXX, YYY);
>
> If logical decoding starts before the XID assignment, it would
> decode the WAL record, but which logical-info is included in the WAL
> record depends on when the process absorbs the cache update signal
> barrier. If the barrier is processed between the LOGICAL_INFO_1 check
> and the XID assignment, the WAL record would have only LOGICAL_INFO_2.
> I guess such a coding practice is uncommon, and I don't see it in the
> existing code.
>
> For (2), operations within a transaction that don't require XID
> assignment would write WAL records at the 'replica' level or a mix of
> 'replica' and 'logical' levels, depending on when the backend processes
> the cache update signal. I searched[1] for the rmgr record types that
> don't require XID assignment (info is a hex number returned by
> XLogRecGetInfo()):
>
> BRIN info: 10, 20, 30, 40, 90, A0
> BTREE info: 80, 90, B0, C0, E0
> DBASE info: 00, 20
> GIN info: 10, 20, 30, 60, 80, 90
> GIST info: 00
> HASH info: 70, 80, 90, B0
> HEAP info: 70
> HEAP2 info: 10, 20, 30, 40
> LOGICALMSG info: 00
> REPLORIGIN info: 10
> SMGR info: 10
> SPG info: 60, 80
> STANDBY info: 00, 10, 20
> XACT info: 00, 10, 20, 30, 40, 60
> XLOG info: 00, 10, 30, 40, 50, 60, 70, 90, A0, B0, D0, E0, F0
>
> As far as I can tell from my research, there is no problem in terms of
> logical decoding even if we process these WAL records generated at
> non-logical level during logical decoding.
>
> I think we can go without waiting. It would be great if we could have
> checks or assertions to detect such scenarios.
>

I tried to come up with suitable Assert()s, but couldn’t identify any.
Adding assertions for these scenarios (where the logical decoding
value is read without an XID assigned) could lead to false failures
in cases that are legitimate and should not trip an assertion. Let me
know if you have better ideas here.

IMO, at best we can document these scenarios as comments in
logicalctl.c’s header, so that new readers are aware of them.
> I've updated the patch accordingly.
>
Thanks. I verified the scenarios around the changed logic, and they work well.

A few trivial comments:

1)
DisableLogicalDecoding():
+     * Skip CheckLogicalSlotExists() check during recovery because the
+     * existing slots will be invalidated after disabling logical decoding.
+     */
+    if (!in_recovery &&
+        (!LogicalDecodingCtl->logical_decoding_enabled || CheckLogicalSlotExists()))
+    {
+        LogicalDecodingCtl->pending_disable = false;
+        LWLockRelease(LogicalDecodingControlLock);
+        return;
+    }

Should the condition instead be:
!LogicalDecodingCtl->logical_decoding_enabled || (!in_recovery &&
CheckLogicalSlotExists())

2)
a) Earlier, when the startup process enabled or disabled logical decoding
on a standby during STATUS_CHANGE, it emitted a log message:
elog(DEBUG1, "update logical decoding status to %d", new_status);

Now there is no such log message. Should we have it in both
EnableLogicalDecoding() and DisableLogicalDecoding() for the
'in_recovery' case? I feel it might be helpful when diagnosing standby
cases.

b) The same is true for UpdateLogicalDecodingStatusEndOfRecovery(), which
earlier used UpdateLogicalDecodingStatus(), which had a log message.

thanks
Shveta