Re:RE: Re:RE: Re:BUG #18369: logical decoding core on AssertTXNLsnOrder()
От | ocean_li_996 |
---|---|
Тема | Re:RE: Re:RE: Re:BUG #18369: logical decoding core on AssertTXNLsnOrder() |
Дата | |
Msg-id | 5f2487c9.13435.18e2e656942.Coremail.ocean_li_996@163.com обсуждение исходный текст |
Ответ на | RE: Re:RE: Re:BUG #18369: logical decoding core on AssertTXNLsnOrder() ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Ответы |
RE: Re:RE: Re:RE: Re:BUG #18369: logical decoding core on AssertTXNLsnOrder()
|
Список | pgsql-bugs |
Dear Hayato:
At 2024-03-08 15:35:16, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote: >Dear Haiyang, Alexander, > >Below part describes the second failure. The issue would be occurred when: > >1) Logical decoding starts from the middle of a sub-transaction, and >2) NEW_CID record is initially decoded in the sub-transaction, and >3) An arbitrary changes are decoded in the sub-transaction.>...## Issue 1Thanks for your reproducer. I have investigated this issue. The scenario that caused the issueis indeed as you described above. I had not realized before that a serialized snapshot file fromone replication slot could be utilized by another replication slot to achieve a consistent state.This issue would not arise with only one replication slot, as SnapBuildXactNeedsSkip ensures that.Using the spec test case you provided (without any fix patch), I got another Stuck trace on REL_14_STABLE :```#0 0x00007f93604e0277 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 #1 0x00007f93604e1968 in __GI_abort () at abort.c:90 #2 0x0000000000b084da in ExceptionalCondition (conditionName=0xcc94c8 "prev_first_lsn < cur_txn->first_lsn", errorType=0xcc92c4 "FailedAssertion", fileName=0xcc9310 "reorderbuffer.c", lineNumber=916) at assert.c:69 #3 0x00000000008df064 in AssertTXNLsnOrder (rb=0x2a48590) at reorderbuffer.c:916 #4 0x00000000008de9a1 in ReorderBufferTXNByXid (rb=0x2a48590, xid=756, create=true, is_new=0x0, lsn=24749416, create_as_top=true) at reorderbuffer.c:669 #5 0x00000000008e2d00 in ReorderBufferAddNewTupleCids (rb=0x2a48590, xid=756, lsn=24749416, node=..., tid=..., cmin=1, cmax=4294967295, combocid=4294967295) at reorderbuffer.c:3198 #6 0x00000000008e7ed7 in SnapBuildProcessNewCid (builder=0x2a44570, xid=757, lsn=24749416, xlrec=0x2a16d38) at snapbuild.c:823 #7 0x00000000008d14b3 in DecodeHeap2Op (ctx=0x2a324f0, buf=0x7ffd886a4bc0) at decode.c:471 #8 0x00000000008d0bfa in LogicalDecodingProcessRecord (ctx=0x2a324f0, record=0x2a328f0) at decode.c:151 #9 0x00000000008d8688 in pg_logical_slot_get_changes_guts (fcinfo=0x2a206b0, confirm=true, binary=false) at logicalfuncs.c:296 #10 0x00000000008d87bd in pg_logical_slot_get_changes (fcinfo=0x2a206b0) at logicalfuncs.c:365```Then, I think the simplest process would be:```ReorderBufferProcessXid(xid = 757, lsn) -> A ReorderBufferTXN for subtransaciton (757) was generated. The first_lsn was the head of NEW_CID record. DecodeHeap2Op(info = XLOG_HEAP2_NEW_CID) SnapBuildProcessNewCid() ReorderBufferAddNewTupleCids(..., xlrec->top_xid = 756,...) ReorderBufferTXNByXid(xid = 756, lsn) -> A ReorderBufferTXN for top-transaciton (756) was generated. The first_lsn was the head of NEW_CID record. It caused an Assertion failure.```> I think the straightforward fix is to associate them to top-sub relationship, > and attached patch did it.LGFM, I think it is suitable assign subtransaction after calling ReorderBufferAddNewTupleCids().## Issue 2Inspired by your spec case, I've reorganized the spec case provided in [2]. The new test in attachmentis able to reproduce the issue mentioned in [1] even before commit 6b77048e5.The approach in [3] is also LGFM.
[1] https://www.postgresql.org/message-id/18369-ad61699bf91c5bc0@postgresql.org
[2] https://www.postgresql.org/message-id/6d0e80d6.c1fc.18deeb8120a.Coremail.ocean_li_996@163.com
[3] https://www.postgresql.org/message-id/CAA4eK1KpW5pHMwMp9hfXYvOeEU5Rcbhoc7FxtBOGPgKeyYLDmA@mail.gmail.com
Best Regards,Haiyang Li
Вложения
В списке pgsql-bugs по дате отправления: