Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
От | Tomas Vondra |
---|---|
Тема | Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639) |
Дата | |
Msg-id | 07301efe-2f6f-145c-dbfd-20991c89f586@enterprisedb.com обсуждение исходный текст |
Ответ на | RE: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639) ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Список | pgsql-hackers |
On 6/14/23 15:39, Zhijie Hou (Fujitsu) wrote: > On Wednesday, June 14, 2023 5:05 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: >> ... >> >> Also, can you try if we still stream the partial transaction with >> create_logical_replication_slot building a full snapshot? > > Yes, It can fix this problem because force create_logical_replication_slot > build a full snapshot can avoid restoring the snapshot. But I am not sure if > this is the best fix for this for the same reason(it's costly) mentioned by > Amit[1]. > Costly compared to the current behavior? Sure, but considering the current behavior is incorrect/broken, that seems rather irrelevant. Incorrect behavior can be infinitely faster. I doubt it's significantly costlier than just setting the "full snapshot" flag when building the initial snapshot - sure, it will take more time than now, but that's kinda the whole point. It seems to me the problem is exactly that it *doesn't* wait long enough. I may be misunderstanding the solution you proposed, but this: One idea to fix the partial change stream problem would be that we record all the running transaction's xid when restoring the snapshot in SnapBuildFindSnapshot(), and in the following decoding, we skip decoding changes for the recorded transaction sounds pretty close to what building a correct snapshot actually does. But maybe I'm wrong - ultimately, the way to compare those approaches seems to be to prototype this proposal, and do some tests. There's also the question of back branches, and it seems way simpler to just flip a flag and disable broken optimization than doing fairly complex stuff to save it. I'd also point out that (a) this only affects the initial snapshot, not every time we start the decoding context, and (b) the slots created from walsender already do that with (unless when copy_data=false). So if needs_full_snapshot=true fixes the issue, I'd just do that as the first step - in master and backpatches. And then if we want to salvage the optimization, we can try fixing it in master. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: