Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
От | Peter Smith |
---|---|
Тема | Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop |
Дата | |
Msg-id | CAHut+Pt4PyKQCwqzQ=EFF=bpKKJD7XKt_S23F6L20ayQNxg77A@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Ответы |
Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
|
Список | pgsql-bugs |
On Wed, Nov 18, 2020 at 1:29 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2020-Nov-04, Amit Kapila wrote: > > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > * STREAM COMMIT bug? > > > In apply_handle_stream_commit, we do CommitTransactionCommand, but > > > apparently in a tablesync worker we shouldn't do it. > > > > In the tablesync stage, we don't allow streaming. See pgoutput_startup > > where we disable streaming for the init phase. As far as I understand, > > for tablesync we create the initial slot during which streaming will > > be disabled then we will copy the table (here logical decoding won't > > be used) and then allow the apply worker to get any other data which > > is inserted in the meantime. Now, I might be missing something here > > but if you can explain it a bit more or share some test to show how we > > can reach here via tablesync worker then we can discuss the possible > > solution. > > Hmm, okay, that sounds like there would be no bug then. Maybe what we > need is just an assert in apply_handle_stream_commit that > !am_tablesync_worker(), as in the attached patch. Passes tests. Hi. Using the same debugging technique described in a previous mail [1], I have tested again but this time using a SUBSCRIPTION capable of streaming. While paused in the debugger (to force an unusual timing situation) I can publish INSERTs en masse and cause streaming replication to occur. To cut a long story short, a tablesync worker CAN in fact end up processing (e.g. apply_dispatch) streaming messages. So the tablesync worker CAN get into the apply_handle_stream_commit. And this scenario, albeit rare, will crash. For example, --- Program received signal SIGABRT, Aborted. 0x00007f37570f22c7 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007f37570f22c7 in raise () from /lib64/libc.so.6 #1 0x00007f37570f39b8 in abort () from /lib64/libc.so.6 #2 0x0000000000f2bf6a in ExceptionalCondition (conditionName=0x11330a4 "IsTransactionState()", errorType=0x1133094 "FailedAssertion", fileName=0x1133088 "tablesync.c", lineNumber=273) at assert.c:69 #3 0x0000000000bb046e in process_syncing_tables_for_sync (current_lsn=23984432) at tablesync.c:273 #4 0x0000000000bb1039 in process_syncing_tables (current_lsn=23984432) at tablesync.c:535 #5 0x0000000000bb5c7a in apply_handle_stream_commit (s=0x7ffd2d892fd0) at worker.c:1080 #6 0x0000000000bb8edb in apply_dispatch (s=0x7ffd2d892fd0) at worker.c:1966 #7 0x0000000000bb97c9 in LogicalRepApplyLoop (last_received=23984432) at worker.c:2166 #8 0x0000000000bbca84 in ApplyWorkerMain (main_arg=1) at worker.c:3122 #9 0x0000000000b36f11 in StartBackgroundWorker () at bgworker.c:820 #10 0x0000000000b5cf3a in do_start_bgworker (rw=0x1f3da50) at postmaster.c:5837 #11 0x0000000000b5d661 in maybe_start_bgworkers () at postmaster.c:6062 #12 0x0000000000b5b560 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5223 #13 <signal handler called> #14 0x00007f37571b10d3 in __select_nocancel () from /lib64/libc.so.6 #15 0x0000000000b52a40 in ServerLoop () at postmaster.c:1691 #16 0x0000000000b51fa8 in PostmasterMain (argc=3, argv=0x1f164f0) at postmaster.c:1400 #17 0x00000000009ab55d in main (argc=3, argv=0x1f164f0) at main.c:209 --- Also, PSA v2 of my "helper" patch for debugging the tablesync worker. (This patch has more logging than v1) --- [1] - https://www.postgresql.org/message-id/CAHut%2BPsprtsa4o89wtNnKLxxwXeDKAX9nNsdghT1Pv63siz%2BAA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Вложения
В списке pgsql-bugs по дате отправления: