Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
От | Alexander Lakhin |
---|---|
Тема | Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed |
Дата | |
Msg-id | 5d0e5bec-67f9-9164-36cb-c4ff5f95d1ed@gmail.com обсуждение исходный текст |
Ответ на | Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
Ответы |
Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
(Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
|
Список | pgsql-hackers |
Hi Bertrand, 10.01.2024 19:32, Bertrand Drouvot wrote: > >> is an absent message 'obsolete replication slot "row_removal_activeslot"' > Looking at recovery-15-vacuum_pg_class/i035_standby_logical_decoding_standby.log here: > > Yeah, the missing message has to come from InvalidatePossiblyObsoleteSlot(). > > In case of an active slot we first call ReportSlotInvalidation with the second > parameter set to true (to emit the "terminating" message), then SIGTERM the active > process and then (later) we should call the other ReportSlotInvalidation() > call with the second parameter set to false (to emit the message that we don't > see here). > > So it means InvalidatePossiblyObsoleteSlot() did not trigger the second ReportSlotInvalidation() > call. I've found a way to reproduce the issue without slowing down a machine or running multiple tests in parallel. It's enough for this to add a delay to allow BackgroundWriterMain() to execute LogStandbySnapshot(): @@ -692,2 +690,3 @@ $node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'D';]); +$node_primary->safe_psql('testdb', qq[SELECT pg_sleep(15);]); $node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'E';]); With this delay, I get the failure immediately: $ PROVE_TESTS="t/035*" TEMP_CONFIG=/tmp/extra.config make check -s -C src/test/recovery # +++ tap check in src/test/recovery +++ t/035_standby_logical_decoding.pl .. 47/? # Failed test 'activeslot slot invalidation is logged with on-access pruning' # at t/035_standby_logical_decoding.pl line 227. _primary.log contains: 2024-01-11 09:37:01.731 UTC [67656] 035_standby_logical_decoding.pl STATEMENT: UPDATE prun SET s = 'D'; 2024-01-11 09:37:01.738 UTC [67664] 035_standby_logical_decoding.pl LOG: statement: SELECT pg_sleep(15); 2024-01-11 09:37:01.738 UTC [67664] 035_standby_logical_decoding.pl LOG: xlog flush request 0/404D8F0; write 0/0; flush 0/0 at character 8 2024-01-11 09:37:01.738 UTC [67664] 035_standby_logical_decoding.pl CONTEXT: writing block 14 of relation base/16384/1247 2024-01-11 09:37:01.738 UTC [67664] 035_standby_logical_decoding.pl STATEMENT: SELECT pg_sleep(15); 2024-01-11 09:37:01.905 UTC [67204] LOG: xlog flush request 0/404DA58; write 0/404BB00; flush 0/404BB00 2024-01-11 09:37:01.905 UTC [67204] CONTEXT: writing block 4 of relation base/16384/2673 2024-01-11 09:37:12.514 UTC [67204] LOG: INSERT @ 0/4057768: - Standby/RUNNING_XACTS: nextXid 769 latestCompletedXid 768 oldestRunningXid 769 2024-01-11 09:37:12.514 UTC [67206] LOG: xlog bg flush request write 0/4057768; flush: 0/4057768, current is write 0/4057730; flush 0/4057730 2024-01-11 09:37:16.760 UTC [67712] 035_standby_logical_decoding.pl LOG: statement: UPDATE prun SET s = 'E'; 2024-01-11 09:37:16.760 UTC [67712] 035_standby_logical_decoding.pl LOG: INSERT @ 0/40577A8: - Heap2/PRUNE: snapshotConflictHorizon: 768,... Note RUNNING_XACTS here... _standby.log contains: 2024-01-11 09:37:16.842 UTC [67606] LOG: invalidating obsolete replication slot "pruning_inactiveslot" 2024-01-11 09:37:16.842 UTC [67606] DETAIL: The slot conflicted with xid horizon 768. 2024-01-11 09:37:16.842 UTC [67606] CONTEXT: WAL redo at 0/4057768 for Heap2/PRUNE: snapshotConflictHorizon: 768, ... and no 'invalidating obsolete replication slot "pruning_activeslot"' below. Debug logging added (see attached) gives more information: 2024-01-11 09:37:16.842 UTC [67606] LOG: invalidating obsolete replication slot "pruning_inactiveslot" 2024-01-11 09:37:16.842 UTC [67606] DETAIL: The slot conflicted with xid horizon 768. ... 2024-01-11 09:37:16.842 UTC [67606] LOG: !!!InvalidatePossiblyObsoleteSlot| RS_INVAL_HORIZON, s: 0x7f7985475c10, s->effective_xmin: 0, s->effective_catalog_xmin: 769, snapshotConflictHorizon: 768 ... 2024-01-11 09:37:16.842 UTC [67606] LOG: !!!InvalidatePossiblyObsoleteSlot| conflict: 0 so the condition TransactionIdPrecedesOrEquals(s->effective_catalog_xmin, snapshotConflictHorizon) is not satisfied, hence conflict = 0 and it breaks the loop in InvalidatePossiblyObsoleteSlot(). Several lines above in the log we can see: 2024-01-11 09:37:12.514 UTC [67606] LOG: REDO @ 0/4057730; LSN 0/4057768: prev 0/4057700; xid 0; len 24 - Standby/RUNNING_XACTS: nextXid 769 latestCompletedXid 768 oldestRunningXid 769 2024-01-11 09:37:12.540 UTC [67643] 035_standby_logical_decoding.pl LOG: !!!LogicalConfirmReceivedLocation| MyReplicationSlot: 0x7f7985475c10, MyReplicationSlot->effective_catalog_xmin: 769 and that's the first occurrence of xid 769 in the log. The decoded stack trace for the LogicalConfirmReceivedLocation call is: ogicalConfirmReceivedLocation at logical.c:1886:1 ProcessStandbyReplyMessage at walsender.c:2327:1 ProcessStandbyMessage at walsender.c:2188:1 ProcessRepliesIfAny at walsender.c:2121:5 WalSndWaitForWal at walsender.c:1735:7 logical_read_xlog_page at walsender.c:1068:13 ReadPageInternal at xlogreader.c:1062:12 XLogDecodeNextRecord at xlogreader.c:601:5 XLogReadAhead at xlogreader.c:976:5 XLogReadRecord at xlogreader.c:406:3 XLogSendLogical at walsender.c:3229:5 WalSndLoop at walsender.c:2658:7 StartLogicalReplication at walsender.c:1477:2 exec_replication_command at walsender.c:1985:6 PostgresMain at postgres.c:4649:10 Best regards, Alexander
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Will MortensenДата:
Сообщение: Re: Exposing the lock manager's WaitForLockers() to SQL