Re: Incorrect snapshots while promoting hot standby node when 2PC is used
From | Andres Freund |
---|---|
Subject | Re: Incorrect snapshots while promoting hot standby node when 2PC is used |
Date | |
Msg-id | 20210504171337.o2fathpgatalkvm2@alap3.anarazel.de |
In reply to | Re: Incorrect snapshots while promoting hot standby node when 2PC is used (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Hi,

On 2021-05-04 12:32:34 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Michael Paquier (running locally I think), and subsequently Thomas Munro
> > (noticing [1]), privately reported that they noticed an assertion failure in
> > GetSnapshotData(). Both reasonably were wondering if that's related to the
> > snapshot scalability patches.
> > Michael reported the following assertion failure in 023_pitr_prepared_xact.pl:
> >> TRAP: FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", File: "procarray.c", Line: 2468, PID: 22901)
>
> mantid just showed a failure that looks like the same thing, at
> least it's also in 023_pitr_prepared_xact.pl:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mantid&dt=2021-05-03%2013%3A07%3A06
>
> The assertion line number is rather different though:
>
> TRAP: FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", File: "procarray.c", Line: 2094, PID: 1163004)

I managed to hit that one as well and it's also what fairywren hit - the assertions at lines 2094 and 2468 are basically copies of the same check, and which one is hit is a question of timing.

> and interestingly, this happened in a parallel worker:

I think the issue can be hit (or rather detected) whenever a transaction builds one snapshot while in recovery, and a second one during end-of-recovery. The parallel query here is just

2021-05-03 09:18:35.602 EDT [1162987:6] DETAIL: Failed process was running: SELECT pg_is_in_recovery() = 'f';

(parallel due to force_parallel_mode) - which of course is likely to run during end-of-recovery.

So it does seem like the same bug of resetting the KnownAssignedXids stuff too early.

Greetings,

Andres Freund
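For readers unfamiliar with the check named in the TRAP messages, here is a minimal C sketch of the invariant it enforces. TransactionIdPrecedesOrEquals() below is modeled on the PostgreSQL function of the same name (normal XIDs are compared modulo 2^32). The SnapshotState struct and record_snapshot_xmin() helper are hypothetical simplifications, assuming that TransactionXmin is fixed by the first snapshot a transaction builds and RecentXmin is updated by every later one. Under that assumption the assertion fails exactly when a later snapshot's xmin precedes the first one's, which is one way a snapshot built during recovery and a second one built during end-of-recovery could be caught disagreeing about which XIDs are still running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs below this value are special and never compared modulo 2^32. */
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Modeled on PostgreSQL's TransactionIdPrecedesOrEquals(): normal XIDs are
 * compared on a 2^32 circle, so id1 "precedes or equals" id2 when it is at
 * most 2^31 steps behind it.
 */
static bool
TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
{
    int32_t diff;

    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 <= id2;

    diff = (int32_t) (id1 - id2);
    return diff <= 0;
}

/*
 * Hypothetical per-backend state (a simplification, not PostgreSQL's real
 * layout): TransactionXmin comes from the transaction's first snapshot,
 * RecentXmin from its most recent one.
 */
typedef struct SnapshotState
{
    TransactionId TransactionXmin;
    TransactionId RecentXmin;
    bool          have_snapshot;
} SnapshotState;

/*
 * Hypothetical helper: record the xmin of a newly built snapshot and check
 * the same condition as the FailedAssertion above. A later snapshot whose
 * xmin moved backwards trips the assert.
 */
static void
record_snapshot_xmin(SnapshotState *st, TransactionId xmin)
{
    if (!st->have_snapshot)
    {
        st->TransactionXmin = xmin;
        st->have_snapshot = true;
    }
    st->RecentXmin = xmin;
    assert(TransactionIdPrecedesOrEquals(st->TransactionXmin, st->RecentXmin));
}
```

With snapshot xmins of 100 and then 150 the check passes; a second snapshot with xmin 90 would make TransactionIdPrecedesOrEquals(100, 90) false and trip the assertion, analogous to the failures reported in this thread.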