Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
От | Alexander Korotkov |
---|---|
Тема | Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly |
Дата | |
Msg-id | CAPpHfds0ScFbExsT+pnY7K0JGH6pFPHMOSfCOknC-1s7ahmRFA@mail.gmail.com обсуждение исходный текст |
Ответ на | Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly ("Vitaly Davydov" <v.davydov@postgrespro.ru>) |
Список | pgsql-hackers |
Hi, Vitaly! On Tue, Jun 17, 2025 at 6:02 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote: > Thank you for reporting the issue. > > >While tracking buildfarm for one of other commits, I noticed this failure: > >TRAP: failed Assert("s->data.restart_lsn >= > >s->last_saved_restart_lsn"), File: > >"../pgsql/src/backend/replication/slot.c", Line: 1813, PID: 3945797 > >postgres: standby: checkpointer (ExceptionalCondition+0x83) [0x55fa69b79f5e] > >postgres: standby: checkpointer > >(InvalidateObsoleteReplicationSlots+0x53c) [0x55fa69982171] > >postgres: standby: checkpointer (CreateCheckPoint+0x9ad) [0x55fa6971feb2] > > This assert was introduced in the patch. Now, I think, it is a wrong one. Let me > please explain one of the possible scenarios when it can be triggered. In case > of physical replication, when walsender receives a standby reply message, it > calls PhysicalConfirmReceivedLocation function which updates slots' restart_lsn > from received flush_lsn value. This value may be older than the saved value. If > it happens during checkpoint, after slot saving to disk, this assert will be > triggered, because last_saved_restart_lsn value may be lesser than the new > restart_lsn value, updated from walsender. > > I propose to remove this assert. Yes, but is it OK for restart_lsn to move backward? That might mean that if checkpoint happen faster than PhysicalConfirmReceivedLocation(), then crash could cause this WAL location to be unavailable. Is that true? Also, what do you think about proposed changes in [1]? I wonder if it could somehow decrease the coverage. Links. 1. https://www.postgresql.org/message-id/OSCPR01MB149665B3F0629D10731B18E5AF570A%40OSCPR01MB14966.jpnprd01.prod.outlook.com ------ Regards, Alexander Korotkov Supabase
В списке pgsql-hackers по дате отправления: