RE: Newly created replication slot may be invalidated by checkpoint
| От | Vitaly Davydov | 
|---|---|
| Тема | RE: Newly created replication slot may be invalidated by checkpoint | 
| Дата | |
| Msg-id | 30938d-68ffa500-b-6328cc00@139466859 обсуждение исходный текст  | 
		
| Ответ на | RE: Newly created replication slot may be invalidated by checkpoint ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) | 
| Ответы | 
                	
            		RE: Newly created replication slot may be invalidated by  checkpoint
            		
            		 | 
		
| Список | pgsql-hackers | 
Dear Hayato, On Tuesday, October 07, 2025 14:53 MSK, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote: > > Yes, I > > propose > > to ensure that the protection happens when we assign restart_lsn. It seems to be > > wrong that we invalidate slots by its restart_lsn but protect the wal for > > slots using XLogCtl->replicationSlotsMinLSN. > > Seems valid. There is another corner case that another restart_lsn can be set in-between, > but they have larger LSN than RedoRecPtr, right? Here we talk about new creating slots. There should be no other processes that can change restart_lsn during slot creation. Once, the slot is successfully created it can be advanced to the greater values only. During advance, the old restart_lsn will protect the slot, because it will be taken into account in the checkpoint. > > There is one subtle thing. Once, the operation of restart_lsn assignment is not > > an atomic, the following scenario may happen theoretically: > > 1. Read GetRedoRecPtr() in the backend (ReplicationSlotReserveWal) > > 2. Assign a new redo LSN in the checkpointer > > 3. Call ReplicationSlotsComputeRequiredLSN() in the checkpointer > > 3. Assign the old redo LSN to restart_lsn > > > > In this scenario, the restart_lsn will point to a previous redo LSN and it will > > be not protected by the new redo LSN. This scenario is unlikely, but it can > > happen theoretically. I have no ideas how to deal with it, except of assigning > > restart_lsn under XLogCtl->info_lck lock to avoid concurrent modification of > > XLogCtl->RecoRecPtr until it is assigned to restart_lsn of a creating slot. > > Oh, your point is there is another race condition, right? Do you have the reproducer for it? The attached test for the master branch demonstrates a possible but very rare race condition, because we read and assign slot's restart_lsn from redo rec ptr non-atomically. The proposed scenario (see above) seems to be not complete. With best regards, Vitaly
Вложения
В списке pgsql-hackers по дате отправления: