RE: Newly created replication slot may be invalidated by checkpoint
| From | Zhijie Hou (Fujitsu) |
|---|---|
| Subject | RE: Newly created replication slot may be invalidated by checkpoint |
| Date | |
| Msg-id | TY4PR01MB16907DCA80DBC3E77CE6B203294C9A@TY4PR01MB16907.jpnprd01.prod.outlook.com |
| In reply to | Re: Newly created replication slot may be invalidated by checkpoint ("Vitaly Davydov" <v.davydov@postgrespro.ru>) |
| List | pgsql-hackers |
On Friday, November 14, 2025 3:58 AM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
> Re-sending the original message due to a fail...
>
> Hi Zhijie Hou, Amit
>
> > > I think the main issue here lies in the possibility that the minimum
> > > restart_lsn obtained during a checkpoint could be less than the WAL
> > > position that is being reserved concurrently. So, instead of serializing
> > > the redo assignment and WAL reservation, Amit proposes serializing the
> > > CheckPointReplicationSlots() and WAL reservation. This would ensure the
> > > following:
> > >
> > > 1) If the WAL reservation occurs first, the checkpoint must wait for the
> > > restart_lsn to be updated before proceeding with WAL removal. This
> > > guarantees that the most recent restart_lsn position is detected.
> > >
> > > 2) If the checkpoint calls CheckPointReplicationSlots() first, then any
> > > subsequent WAL reservation must take a position later than the redo
> > > pointer.
>
> Thank you for the explanation. I agree, Amit's patch solves the problem and
> is the most promising solution. It is less likely to introduce new bugs and
> there is no need to avoid locks for maximum possible performance. I tried to
> find some corner cases but failed.
...
>
> When investigating the solution I came up with some questions, shared below.
> I do not ask for an answer, but I think they may be considered when
> preparing the patch.
Thanks for raising two good questions.
>
> 1) Looking at ReplicationSlotReserveWal, I'm not sure why we assign
> restart_lsn in a loop and exit the loop if XLogGetLastRemovedSegno() < segno
> is true. I can understand if we compare restart_lsn with
> XLogCtl->replicationSlotMinLSN to handle parallel calls of
> ReplicationSlotsComputeRequiredLSN as described in the comment. The removal
> of old segments happens much later in the checkpointer. I'm not sure whether
> the comment describes the case improperly or the code is wrong.
After considering this more, I think we no longer need this check and loop in
ReplicationSlotReserveWal() once all race conditions that could cause the WAL
being reserved to be removed are fixed. (Since we will have ensured that the
checkpoint cannot remove WAL being reserved by the newly created slot.)
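For reference, here is a simplified sketch of the loop being discussed,
paraphrased from ReplicationSlotReserveWal() in slot.c; the physical/logical
branches, standby-snapshot logging and error handling are condensed, so treat
it as illustrative rather than an exact copy of the tree:

    while (true)
    {
        XLogSegNo   segno;
        XLogRecPtr  restart_lsn;

        /* pick a starting point and publish it in the slot */
        restart_lsn = SlotIsLogical(slot) ?
            GetXLogInsertRecPtr() : GetRedoRecPtr();
        SpinLockAcquire(&slot->mutex);
        slot->data.restart_lsn = restart_lsn;
        SpinLockRelease(&slot->mutex);

        /* prevent WAL removal as fast as possible */
        ReplicationSlotsComputeRequiredLSN();

        /*
         * Retry if the reserved segment was already removed.  This is the
         * check (and the reason for the loop) that should become unnecessary
         * once the checkpointer is serialized against the reservation.
         */
        XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size);
        if (XLogGetLastRemovedSegno() < segno)
            break;
    }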
> 2) Why do we call XLogSetReplicationSlotMinimumLSN outside of the control
> lock section? It may lead to some race conditions when two or more backends
> create, advance or drop slots in parallel. I'm not sure the control and
> allocation locks properly serialise the updates.
I researched this question and found that there is indeed another race
condition here that could cause a newly created slot to be invalidated. The
scenario is that if a backend is advancing the restart_lsn of a slot and has
reached XLogSetReplicationSlotMinimumLSN but has not updated the minimum LSN
yet, while another backend creates a new slot, then the minimum LSN could be
set to a more recent value than the WAL position reserved by the newly created
slot, causing the new slot to be invalidated at the next checkpoint.
The steps to reproduce are as follows:

1. Create a slot 'advtest' for later advancement.
select pg_create_logical_replication_slot('advtest', 'test_decoding');

2. Start a backend to create a slot (s) but block it before updating the
restart_lsn in ReplicationSlotReserveWal().
select pg_create_logical_replication_slot('s', 'test_decoding');

3. Start another backend to generate some new WAL files and advance the
slot (advtest) to the latest position, but block it before updating the LSN in
XLogSetReplicationSlotMinimumLSN().
select pg_switch_wal();
select pg_switch_wal();
SELECT pg_log_standby_snapshot();
SELECT pg_log_standby_snapshot();
select pg_replication_slot_advance('advtest', pg_current_wal_lsn());
select pg_replication_slot_advance('advtest', pg_current_wal_lsn());

4. Release the backend creating slot (s).

5. Execute a checkpoint but block it before calling XLogGetReplicationSlotMinimumLSN().

6. Release the advancement backend; the minimum LSN will then be set to a new position.

7. Release the checkpoint; the WALs required by the slot (s) are removed.
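The window these steps exploit is the gap in ReplicationSlotsComputeRequiredLSN()
between releasing ReplicationSlotControlLock and publishing the minimum. A
simplified sketch of that function, paraphrased from slot.c (recent branches
also skip invalidated slots and maintain other minima, so this is only
illustrative):

    void
    ReplicationSlotsComputeRequiredLSN(void)
    {
        XLogRecPtr  min_required = InvalidXLogRecPtr;

        LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
        for (int i = 0; i < max_replication_slots; i++)
        {
            ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
            XLogRecPtr  restart_lsn;

            if (!s->in_use)
                continue;

            SpinLockAcquire(&s->mutex);
            restart_lsn = s->data.restart_lsn;
            SpinLockRelease(&s->mutex);

            if (restart_lsn != InvalidXLogRecPtr &&
                (min_required == InvalidXLogRecPtr || restart_lsn < min_required))
                min_required = restart_lsn;
        }
        LWLockRelease(ReplicationSlotControlLock);

        /*
         * Window: if the advancing backend stalls here (step 3 above), the
         * slot-creating backend can compute and publish an older minimum,
         * which the stalled backend's stale (newer) value then overwrites.
         */
        XLogSetReplicationSlotMinimumLSN(min_required);
    }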
This is similar to the concurrent slot_xmin update issue in another thread
(Assertion failure in SnapBuildInitialSnapshot() [1]). I think it's better to
use the same solution for both issues, e.g., we can raise the lock level of
ReplicationSlotControlLock to exclusive in ReplicationSlotsComputeRequiredLSN()
to block concurrent LSN updates.
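Roughly, the direction would be something like the following sketch (only an
illustration of the idea; the attached v2 patch is the authoritative version):

    LWLockAcquire(ReplicationSlotControlLock, LW_EXCLUSIVE);
    /* ... scan the slots and compute min_required as before ... */

    /*
     * Publish the minimum while still holding the lock exclusively, so that
     * concurrent slot creation/advancement cannot interleave between the
     * computation and the update of the shared minimum LSN.
     */
    XLogSetReplicationSlotMinimumLSN(min_required);
    LWLockRelease(ReplicationSlotControlLock);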
Attached is the v2 patch that improves the fix based on the above analysis.
(I am still testing some other scenarios related to slotsync to ensure the
current fix is complete.)
[1]
https://www.postgresql.org/message-id/flat/CAA4eK1KcA%3DDrC3NTifig-x5DPXaxDEMLSZSz9gWS16m_d-%2B6rQ%40mail.gmail.com#8b8eeb63923e4ddd42de5c52e2ad43e7
Best Regards,
Hou zj