Обсуждение: Newly created replication slot may be invalidated by checkpoint
Hi, all,
I'd like to discuss an issue about getting the minimal restart_lsn for WAL segments removal during checkpoint. The discussion [1] fixed the issue with the unexpected removal of old WAL segments after checkpoint, followed by an immediate restart. The commit 2090edc6f32f652a2c introduced a change that the minimal restart_lsn is obtained at the start of checkpoint creation. If a replication slot is created and performs a WAL reservation concurrently, the WAL segment contains the new slot's restart_lsn could be removed by the ongoing checkpoint. In the attached patch I add a perl test to reproduce this scenario.
Additionally, while studying the InvalidatePossiblyObsoleteSlot(), I noticed a behavioral difference between PG15 (and earlier) and PG16 (and later). In PG15 and earlier, while attempting to acquire a slot, if the slot's restart_lsn advanced to be greater than oldestLSN, the slot would not be marked invalid. Starting in PG16, whether a slot is marked invalid is determined solely based on initial_restart_lsn, even if the slot's restart_lsn advances above oldestLSN while waiting, the slot will still be marked invalid. The initial_restart_lsn is recorded to report the correct invalidation cause (see discussion [2]), but why not decide whether to mark the slot as invalid based on the slot's current restart_lsn? If a slot's restart_lsn has already advanced sufficiently, shouldn't we refrain from invalidating it?
[2]: https://www.postgresql.org/message-id/ZaTjW2Xh+TQUCOH0@ip-10-97-1-34.eu-west-3.compute.internal
Looking forward to your feedback.
Best Regards,
suyu.cmj
Вложения
Hi suyu.cmj > The commit 2090edc6f32f652a2c introduced a change that the > minimal restart_lsn is obtained at the start of checkpoint creation. If a > replication slot is created and performs a WAL reservation concurrently, the > WAL segment contains the new slot's restart_lsn could be removed by the ongoing > checkpoint. Thank you for reporting this issue. I agree, the issue with slot invalidation seems to take place in REL_17_STABLE and earlier, but it is not reproducible in 18+ versions because of different implementation. The problem may appear if the first persistent slot is created during checkpoint, when slot's oldest lsn is invalid. I'm not sure how it works when some other persistent slots exist. Probably, invalidation is still possible if the reservation happens with lsn older than the oldest lsn of existing slots. In 17 and earlier verions, when checkpoint is started in takes slot's oldest lsn using XLogGetReplicationSlotMinimumLSN(). This value will be used later in WAL segments removal. If a new slot reserved the WAL between getting of slots' oldest lsn and WAL removal, it may be invalidated. It happens because ReplicationSlotReserveWal() checks XLogCtl->lastRemovedSegNo but the segments are not yet removed. There is a subtle thing, when the wal reservation completes at the same time when the checkpointer is between KeepLogSeg and RemoveOldXlogFiles where XLogCtl->lastRemovedSegNo is updated. The slot will not be invalidated but the segments, reserved by the new slot, may be removed, I guess. In 17 and earlier we tried to create a compatible solution, when oldest lsn was taken before slot syncing to disk. In the master branch we added a new last_saved_restart_lsn into ReplicationSlot structure which seems to be a better solution. I prepared a simple fix [1] for 17 and earlier versions. It seems it fixes the problem with first persistent slot creation. I also think, it should work as it was before the patch that added this bug. I also did some changes in the original test script, for 17 ([2]) and 18 ([3]) versions. I continue to investigate and test it. [1] 0001-Fix-invalidation-when-slot-is-created-during-checkpo.patch [2] v2-17-0001-Newly-created-replication-slot-may-be-invalidated-by.patch [3] v2-18-0001-Newly-created-replication-slot-may-be-invalidated-by.patch With best regards, Vitaly
Вложения
On Wed, Sep 17, 2025 at 4:19 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote: > > [1] 0001-Fix-invalidation-when-slot-is-created-during-checkpo.patch > - /* Calculate how many segments are kept by slots. */ - keep = slotsMinReqLSN; + /* + * Calculate how many segments are kept by slots. Keep the wal using + * the minimal value from the current reserved LSN and the reserved LSN at + * the moment of checkpoint start (before CheckPointReplicationSlots). + */ + keep = XLogGetReplicationSlotMinimumLSN(); + if (!XLogRecPtrIsInvalid(slotsMinReqLSN)) + keep = Min(keep, slotsMinReqLSN); Can we add why we need this additional calculation here? I have one question regarding commit 2090edc6f32f652a2c: * if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED, _logSegNo, InvalidOid, InvalidTransactionId)) { + /* + * Recalculate the current minimum LSN to be used in the WAL segment + * cleanup. Then, we must synchronize the replication slots again in + * order to make this LSN safe to use. + */ + slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN(); + CheckPointReplicationSlots(shutdown); + /* * Some slots have been invalidated; recalculate the old-segment * horizon, starting again from RedoRecPtr. */ XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); - KeepLogSeg(recptr, &_logSegNo); + KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo); After invalidating the slots, we recalculate the slotsMinReqLSN with the latest value of XLogGetReplicationSlotMinimumLSN(). Can't it generate a more recent value of slot's restart_lsn which has not been flushed and we may end up removing the corresponding WAL? We should probably add some comments as to why such a race doesn't exist. -- With Regards, Amit Kapila.
On Mon, Sep 15, 2025 at 8:11 PM suyu.cmj <mengjuan.cmj@alibaba-inc.com> wrote: > > > Additionally, while studying the InvalidatePossiblyObsoleteSlot(), I noticed a behavioral difference between PG15 (andearlier) and PG16 (and later). In PG15 and earlier, while attempting to acquire a slot, if the slot's restart_lsn advancedto be greater than oldestLSN, the slot would not be marked invalid. Starting in PG16, whether a slot is marked invalidis determined solely based on initial_restart_lsn, even if the slot's restart_lsn advances above oldestLSN while waiting,the slot will still be marked invalid. The initial_restart_lsn is recorded to report the correct invalidation cause(see discussion [2]), but why not decide whether to mark the slot as invalid based on the slot's current restart_lsn?If a slot's restart_lsn has already advanced sufficiently, shouldn't we refrain from invalidating it? > I haven't tried to reproduce it but I see your point. I agree that if this is happening then we should not invalidate such slots. This is a different problem related to a different commit than what is fixd as 2090edc6f32f652a2c. I suggest you to either start a new thread for this or report in the original thread where this change was made. -- With Regards, Amit Kapila.
Hi Amit,
Thank you for your reply. Following your suggestion, I have started a new discussion thread for this issue:
Best regards,
suyu.cmj
On Tue, Sep 23, 2025 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Sep 17, 2025 at 4:19 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote: > > > > [1] 0001-Fix-invalidation-when-slot-is-created-during-checkpo.patch > > > > - /* Calculate how many segments are kept by slots. */ > - keep = slotsMinReqLSN; > + /* > + * Calculate how many segments are kept by slots. Keep the wal using > + * the minimal value from the current reserved LSN and the reserved LSN at > + * the moment of checkpoint start (before CheckPointReplicationSlots). > + */ > + keep = XLogGetReplicationSlotMinimumLSN(); > + if (!XLogRecPtrIsInvalid(slotsMinReqLSN)) > + keep = Min(keep, slotsMinReqLSN); > > Can we add why we need this additional calculation here? > I was thinking some more about this solution. Won't it lead to the same problem if ReplicationSlotReserveWal() calls ReplicationSlotsComputeRequiredLSN() after the above calculation of checkpointer? -- With Regards, Amit Kapila.
RE: Newly created replication slot may be invalidated by checkpoint
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit, Vitaly, > I was thinking some more about this solution. Won't it lead to the > same problem if ReplicationSlotReserveWal() calls > ReplicationSlotsComputeRequiredLSN() after the above calculation of > checkpointer? Exactly. I verified that in your patch, the invalidation can still happen if we cannot finish the LSN computation before the KeepLogSegments(). Attached file can be applied atop 0001-Fix-invalidation-... and v2-17-0001-Newly-created-replication... patches. It could invalidate the given slot. Best regards, Hayato Kuroda FUJITSU LIMITED
Вложения
Dear Amit, Hayato On Wednesday, September 24, 2025 14:31 MSK, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote: >> I was thinking some more about this solution. Won't it lead to the >> same problem if ReplicationSlotReserveWal() calls >> ReplicationSlotsComputeRequiredLSN() after the above calculation of >> checkpointer? > Exactly. I verified that in your patch, the invalidation can still happen if > we cannot finish the LSN computation before the KeepLogSegments(). Yes. The moment, when WAL reservation takes place is the call of ReplicationSlotsComputeRequiredLSN which updates the oldest slots' lsn (XLogCtl->replicationSlotMinLSN). If it occurs at the moment between KeepLogSeg and RemoveOldXlogFiles, such reservation will not be taken into account. This behaviour seems to be before commit 2090edc6f32f652a2c, but the probability of such race condition was too slow due to the short time period between KeepLogSeg and RemoveOldXlogFiles. The commit 2090edc6f32f652a2c increased the probability of such race condition because CheckPointGuts can take greater time to execute. The attached patch doesn't solve the original problem completely but it decreases the probability of such race condition, as it was before the commit. I propose to apply this patch and then to think how to resolve this race condition, which seems to take place in 18 and master as well. I updated the patch by improving some comments as suggested by Amit. With best regards, Vitaly
Вложения
RE: Newly created replication slot may be invalidated by checkpoint
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Vitaly, > I propose to apply this patch and then to think how to resolve this race > condition, which seems to take place in 18 and master as well. No, I think this invalidation can't happen in PG18/HEAD. This is because in CheckpointGuts()->CheckPointReplicationSlots()-> ReplicationSlotsComputeRequiredLSN(), slots are examined one by one to determine whether their restart_lsn has advanced since the last check. If any slot has advanced, protection is applied starting from the oldest restart_lsn. Crucially, this check is performed before WAL removal. The function call was introduced in commit ca307d5cec. Further analysis shows that it is also safe if a slot is being created and WAL advances after CheckpointGuts() but before the removal segments are determined. In this case the restart_lsn points the CHECKPOINT_REDO generated by the current CHECKPOINT. This and later records won't be discarded. Best regards, Hayato Kuroda FUJITSU LIMITED
RE: Newly created replication slot may be invalidated by checkpoint
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Vitaly, Would you have enough time to work on and fix the issue? One idea is to compute the required LSN by the system at the slot checkpoint. This partially follows what PG18/HEAD does but seems hacky and difficult to accept. Best regards, Hayato Kuroda FUJITSU LIMITED
Dear Hayato, > Would you have enough time to work on and fix the issue? > One idea is to compute the required LSN by the system at the slot checkpoint. This > partially follows what PG18/HEAD does but seems hacky and difficult to accept. I'm working on the issue. Give me, please, a couple of days to finalize my work. In short, I think to call ReplicationSlotsComputeRequiredLSN() right before slotsMinReqLSN assignment in CreateCheckPoint in 17 and earlier versions. At this moment, we already have a new redo lsn. I consider, that the WAL reservation happens when we assign restart_lsn to a slot. Taking into account this consideration, I distinguish two cases - WAL reservation happens before or after new redo ptr assignment. If we reserve the WAL after new redo ptr, it will protect the slot's reservation, as you've mentioned. The problem appears, when we reserve the WAL before a new redo ptr, but the update of XLogCtl->replicationSlotMinLSN was not yet hapenned. When we assign slotsMinReqLSN, we use XLogCtl->replicationSlotMinLSN. The call of ReplicationSlotsComputeRequiredLSN before slotsMinReqLSN assignment can help. It will be guaranteed, that those slots with WAL reservation before a new redo ptr will be protected by slotsMinReqLSN, but slots with wal reservation after a new redo ptr will be protected by the redo ptr. I think it is about the same as you proposed. These reasonings are applied to physical slots, but it seems to be ok for logical slots. One moment, I'm not sure, when we create a logical slot in recovery. In this case, GetXLogReplayRecPtr() is used. I'm not sure, that redo ptr will protect such slot in CreateRestartPoint. With best regards, Vitaly