Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored

From	Dilip Kumar
Subject	Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored
Date	June 19 18:17:47
Msg-id	CAFiTN-sOv02tM0MKVpBp=35q3eb+Vp2fitKBPBt7=bDwYbWTCg@mail.gmail.com
In reply to	BUG #18961: Race scenario where max_standby_streaming_delay is not honored (PG Bug reporting form <noreply@postgresql.org>)
List	pgsql-bugs

On Thu, Jun 19, 2025 at 5:25 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18961
> Logged by:          Anthony Hsu
> Email address:      erwaman@gmail.com
> PostgreSQL version: 17.5
> Operating system:   Linux
> Description:
>
> In the current ResolveRecoveryConflictWithBufferPin implementation in
> standby.c, I think there's a race scenario where a backend holding a
> conflicting buffer pin won't receive a PROCSIG_RECOVERY_CONFLICT_BUFFERPIN
> message promptly:
> 1. Assume max_standby_streaming_delay has expired when the startup process
> enters ResolveRecoveryConflictWithBufferPin
> 2. Assume backend 1 holds a conflicting buffer pin while backend 2 does not
> 3. Since we are past the standby limit time, the startup process broadcasts
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN here [1] without enabling any timeouts
> 4. Then the startup process waits to be woken up via
> ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN) here [2]
> 5. Suppose backend 2 receives PROCSIG_RECOVERY_CONFLICT_BUFFERPIN first,
> sees it does not hold a conflicting buffer pin, and *then* proceeds to pin
> the buffer
> 6. Suppose then backend 1 receives PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
> processes interrupts, and cancels itself. During cleanup, in UnpinBuffer(),
> it will see the pin count is still > 1 (startup process + backend 2 have it
> pinned), so it will NOT wake up the startup process.
> 7. The startup process will only get woken up once backend 2 unpins the
> buffer and the pin count reaches 1 (or some other signal causes the startup
> process's latch to be set). Only then will it try to acquire the cleanup
> lock again and broadcast another PROCSIG_RECOVERY_CONFLICT_BUFFERPIN message
> if it fails to acquire the cleanup lock.
> The unexpected behavior is in step (7), where it could be arbitrarily long
> until the startup process is woken up again. The expected behavior is that
> since max_standby_streaming_delay has already expired, the startup process
> should wake up quickly and broadcast another
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN message if there are still conflicting
> backends.
> I was able to reproduce this scenario with some custom code to control the
> execution sequence.
> One way to fix this scenario is to just remove the `if` block here [3]
> entirely so that we always enable the STANDBY_TIMEOUT and
> STANDBY_DEADLOCK_TIMEOUT timeouts.
> [1]
> https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L805
> [2]
> https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L842
> [3]
> https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L800-L806
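
Steps 2 through 7 of the report can be replayed with a small toy model.
This is only a sketch and not PostgreSQL code: ToyBuffer, toy_pin,
toy_unpin and toy_replay_race are hypothetical names, and the only rule
modelled is that an unpin wakes the waiter when the waiter flag is set
and the pin count drops to 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a shared buffer header (hypothetical, not PostgreSQL's). */
typedef struct
{
    int  refcount;    /* total pins, including the waiter's own pin */
    bool waiter_set;  /* models the "someone waits for pin count 1" flag */
    int  wakeups;     /* how many times the waiter has been signalled */
} ToyBuffer;

static void
toy_pin(ToyBuffer *buf)
{
    buf->refcount++;
}

/* Wake the waiter only if, after this unpin, it holds the sole pin. */
static void
toy_unpin(ToyBuffer *buf)
{
    assert(buf->refcount > 0);
    buf->refcount--;
    if (buf->waiter_set && buf->refcount == 1)
    {
        buf->waiter_set = false;
        buf->wakeups++;
    }
}

/* Replays the reported interleaving and records the wakeup counts. */
static void
toy_replay_race(int *wakeups_after_cancel, int *wakeups_after_release)
{
    ToyBuffer buf = {0, false, 0};

    toy_pin(&buf);            /* startup process pins the buffer */
    toy_pin(&buf);            /* backend 1 holds the conflicting pin */
    buf.waiter_set = true;    /* startup broadcasts once, then sleeps */
    toy_pin(&buf);            /* backend 2 pins after handling the signal */

    toy_unpin(&buf);          /* backend 1 cancels: count is 2, no wakeup */
    *wakeups_after_cancel = buf.wakeups;

    toy_unpin(&buf);          /* backend 2 unpins whenever it chooses */
    *wakeups_after_release = buf.wakeups;
}
```

In this model the waiter has received no wakeup after backend 1 cancels,
and is signalled only by backend 2's later unpin, which matches the
unbounded wait described in step (7).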

I agree this looks like a race condition, but I am not sure about the
proposed solution. ResolveRecoveryConflictWithBufferPin() is called by
LockBufferForCleanup() to wait for processes currently holding a
buffer pin. However, new processes can still acquire a pin while the
startup process waits. So I think the problem lies in the wakeup logic
in UnpinBuffer(), no? The intended behavior is for the startup process
to be woken even if backend 2 subsequently acquires a pin, but the
current tracking mechanism is insufficient.
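
For comparison, the effect of the reporter's proposal (always arming a
timeout before sleeping) can be sketched as a toy timeline. This is an
illustration under simplifying assumptions, not PostgreSQL code:
toy_first_wakeup_tick and its parameters are hypothetical, time is
counted in abstract ticks, and a single timeout stands in for whatever
timeouts the real code would enable.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * The startup process goes to sleep at tick 0, right after its first
 * broadcast. It wakes either when the unpin that brings the pin count
 * down to 1 happens (pin_release_tick) or, if a timeout is armed, when
 * that timeout fires (timeout_ticks), whichever comes first.
 *
 * Returns the first tick at which the startup process wakes up and
 * could retry the cleanup lock or broadcast again.
 */
static int
toy_first_wakeup_tick(int pin_release_tick, bool timeout_armed,
                      int timeout_ticks)
{
    assert(pin_release_tick >= 0);
    assert(timeout_ticks > 0);

    if (timeout_armed && timeout_ticks < pin_release_tick)
        return timeout_ticks;   /* the timeout bounds the wait */

    return pin_release_tick;    /* wait depends entirely on the unpin */
}
```

Without the timeout, the wakeup time equals whenever the last other
pin holder lets go; with it, the wait is bounded by the timeout. That
bounds the symptom, while a change to the UnpinBuffer() wakeup logic
would address the missed wakeup itself.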

--
Regards,
Dilip Kumar
Google
