Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored
От | Dilip Kumar |
---|---|
Тема | Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored |
Дата | |
Msg-id | CAFiTN-sOv02tM0MKVpBp=35q3eb+Vp2fitKBPBt7=bDwYbWTCg@mail.gmail.com обсуждение исходный текст |
Ответ на | BUG #18961: Race scenario where max_standby_streaming_delay is not honored (PG Bug reporting form <noreply@postgresql.org>) |
Список | pgsql-bugs |
On Thu, Jun 19, 2025 at 5:25 PM PG Bug reporting form <noreply@postgresql.org> wrote: > > The following bug has been logged on the website: > > Bug reference: 18961 > Logged by: Anthony Hsu > Email address: erwaman@gmail.com > PostgreSQL version: 17.5 > Operating system: Linux > Description: > > In the current ResolveRecoveryConflictWithBufferPin implementation in > standby.c, I think there's a race scenario where a backend holding a > conflicting buffer pin won't receive a PROCSIG_RECOVERY_CONFLICT_BUFFERPIN > message promptly: > 1. Assume max_standby_streaming_delay has expired when the startup process > enters ResolveRecoveryConflictWithBufferPin > 2. Assume backend 1 holds a conflicting buffer pin while backend 2 does not > 3. Since we are past the standby limit time, the startup process broadcasts > PROCSIG_RECOVERY_CONFLICT_BUFFERPIN here [1] without enabling any timeouts > 4. Then the startup process waits to be woken up via > ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN) here [2] > 5. Suppose backend 2 receives PROCSIG_RECOVERY_CONFLICT_BUFFERPIN first, > sees it does not hold a conflicting buffer pin, and *then* proceeds to pin > the buffer > 6. Suppose then backend 1 receives PROCSIG_RECOVERY_CONFLICT_BUFFERPIN, > processes interrupts, and cancels itself. During cleanup, in UnpinBuffer(), > it will see the pin count is still > 1 (startup process + backend 2 have it > pinned), so it will NOT wake up the startup process. > 7. The startup process will only get woken up once backend 2 unpins the > buffer and the pin count reaches 1 (or some other signal causes the startup > process's latch to be set). Only then will it try to acquire the cleanup > lock again and broadcast another PROCSIG_RECOVERY_CONFLICT_BUFFERPIN message > if it fails to acquire the cleanup lock. > The unexpected behavior is in step (7), where it could be arbitrarily long > until the startup process is woken up again. The expected behavior is that > since max_standby_streaming_delay has already expired, the startup process > should wake up quickly and broadcast another > PROCSIG_RECOVERY_CONFLICT_BUFFERPIN message if there are still conflicting > backends. > I was able to reproduce this scenario with some custom code to control the > execution sequence. > One way to fix this scenario is to just remove the `if` block here [3] > entirely so that we always enable the STANDBY_TIMEOUT and > STANDBY_DEADLOCK_TIMEOUT timeouts. > [1] > https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L805 > [2] > https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L842 > [3] > https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L800-L806 I agree this looks like a race condition, but I am not sure about the proposed solution. ResolveRecoveryConflictWithBufferPin() is called by LockBufferForCleanup() to wait for a process currently holding a buffer pin. However, new processes can still acquire the pin.. So I think the problem lies in the logic of the wakeup mechanism of the UnpinBuffer() no? The intended behavior is for the startup process to be woken regardless of Backend 2 subsequently acquiring the pin, but the current tracking mechanism is insufficient. -- Regards, Dilip Kumar Google
В списке pgsql-bugs по дате отправления: