Re: Undetected deadlock between client backend and startup processes on a standby (Previously, Undetected deadlock between primary and standby processes)
From | Tomas Vondra
---|---
Subject | Re: Undetected deadlock between client backend and startup processes on a standby (Previously, Undetected deadlock between primary and standby processes)
Date |
Msg-id | ea96bc84-e242-4179-a440-9d4b8a7bae9f@enterprisedb.com
In reply to | RE: Undetected deadlock between client backend and startup processes on a standby (Previously, Undetected deadlock between primary and standby processes) (<Rintaro.Ikeda@nttdata.com>)
List | pgsql-bugs
On 3/4/24 09:35, Rintaro.Ikeda@nttdata.com wrote:
> Hi,
>
> I am correcting the previous bug report [1] to provide a more accurate
> description. The bug report demonstrated an undetected deadlock between a
> client backend and the startup process on a standby server. (The title of
> the previous bug report was "Undetected deadlock between primary and
> standby processes", but this was wrong. It should have been "Undetected
> deadlock between client backend and startup process on a standby
> server".)
>
> After the procedure proposed in my bug report [1], a recovery conflict is
> present because the tablespace that the startup process tries to drop is
> in use by a client backend process on the standby. The pg_stat_activity
> output (shown below) implies a deadlock: the client backend process waits
> for an AccessExclusiveLock to be released, while the startup process
> waits for recovery-conflict resolution for dropping the tablespace. This
> deadlock is not resolved after deadlock_timeout passes.
>
> (Standby server)
> postgres=# select datid, datname, wait_event_type, wait_event, query, backend_type from pg_stat_activity ;
>  datid | datname  | wait_event_type | wait_event                 | query            | backend_type
> -------+----------+-----------------+----------------------------+------------------+----------------
>      5 | postgres | Lock            | relation                   | SELECT * FROM t; | client backend
>        |          | IPC             | RecoveryConflictTablespace |                  | startup
>
> This deadlock is similar to the previously identified and patched issue
> [2], which also involved an undetected deadlock between a backend process
> and recovery on a standby server. I think the deadlock explained in this
> report should be detected and resolved.

Thanks for the report. So what are the steps to reproduce this?
The previous message did all kinds of stuff on the primary and then got stuck on pg_switch_wal() on the primary, but this update seems to do stuff on the standby and gets the lockup there.

It seems similar in the sense that it's about interaction between recovery and a regular backend, but unfortunately ResolveRecoveryConflictWithVirtualXIDs does not wait for a lock, it just checks if the XID is still running, so it's invisible to the deadlock detector :-(

But it's still checked against max_standby_streaming_delay, which should resolve the deadlock (unless set to -1 to allow infinite delays) at some point, right?

Also, I'm not very familiar with ResolveRecoveryConflictWithVirtualXIDs, but it seems it's doing a busy wait. I wonder if that's a good idea, but it's independent of this bug report.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company