Re: Race conditions in 019_replslot_limit.pl
From | Andres Freund |
---|---|
Subject | Re: Race conditions in 019_replslot_limit.pl |
Date | |
Msg-id | 20220225201558.iabxf6k7edohggo7@alap3.anarazel.de |
In reply to | Re: Race conditions in 019_replslot_limit.pl (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Race conditions in 019_replslot_limit.pl |
List | pgsql-hackers |
Hi,

On 2022-02-25 15:07:01 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Seems to suggest something is holding a problematic lock in a way that I do not understand yet:
>
> > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2022-02-23%2013%3A47%3A20&stg=recovery-check
> > 2022-02-23 09:09:52.299 EST [2022-02-23 09:09:52 EST 1997084:6] 019_replslot_limit.pl LOG: received replication command: CREATE_REPLICATION_SLOT "pg_basebackup_1997084" TEMPORARY PHYSICAL ( RESERVE_WAL)
> > ...
> > 2022-02-23 09:09:52.518 EST [2022-02-23 09:09:52 EST 1997084:14] 019_replslot_limit.pl DEBUG: shmem_exit(0): 4 before_shmem_exit callbacks to make
> > 2022-02-23 09:09:52.518 EST [2022-02-23 09:09:52 EST 1997084:15] 019_replslot_limit.pl DEBUG: replication slot exit hook, without active slot
> > 2022-02-23 09:09:52.518 EST [2022-02-23 09:09:52 EST 1997084:16] 019_replslot_limit.pl DEBUG: temporary replication slot cleanup: begin
>
> > last message from 1997084 until the immediate shutdown.
>
> Hmm. Maybe put a couple more debug messages into ReplicationSlotCleanup
> and/or ReplicationSlotDropPtr? It doesn't seem very clear where in that
> sequence it's hanging up.

Yea, was thinking that as well.

I'm also wondering whether it's worth adding an assert, or at least a
WARNING, about no lwlocks held to the tail end of ShutdownPostgres? I don't
want to add an LWLockReleaseAll() yet, before I understand what's actually
happening.

> > We could be more certain if we shut down the cluster in fast rather than
> > immediate mode. So I'm thinking of doing something like
>
> Does that risk an indefinite hangup of the buildfarm run?

I think not. The pg_ctl stop -m fast should time out after PGCTLTIMEOUT, and
$self->_update_pid(-1); should notice it's not dead. The END{} block should
then shut it down in immediate mode.

Greetings,

Andres Freund
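The idea of checking, at the tail end of ShutdownPostgres, that no LWLocks are still held can be illustrated with a minimal standalone sketch. It does not use PostgreSQL's real lwlock.c internals: the lock registry and the functions `acquire_lock`, `release_lock` and `warn_if_locks_held` are hypothetical stand-ins that only model a per-process list of held locks and a warning emitted at exit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical per-process record of held locks. PostgreSQL keeps a
 * similar private array inside lwlock.c; these names are illustrative
 * only and are not its API.
 */
#define MAX_HELD_LOCKS 200

static const char *held_locks[MAX_HELD_LOCKS];
static int num_held_locks = 0;

/* Record that this process now holds the named lock. */
void
acquire_lock(const char *name)
{
    assert(num_held_locks < MAX_HELD_LOCKS);
    held_locks[num_held_locks++] = name;
}

/* Forget the named lock; search from the end, since locks are usually
 * released in reverse order of acquisition. */
void
release_lock(const char *name)
{
    for (int i = num_held_locks - 1; i >= 0; i--)
    {
        if (strcmp(held_locks[i], name) == 0)
        {
            /* Close the gap left by the released entry. */
            for (int j = i + 1; j < num_held_locks; j++)
                held_locks[j - 1] = held_locks[j];
            num_held_locks--;
            return;
        }
    }
    assert(!"releasing a lock that is not held");
}

/*
 * Hypothetical check for the end of a shutdown routine: log every lock
 * still held instead of silently releasing it, so a leak shows up in
 * the log. Returns the number of locks still held.
 */
int
warn_if_locks_held(void)
{
    for (int i = 0; i < num_held_locks; i++)
        fprintf(stderr, "WARNING: lock \"%s\" still held at shutdown\n",
                held_locks[i]);
    return num_held_locks;
}
```

Warning rather than calling a release-everything routine mirrors the reasoning in the mail: releasing all locks at exit would hide whichever code path leaked the lock, while a warning leaves the leak visible until it is understood.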