Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
From | Andres Freund |
---|---|
Subject | Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1 |
Date | |
Msg-id | 20230114172022.3oy77jhzippyupgx@awork3.anarazel.de |
In reply to | Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1 (Andres Freund <andres@anarazel.de>) |
Responses | Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1 |
List | pgsql-bugs |
Hi,

On 2023-01-14 08:02:01 -0800, Andres Freund wrote:
> Because the logical rep code explicitly prevents interrupts:
>
>     /*
>      * Create a new permanent logical decoding slot. This slot will be used
>      * for the catchup phase after COPY is done, so tell it to use the
>      * snapshot to make the final data consistent.
>      *
>      * Prevent cancel/die interrupts while creating slot here because it is
>      * possible that before the server finishes this command, a concurrent
>      * drop subscription happens which would complete without removing this
>      * slot leading to a dangling slot on the server.
>      */
>     HOLD_INTERRUPTS();
>     walrcv_create_slot(LogRepWorkerWalRcvConn,
>                        slotname, false /* permanent */ , false /* two_phase */ ,
>                        CRS_USE_SNAPSHOT, origin_startpos);
>     RESUME_INTERRUPTS();
>
> Which is just completely entirely wrong. Independent of this issue even. Not
> allowing termination for the duration of command executed over network?
>
> This is from:
>
> commit 6b67d72b604cb913e39324b81b61ab194d94cba0
> Author: Amit Kapila <akapila@postgresql.org>
> Date:   2021-03-17 08:15:12 +0530
>
>     Fix race condition in drop subscription's handling of tablesync slots.
>
>     Commit ce0fdbfe97 made tablesync slots permanent and allow Drop
>     Subscription to drop such slots. However, it is possible that before
>     tablesync worker could get the acknowledgment of slot creation, drop
>     subscription stops it and that can lead to a dangling slot on the
>     publisher. Prevent cancel/die interrupts while creating a slot in the
>     tablesync worker.
>
>     Reported-by: Thomas Munro as per buildfarm
>     Author: Amit Kapila
>     Reviewed-by: Vignesh C, Takamichi Osumi
>     Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com
>
>
> But this can't be the right fix.

I wonder if we ought to put an Assert(InterruptHoldoffCount == 0 &&
CritSectionCount == 0) in some of the routines in libpqwalreceiver to protect
against issues like this. It'd be easy enough to introduce one accidentally,
due to holding an lwlock.

Greetings,

Andres Freund
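The mechanism behind the complaint can be modeled in a few lines of standalone C. This is a simplified sketch, not server code: the counters mirror PostgreSQL's InterruptHoldoffCount and CritSectionCount, and the macros mimic HOLD_INTERRUPTS(), RESUME_INTERRUPTS() and CHECK_FOR_INTERRUPTS(), but ProcessInterruptsModel, network_wait_interruptible, model_walrcv_exec and interrupts_serviced are hypothetical names invented for illustration. As in the server, a pending cancel/die request is not acted on while either counter is nonzero, so a termination request that arrives during a held-off network call simply waits. LWLockAcquire() also calls HOLD_INTERRUPTS(), which is why holding an lwlock is enough to end up in that state by accident. model_walrcv_exec carries the assertion proposed in the mail.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Standalone model of the server's interrupt holdoff state. In PostgreSQL
 * these are globals declared in miscadmin.h; here they are file-scope
 * variables so the sketch compiles on its own.
 */
static volatile unsigned int InterruptHoldoffCount = 0;
static volatile unsigned int CritSectionCount = 0;
static volatile bool InterruptPending = false;

/* Count of requests actually acted on (hypothetical, for observation only). */
static int interrupts_serviced = 0;

/* Simplified counterparts of the server macros. */
#define HOLD_INTERRUPTS() (InterruptHoldoffCount++)
#define RESUME_INTERRUPTS() \
    do { assert(InterruptHoldoffCount > 0); InterruptHoldoffCount--; } while (0)

/*
 * Like ProcessInterrupts(): returns without doing anything while interrupts
 * are held off or a critical section is active, leaving the request pending.
 */
static void ProcessInterruptsModel(void)
{
    if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
        return;
    if (InterruptPending)
    {
        InterruptPending = false;
        /* The real server would raise an ERROR or FATAL here. */
        interrupts_serviced++;
    }
}

#define CHECK_FOR_INTERRUPTS() \
    do { if (InterruptPending) ProcessInterruptsModel(); } while (0)

/* True when a blocking network wait could still be interrupted. */
static bool network_wait_interruptible(void)
{
    return InterruptHoldoffCount == 0 && CritSectionCount == 0;
}

/*
 * Hypothetical stand-in for a libpqwalreceiver routine, guarded by the
 * proposed assertion. Each loop iteration stands in for one wakeup of the
 * wait loop a real network call would spin in while awaiting a reply.
 */
static int model_walrcv_exec(int polls)
{
    assert(network_wait_interruptible());
    for (int i = 0; i < polls; i++)
        CHECK_FOR_INTERRUPTS();
    return interrupts_serviced;
}
```

For example, setting InterruptPending and then calling CHECK_FOR_INTERRUPTS() between HOLD_INTERRUPTS() and RESUME_INTERRUPTS() leaves interrupts_serviced unchanged; the request is only serviced at the first check after RESUME_INTERRUPTS(). Calling model_walrcv_exec() while interrupts are held aborts on the assertion instead, which is the point of the proposed check: in assert-enabled builds it turns a silent hang into an immediate, visible failure.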