RE: DROP DATABASE deadlocks with logical replication worker in PG 15.1
From | houzj.fnst@fujitsu.com |
---|---|
Subject | RE: DROP DATABASE deadlocks with logical replication worker in PG 15.1 |
Date | |
Msg-id | OS0PR01MB571678C898EA980B444BC7CE94C49@OS0PR01MB5716.jpnprd01.prod.outlook.com |
In response to | Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1 (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
RE: DROP DATABASE deadlocks with logical replication worker in PG 15.1
Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1 |
List | pgsql-bugs |
On Wednesday, January 18, 2023 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 18, 2023 at 1:34 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2023-01-17 06:23:45 +0530, Amit Kapila wrote:
> >
> > > There is an analysis of the test
> > > failure in the email [2] which explains the race condition that
> > > leads to test failure. Thinking again about the failure, I feel we
> > > can instead change the failed test (t/004_sync.pl) to either ensure
> > > that both the walsenders (corresponding to sync worker and apply
> > > worker) exit after dropping the subscription and before checking
> > > the remaining slots on publisher or wait for slots to become zero in
> > > the test.
> >
> > How about waiting for the table to start to be synced (and thus the
> > slot to be created) before issuing the drop subscription?
> >
>
> In this test [1], the initial sync fails due to a unique constraint violation, so
> checking that the sync has started is a bit tricky. We can probably check
> sync_error_count in pg_stat_subscription_stats to ensure that sync has started to
> fail, which will ideally ensure that the sync has started. I am not sure this would be
> completely safe. The other possible ways are (a) after creating a subscription,
> wait for two slots to get created in the publisher, and then after dropping
> subscription wait for slots to become zero on the publisher; (b) after dropping
> the subscription, wait for slots to become zero.
>
> I think one of (a) or (b) will work.

I think in the mentioned test case, the tablesync worker will keep restarting,
which means the table sync slot is also being dropped and re-created ... . So,
(a) waiting for two slots to get created might not work, as the slot will get
dropped soon. I think (b) waiting for the number of slots to become zero would
be a simpler way to make the test stable.

Here are the patches that try to do that for all affected branches.

Best regards,
Hou zj
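
As a rough illustration of option (b), and not the attached patches themselves, the idea is to poll the publisher until no replication slots remain instead of checking the slot count once right after DROP SUBSCRIPTION. The sketch below is in Python rather than the Perl of the TAP test; `poll_until` and `FakePublisher` are hypothetical names invented for this example, and the fake publisher only stands in for a query such as `SELECT count(*) FROM pg_replication_slots` run on the publisher.

```python
import time
from typing import Callable


def poll_until(check: Callable[[], bool],
               timeout_s: float = 180.0,
               interval_s: float = 0.1) -> bool:
    """Call check() repeatedly until it returns True or the timeout expires.

    Returns True if the condition was met, False if we gave up.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


class FakePublisher:
    """Hypothetical stand-in for the publisher node (an assumption for this
    sketch). The slots disappear only after a few polls, mimicking walsenders
    that are still exiting after DROP SUBSCRIPTION has returned."""

    def __init__(self, slots: int, polls_until_gone: int) -> None:
        self.slots = slots
        self.polls_left = polls_until_gone

    def slot_count(self) -> int:
        # Stands in for: SELECT count(*) FROM pg_replication_slots
        if self.polls_left > 0:
            self.polls_left -= 1
        else:
            self.slots = 0
        return self.slots
```

A test written this way would call something like `poll_until(lambda: publisher.slot_count() == 0)` after dropping the subscription and fail only if the call returns False. A single immediate check would race with the tablesync slot being dropped and re-created.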
Attachments