Re: Deadlock between logrep apply worker and tablesync worker

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Deadlock between logrep apply worker and tablesync worker
Дата
Msg-id CAA4eK1KXDanJqxC_qrUqxEu1VbvySBn8qcCK+2zM0CAb+g2boQ@mail.gmail.com
обсуждение исходный текст
Ответ на RE: Deadlock between logrep apply worker and tablesync worker  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Ответы Re: Deadlock between logrep apply worker and tablesync worker  (Amit Kapila <amit.kapila16@gmail.com>)
RE: Deadlock between logrep apply worker and tablesync worker  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Список pgsql-hackers
On Thu, Feb 2, 2023 at 12:05 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Tuesday, January 31, 2023 1:07 AM vignesh C <vignesh21@gmail.com> wrote:
> > On Mon, 30 Jan 2023 at 17:30, vignesh C <vignesh21@gmail.com> wrote:
> >
>
> I also tried to test the time of "src/test/subscription/t/002_types.pl"
> before and after the patch(change the lock level) and Tom's patch(split
> transaction) like what Vignesh has shared on -hackers.
>
> I run about 100 times for each case. Tom's and the lock level patch
> behave similarly on my machines[1].
>
> HEAD: 3426 ~ 6425 ms
> HEAD + Tom: 3404 ~ 3462 ms
> HEAD + Vignesh: 3419 ~ 3474 ms
> HEAD + Tom + Vignesh: 3408 ~ 3454 ms
>
> Even apart from the testing time reduction, reducing the lock level and lock
> the specific object can also help improve the lock contention which user(that
> use the exposed function) , table sync worker and apply worker can also benefit
> from it. So, I think pushing the patch to change the lock level makes sense.
>
> And the patch looks good to me.
>

Thanks for the tests. I also see a reduction in test time variability
with Vignesh's patch. I think we can release the locks in case the
origin is concurrently dropped as in the attached patch. I am planning
to commit this patch tomorrow unless there are more comments or
objections.

> While on it, after pushing the patch, I think there is another case might also
> worth to be improved, that is the table sync and apply worker try to drop the
> same origin which might cause some delay. This is another case(different from
> the deadlock), so I feel we can try to improve this in another patch.
>

Right, I think that case could be addressed by Tom's patch to some
extent but I am thinking we should also try to analyze if we can
completely avoid the need to remove origins from both processes. One
idea could be to introduce another relstate something like
PRE_SYNCDONE and set it in a separate transaction before we set the
state as SYNCDONE and remove the slot and origin in tablesync worker.
Now, if the tablesync worker errors out due to some reason during the
second transaction, it can remove the slot and origin after restart by
checking the state. However, it would add another relstate which may
not be the best way to address this problem. Anyway, that can be
accomplished as a separate patch.

-- 
With Regards,
Amit Kapila.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jakub Wartak
Дата:
Сообщение: Re: Syncrep and improving latency due to WAL throttling
Следующее
От: shveta malik
Дата:
Сообщение: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication