Re: Race condition in FetchTableStates() breaks synchronization of subscription tables

Поиск
Список
Период
Сортировка
От vignesh C
Тема Re: Race condition in FetchTableStates() breaks synchronization of subscription tables
Дата
Msg-id CALDaNm0X8oUiW1CzniPZsDxqjP-VoYuEvb1h7NFXohKc1P5HEw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Race condition in FetchTableStates() breaks synchronization of subscription tables  (Alexander Lakhin <exclusion@gmail.com>)
Ответы Re: Race condition in FetchTableStates() breaks synchronization of subscription tables  (Alexander Lakhin <exclusion@gmail.com>)
Список pgsql-hackers
On Tue, 6 Feb 2024 at 18:30, Alexander Lakhin <exclusion@gmail.com> wrote:
>
> 05.02.2024 13:13, vignesh C wrote:
> > Thanks for the steps for the issue, I was able to reproduce this issue
> > in my environment with the steps provided. The attached patch has a
> > proposed fix where the latch will not be set in case of the apply
> > worker exiting immediately after starting.
>
> It looks like the proposed fix doesn't help when ApplyLauncherWakeup()
> called by a backend executing CREATE SUBSCRIPTION command.
> That is, with the v4-0002 patch applied and pg_usleep(300000L); added
> just below
>              if (!worker_in_use)
>                  return worker_in_use;
> I still observe the test 027_nosuperuser running for 3+ minutes:
> t/027_nosuperuser.pl .. ok
> All tests successful.
> Files=1, Tests=19, 187 wallclock secs ( 0.01 usr  0.00 sys +  4.82 cusr  4.47 csys =  9.30 CPU)
>
> IIUC, it's because a launcher wakeup call, sent by "CREATE SUBSCRIPTION
> regression_sub ...", gets missed when launcher waits for start of another
> worker (logical replication worker for subscription "admin_sub"), launched
> just before that command.

Yes, the wakeup call sent by the "CREATE SUBSCRIPTION" command was
getting missed in this case. The wakeup call can be sent during
subscription creation/modification and when the apply worker exits.
WaitForReplicationWorkerAttach should not reset the latch here as it
will end up delaying the apply worker to get started after 180 seconds
timeout(DEFAULT_NAPTIME_PER_CYCLE). The attached patch does not reset
the latch and lets ApplyLauncherMain to reset the latch and checks if
any new worker or missing worker needs to be started.

Regards,
Vignesh

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: jian he
Дата:
Сообщение: Re: recently added jsonpath method change jsonb_path_query, jsonb_path_query_first immutability
Следующее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: Improve eviction algorithm in ReorderBuffer