Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

Поиск
Список
Период
Сортировка
От Melih Mutlu
Тема Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Дата
Msg-id CAGPVpCSNExJ3tgK8QgnNUb1QVGvJNprW7LWJ-8fWfGKgtcittw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (shveta malik <shveta.malik@gmail.com>)
Ответы Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (shveta malik <shveta.malik@gmail.com>)
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (shveta malik <shveta.malik@gmail.com>)
Список pgsql-hackers
Hi Shveta,

shveta malik <shveta.malik@gmail.com>, 1 Şub 2023 Çar, 15:01 tarihinde şunu yazdı:
On Wed, Feb 1, 2023 at 5:05 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
2) I found a crash in the previous patch (v9), but have not tested it
on the latest yet. Crash happens when all the replication slots are
consumed and we are trying to create new. I tweaked the settings like
below so that it can be reproduced easily:
max_sync_workers_per_subscription=3
max_replication_slots = 2
and then ran the test case shared by you. I think there is some memory
corruption happening. (I did test in debug mode, have not tried in
release mode). I tried to put some traces to identify the root-cause.
I observed that worker_1 keeps on moving from 1 table to another table
correctly, but at some point, it gets corrupted i.e. origin-name
obtained for it is wrong and it tries to advance that and since that
origin does not exist, it  asserts and then something else crashes.
From log: (new trace lines added by me are prefixed by shveta, also
tweaked code to have my comment 1 fixed to have clarity on worker-id).

form below traces, it is clear that worker_1 was moving from one
relation to another, always getting correct origin 'pg_16688_1', but
at the end it got 'pg_16688_49' which does not exist. Second part of
trace shows who updated 'pg_16688_49', it was done by worker_49 which
even did not get chance to create this origin due to max_rep_slot
reached.

Thanks for investigating this error. I think it's the same error as the one Shi reported earlier. [1]
I couldn't reproduce it yet but will apply your tweaks and try again. 
Looking into this.


Thanks,
--
Melih Mutlu
Microsoft

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Melih Mutlu
Дата:
Сообщение: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Следующее
От: adherent postgres
Дата:
Сообщение: About PostgreSQL Core Team