Re: ReplicationSlotRelease may set the statusFlags of other processes in PG14

From: Michael Paquier
Subject: Re: ReplicationSlotRelease may set the statusFlags of other processes in PG14
Msg-id: ZfkNP1OdgBSPPTsR@paquier.xyz
In reply to: ReplicationSlotRelease may set the statusFlags of other processes in PG14 ("feichanghong" <feichanghong@qq.com>)
Responses: Re: ReplicationSlotRelease may set the statusFlags of other processes in PG14 (feichanghong <feichanghong@qq.com>)
List: pgsql-bugs
On Sat, Mar 16, 2024 at 10:29:03PM +0800, feichanghong wrote:
> A process using replication slots (usually a walsender) runs its exit
> callbacks in the order RemoveProcFromArray -> ProcKill when it exits abnormally.
> Within RemoveProcFromArray, MyProc is already removed from the ProcArray.
> ProcKill then calls ReplicationSlotRelease, which writes
> ProcGlobal->statusFlags[MyProc->pgxactoff] again. By this time, that slot
> may already belong to another process.

Oops.

> To reproduce the issue:
> 1. Apply the attached v1-0000-v14-invalidate-pgxactoff-after-remove-pgproc.patch,
> which sets pgxactoff to an invalid value in ProcArrayRemove and adds
> some checks.
> 2. Use the SQL below to terminate the walsender process.
> ```
> select pg_terminate_backend(pid) from pg_stat_activity where backend_type = 'walsender';
> ```
> # Fix
>
> I have attached some patches to fix the issue:
> 1. Backpatching 2f6501f to PG14 fixes the problem.
> 2. From PG14 through HEAD, ProcArrayRemove should reset pgxactoff, and
> assertions should be added where ProcGlobal->statusFlags is set.

Yeah, that's something that we had better fix in all stable branches.
The asserts would offer some protection going forward, but I would take
the safer route of adding a protection like what you are suggesting
only on HEAD and not in the stable branches, just in case we're
missing something around them.
--
Michael
