RE: Potential data loss due to race condition during logical replication slot creation

From Hayato Kuroda (Fujitsu)
Subject RE: Potential data loss due to race condition during logical replication slot creation
Date
Msg-id TYCPR01MB12077A67B15F682BC4DC835E4F5342@TYCPR01MB12077.jpnprd01.prod.outlook.com
In reply to Re: Potential data loss due to race condition during logical replication slot creation  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-bugs
Dear Sawada-san,

> 
> With the PoC patch, we check ondisk.builder.is_there_running_xact in
> SnapBuildRestore(),

Yes, the PoC requires reading the snapshot state from the file.

> but can we just check running->xcnt in
> SnapBuildFindSnapshot() to skip calling SnapBuildRestore()? That is,
> if builder->initial_xmin_horizon is valid (or
> builder->finding_start_point is true) and running->xcnt > 0, we skip
> the snapshot restore.

IIUC, that approach does not require any API changes, which may be an advantage.
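
To make sure I read the quoted condition correctly, here is a minimal sketch of it. The struct and function names (SketchBuilder, SketchRunningXacts, sketch_skip_restore) are hypothetical stand-ins that I made up for illustration; only initial_xmin_horizon and xcnt come from the discussion, and this is not the actual snapbuild.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical, trimmed-down stand-in for the running-xacts record. */
typedef struct SketchRunningXacts
{
    int         xcnt;           /* number of running transactions */
} SketchRunningXacts;

/* Hypothetical, trimmed-down stand-in for the snapshot builder. */
typedef struct SketchBuilder
{
    TransactionId initial_xmin_horizon; /* valid only while finding the
                                         * initial start point */
} SketchBuilder;

/*
 * Return true if the snapshot restore should be skipped: we are still
 * looking for the initial start point and some transaction is running.
 */
bool
sketch_skip_restore(const SketchBuilder *builder,
                    const SketchRunningXacts *running)
{
    assert(builder != NULL && running != NULL);

    return builder->initial_xmin_horizon != InvalidTransactionId &&
        running->xcnt > 0;
}
```

As the quoted text notes, a check like this may still skip restores in cases where skipping is unnecessary, because it does not look at which transactions are running.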

> However, I think there are still cases where we
> unnecessarily skip snapshot restores
>
> Probably, what we would like to avoid is, we compute
> initial_xmin_horizon and start to find the initial start point while
> there is a concurrently running transaction, and then jump to the
> consistent state  by restoring the consistent snapshot before the
> concurrent transaction commits.

Yeah, information from before the concurrent transactions commit should not be
used. I think that's why SnapBuildWaitSnapshot() waits until the listed
transactions have finished.

> So we can ignore snapshot restores if
> (oldest XID among transactions running at the time of
> CreateInitDecodingContext()) >= (OldestRunningXID in
> xl_running_xacts).
> 
> I've drafted this idea in the attached patch just for discussion.

Thanks for sharing the patch. I confirmed that at least all the tests and the
workload you pointed out in [1] pass. I will post here if I find other issues.
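
For reference, the XID comparison quoted above can be sketched as follows. This is only my reading of the idea, not the patch itself: the function and parameter names are hypothetical, and I assume both XIDs are normal XIDs that must be compared modulo 2^32, in the same way PostgreSQL compares normal XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Wraparound-aware "a >= b" for normal XIDs: a follows or equals b when
 * the signed 32-bit difference is non-negative.
 */
bool
sketch_xid_follows_or_equals(TransactionId a, TransactionId b)
{
    assert(a >= FirstNormalTransactionId && b >= FirstNormalTransactionId);

    return (int32_t) (a - b) >= 0;
}

/*
 * Hypothetical helper: ignore the snapshot restore if the oldest XID that
 * was running when the decoding context was created is >= the oldest
 * running XID reported by the running-xacts record.
 */
bool
sketch_ignore_restore(TransactionId oldest_at_context_creation,
                      TransactionId oldest_running_in_record)
{
    return sketch_xid_follows_or_equals(oldest_at_context_creation,
                                        oldest_running_in_record);
}
```

Once the record reports an oldest running XID that is newer than the one remembered at context creation, the helper returns false and the restore is no longer ignored.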

[1]: https://www.postgresql.org/message-id/CAD21AoDzLY9vRpo%2Bxb2qPtfn46ikiULPXDpT94sPyFH4GE8bYg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 

