RE: Potential data loss due to race condition during logical replication slot creation

From: Hayato Kuroda (Fujitsu)
Subject: RE: Potential data loss due to race condition during logical replication slot creation
Msg-id: TYCPR01MB1207719C811F580A8774C79B7F52A2@TYCPR01MB12077.jpnprd01.prod.outlook.com
In reply to: Re: Potential data loss due to race condition during logical replication slot creation  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: RE: Potential data loss due to race condition during logical replication slot creation  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List: pgsql-bugs

Dear hackers,

While analyzing another failure [1], I found this thread. I think both failures
have the same cause.

The reported failure occurs when a replication slot is created in the middle
of a transaction and it reuses the snapshot from another slot. The reproducer is:

```
-- Session0

SELECT pg_create_logical_replication_slot('slot0', 'test_decoding');
BEGIN;
INSERT INTO foo ...

-- Session1

SELECT pg_create_logical_replication_slot('slot1', 'test_decoding');

-- Session2

CHECKPOINT;
SELECT pg_logical_slot_get_changes('slot0', NULL, NULL);

-- Session0

INSERT INTO var ... -- var is defined with (user_catalog_table = true)
COMMIT;

-- Session1

SELECT pg_logical_slot_get_changes('slot1', NULL, NULL);
-- Result: assertion failure.
```
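
The reproducer assumes that the tables foo and var already exist. A minimal setup sketch could look like the following; the column definitions are assumptions made only for illustration, since the report states nothing about the tables except that var is defined with (user_catalog_table = true). Logical slots also require wal_level = logical, and the test_decoding plugin must be installed.

```sql
-- Hypothetical setup: column names and types are assumptions,
-- not taken from the original report.
CREATE TABLE foo (id int);

-- The report only specifies this storage parameter for var.
CREATE TABLE var (id int) WITH (user_catalog_table = true);
```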

> Here is the summary of several proposals we've discussed:
> a) Have CreateInitDecodingContext() always pass need_full_snapshot =
> true to AllocateSnapshotBuilder().

> b) Have snapbuild.c being able to handle multiple SnapBuildOnDisk versions.

> c) Add a global variable, say in_create, to snapbuild.c

Regarding the three options raised by Sawada-san, I prefer approach a).
Since the issue can happen on all supported branches, we should choose the
conservative approach. Also, it would be quite painful to have several pieces
of code handling the same issue.

The attached patch implements approach a), since no one has written it yet. I also
added a test that triggers the assertion failure, but I am not sure whether it
should be included.

[1]:
https://www.postgresql.org/message-id/TYCPR01MB1207717063D701F597EF98A0CF5272%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 

