RE: Potential data loss due to race condition during logical replication slot creation
От | Hayato Kuroda (Fujitsu) |
---|---|
Тема | RE: Potential data loss due to race condition during logical replication slot creation |
Дата | |
Msg-id | TYCPR01MB12077ADB13450B422B6161A01F5342@TYCPR01MB12077.jpnprd01.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: Potential data loss due to race condition during logical replication slot creation (Masahiko Sawada <sawada.mshk@gmail.com>) |
Ответы |
Re: Potential data loss due to race condition during logical replication slot creation
|
Список | pgsql-bugs |
Dear Sawada-san, Thanks for giving comments and sorry for late reply. > > If so, one idea to > > achieve could be that we maintain the highest_running_xid while > > serailizing the snapshot and then during restore if that > > highest_running_xid is <= builder->initial_xmin_horizon, then we > > ignore restoring the snapshot. We already have few such cases handled > > in SnapBuildRestore(). > > I think that builder->initial_xmin_horizon could be older than > highest_running_xid, for example, when there is a logical replication > slot whose catalog_xmin is old. However, even in this case, we might > need to ignore restoring the snapshot. For example, a slightly > modified test case still can cause the same problem. > > The test case in the Kuroda-san's v2 patch: > permutation "s0_init" "s0_begin" "s0_insert1" "s1_init" > "s2_checkpoint" "s2_get_changes_slot0" "s0_insert2" "s0_commit" > "s1_get_changes_slot0"\ "s1_get_changes_slot1" > > Modified-version test case (add "s0_insert1" between "s0_init" and "s0_begin"): > permutation "s0_init" "s0_insert1" "s0_begin" "s0_insert1" "s1_init" > "s2_checkpoint" "s2_get_changes_slot0" "s0_insert2" "s0_commit" > "s1_get_changes_slot0\ " "s1_get_changes_slot1" Good catch, I confirmed that the variant still partially output transactions: ``` step s1_get_changes_slot1: SELECT data FROM pg_logical_slot_get_changes('slot1', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids','0'); data ----------------------------------------- BEGIN table public.tbl: INSERT: val1[integer]:2 COMMIT (3 rows) ``` ///// I analyzed a bit. In below descriptions, I uses these notations: txn0 - has a single insert txn1 - has begin-insert-commit operations. slot1 is created in parallel While creating slot0, initial_xmin_horizon was also set, and it come from GetOldestSafeDecodingTransactionId(). Since there were no slots in the database before, the value was just a next xid. Then, the pinned xid was assigned to the txn0. It is correct behavior because slot0 must decode txn0. Finally, while creating slot1 after the txn1, the snapshot serialized by slot0 was restored once. txn1 was the recorded as the highest_running, but initial_xmin was older than it. So the snapshot was restored and txn1 was partially decoded. initial_xmin_horizon (txn0) < highest_running_xid (txn1) ///// So, how should we fix? One idea is based on (c), and adds a flag which indicates an existence of concurrent transactions. The restoration is skipped if both two flags are true. PSA the PoC patch. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Вложения
В списке pgsql-bugs по дате отправления: