Re: The same 2PC data maybe recovered twice
From | 蔡梦娟(玊于) |
---|---|
Subject | Re: The same 2PC data maybe recovered twice |
Date | |
Msg-id | 0706bec1-80ae-4f99-8cf8-c89734978770.mengjuan.cmj@alibaba-inc.com |
In response to | The same 2PC data maybe recovered twice ("蔡梦娟(玊于)" <mengjuan.cmj@alibaba-inc.com>) |
Responses |
Re: The same 2PC data maybe recovered twice
Re: The same 2PC data maybe recovered twice |
List | pgsql-hackers |
Hi, all
I have attached a patch for PG11 that fixes this bug; I hope you can review it.
Thanks & Best Regards
------------------------------------------------------------------
From: 蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com>
Sent: Thursday, July 6, 2023, 10:02
To: pgsql-hackers <pgsql-hackers@postgresql.org>
Cc: pgsql-bugs <pgsql-bugs@postgresql.org>
Subject: The same 2PC data maybe recovered twice

Hi, all.

I want to report a bug in the recovery of 2PC data. In the current implementation of crash recovery, 2PC data is recovered in two ways:

1. Before redo, restoreTwoPhaseData() restores the 2PC data whose xid < ShmemVariableCache->nextXid, which is initialized from checkPoint.nextXid.
2. During redo, xact_redo() adds 2PC data from the WAL.

The following scenario may cause the same 2PC to be added twice:

1. checkpoint_1 starts, and checkpoint_1.redo is set to curInsert.
2. Before checkPoint_1.nextXid is set, a new 2PC is prepared. Suppose its xid is 100; ShmemVariableCache->nextXid is then advanced to 101.
3. checkPoint_1.nextXid is set to 101.
4. In CheckPointTwoPhase() of checkpoint_1, 2pc_100 is not copied to disk because its prepare_end_lsn > checkpoint_1.redo.
5. checkPoint_1 finishes. After checkpoint_timeout, checkpoint_2 starts.
6. During checkpoint_2, the data of 2pc_100 is copied to disk.
7. A crash happens before UpdateControlFile() of checkpoint_2.
8. During crash recovery, redo starts from checkpoint_1, and 2pc_100 is first restored by restoreTwoPhaseData() because xid_100 < checkPoint_1.nextXid, which is 101.
9. Because the prepare_start_lsn of 2pc_100 > checkpoint_1.redo, 2pc_100 is added again by xact_redo() during WAL replay, so the same 2PC data is added twice.
10. In RecoverPreparedTransactions() -> lock_twophase_recover(), locking the same 2PC twice causes a PANIC.

Is the above scenario reasonable, and do you have any good ideas for fixing this bug?

Thanks & Best Regards
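
The scenario in the quoted message can be replayed with a small toy model. The sketch below is not PostgreSQL code: the classes `PrepareRecord` and `Checkpoint`, the functions `checkpoint_two_phase`, `crash_recover` and `simulate`, and all LSN values are hypothetical stand-ins chosen only to mirror the ten steps above. It assumes that a checkpoint copies a prepared transaction to disk only when its prepare end LSN is at or before the checkpoint's redo pointer, and that recovery first loads on-disk entries with xid < nextXid and then replays every prepare record at or after the redo pointer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrepareRecord:
    """Toy stand-in for a 2PC prepare record in the WAL."""
    xid: int
    start_lsn: int
    end_lsn: int


@dataclass(frozen=True)
class Checkpoint:
    """Toy stand-in for the checkpoint fields used in the scenario."""
    redo: int
    next_xid: int


def checkpoint_two_phase(wal, redo, disk):
    # Assumed rule: only 2PC data that ends at or before the redo
    # pointer is copied to disk by this checkpoint.
    for rec in wal:
        if rec.end_lsn <= redo:
            disk[rec.xid] = rec


def crash_recover(ckpt, disk, wal):
    recovered = []
    # Before redo: load on-disk 2PC data with xid < ckpt.next_xid.
    for xid in sorted(disk):
        if xid < ckpt.next_xid:
            recovered.append((xid, "disk"))
    # During redo: replay prepare records starting at the redo pointer.
    for rec in wal:
        if rec.start_lsn >= ckpt.redo:
            recovered.append((rec.xid, "wal"))
    return recovered


def simulate():
    wal, disk = [], {}
    next_xid = 100

    redo_1 = 1000                                     # step 1
    wal.append(PrepareRecord(next_xid, 1010, 1020))   # step 2: 2pc_100
    next_xid += 1                                     # nextXid becomes 101
    ckpt_1 = Checkpoint(redo=redo_1, next_xid=next_xid)  # step 3
    checkpoint_two_phase(wal, ckpt_1.redo, disk)      # step 4: nothing copied
    assert 100 not in disk

    checkpoint_two_phase(wal, 2000, disk)             # steps 5-6: checkpoint_2
    # Step 7: crash before the control file is updated, so recovery
    # still starts from checkpoint_1.
    return crash_recover(ckpt_1, disk, wal)           # steps 8-9


if __name__ == "__main__":
    print(simulate())  # [(100, 'disk'), (100, 'wal')]
```

In this model xid 100 appears once from disk and once from WAL replay, which corresponds to the duplicate entry that step 10 says leads to the PANIC.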
Attachments