Re: [HACKERS] Potential data loss of 2PC files
From | Michael Paquier |
---|---|
Subject | Re: [HACKERS] Potential data loss of 2PC files |
Date | |
Msg-id | CAB7nPqSV2w9DJ+PHH+vAvkF_mJ2nPbd=_fc5pQp3wOm4owgBNA@mail.gmail.com |
In response to | Re: [HACKERS] Potential data loss of 2PC files (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses | Re: [HACKERS] Potential data loss of 2PC files |
List | pgsql-hackers |
On Fri, Dec 30, 2016 at 5:20 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> As per the prologue of the function, it doesn't expect any 2PC files
> to be written out in the function i.e. between two checkpoints. Most
> of those are created and deleted between two checkpoints. Same would
> be true for recovery as well. Thus in most of the cases we shouldn't
> need to flush the two phase directory in this function whether during
> normal operation or during the recovery. So, we should avoid flushing
> repeatedly when it's not needed. I agree that serialized_xacts > 0 is
> not the right condition during recovery on standby to flush the two
> phase directory.

This assumes that 2PC transactions are not long-lived, which is likely true for anything doing sharding, like XC, XL or Citus (?). So yes, it is reasonable to expect that.

> During crash recovery, 2PC files are present on the disk, which means
> the two phase directory has correct record of it. This record can not
> change. So, we shouldn't need to flush it again. If that's true
> serialized_xacts will be 0 during recovery thus serialized_xacts > 0
> condition will still hold.
>
> On a standby however we will have to flush the two phase directory as
> part of checkpoint if there were any files left behind in that
> directory. We need a different condition there.

Well, flushing the meta-data of pg_twophase is really going to be far cheaper than the many pages flushed before CheckPointTwoPhase is reached. There should really be a check on serialized_xacts for the non-recovery code path, but considering how cheap the directory flush is compared to the rest of the restart point work, it is not worth the complexity of adding a counter, for example in shared memory with XLogCtl (the counter would get reinitialized at each checkpoint, incremented when replaying a 2PC prepare, and decremented with a 2PC commit).
So to reduce the backpatch impact I would just do the fsync if (serialized_xacts > 0 || RecoveryInProgress()) and call it a day.
-- 
Michael
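
The proposed condition can be illustrated with a small standalone sketch. It is not the actual patch: `need_twophase_dir_fsync`, `fsync_dir` and `checkpoint_twophase_sketch` are hypothetical names, the recovery state is passed in as a plain boolean instead of calling RecoveryInProgress(), and the directory flush is done with plain POSIX open()/fsync() rather than PostgreSQL's own helpers.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Flush the directory if any 2PC state files were written during this
 * checkpoint, or unconditionally while in recovery.
 */
static bool
need_twophase_dir_fsync(int serialized_xacts, bool in_recovery)
{
    return serialized_xacts > 0 || in_recovery;
}

/* fsync a directory so its entries are durable; 0 on success, -1 on error. */
static int
fsync_dir(const char *path)
{
    int         fd;
    int         rc;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Returns 0 if the flush was skipped, 1 if the directory was flushed,
 * and -1 if the flush was needed but failed.
 */
static int
checkpoint_twophase_sketch(const char *twophase_dir,
                           int serialized_xacts,
                           bool in_recovery)
{
    if (!need_twophase_dir_fsync(serialized_xacts, in_recovery))
        return 0;

    if (fsync_dir(twophase_dir) != 0)
        return -1;

    return 1;
}
```

With this shape, normal operation still skips the flush when nothing was serialized, while every restart point on a standby pays for one directory fsync.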