Re: [BUGS] BUG #13473: VACUUM FREEZE mistakenly cancel standby sessions
From | Marco Nenciarini |
---|---|
Subject | Re: [BUGS] BUG #13473: VACUUM FREEZE mistakenly cancel standby sessions |
Date | |
Msg-id | 558D58B1.70400@2ndquadrant.it |
Responses | Re: [BUGS] BUG #13473: VACUUM FREEZE mistakenly cancel standby sessions |
List | pgsql-hackers |
On 26/06/15 15:43, marco.nenciarini@2ndquadrant.it wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13473
> Logged by:          Marco Nenciarini
> Email address:      marco.nenciarini@2ndquadrant.it
> PostgreSQL version: 9.4.4
> Operating system:   all
> Description:
>
> = Symptoms
>
> Consider a simple master -> standby setup with hot_standby_feedback
> enabled. If a backend on the standby is holding the cluster xmin and
> the master runs a VACUUM FREEZE on the same database as the standby's
> backend, it generates a conflict and the query running on the standby
> is canceled.
>
> = How to reproduce it
>
> Run the following operations on an idle cluster.
>
> 1) Connect to the standby and simulate a long-running query:
>
>    select pg_sleep(3600);
>
> 2) Connect to the master and run the following script:
>
>    create table t(id int primary key);
>    insert into t select generate_series(1, 10000);
>    vacuum freeze verbose t;
>    drop table t;
>
> 3) After 30 seconds the pg_sleep query on the standby will be canceled.
>
> = Expected output
>
> The hot standby feedback should have prevented the query cancellation.
>
> = Analysis
>
> I've run postgres at DEBUG2 logging level, and I can confirm that the
> vacuum correctly sees the OldestXmin propagated by the standby through
> the hot standby feedback.
> The issue is in the heap_xlog_freeze function, which calls
> ResolveRecoveryConflictWithSnapshot as its first action, passing the
> cutoff_xid value as the first argument.
> The cutoff_xid is the OldestXmin that was active when the vacuum ran,
> so it represents a running xid.
> However, ResolveRecoveryConflictWithSnapshot expects its first argument
> to be latestRemovedXid, which represents the highest xid that has
> actually been removed, so there is an off-by-one error.
>
> I've been able to reproduce this issue on every version of postgres
> since 9.0 (9.0, 9.1, 9.2, 9.3, 9.4 and current master).
>
> = Proposed solution
>
> In heap_xlog_freeze we need to subtract one from the value of
> cutoff_xid before passing it to ResolveRecoveryConflictWithSnapshot.
>

Attached is a proposed patch that solves the issue.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
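
To make the off-by-one in the analysis concrete, here is a small standalone model in C. It is only a sketch, not PostgreSQL source and not the attached patch: the names `xid_t`, `xid_follows`, `xid_retreat` and `snapshot_conflicts` are hypothetical, and it assumes the conflict rule is "a standby snapshot conflicts when its xmin is not newer than the xid reported as removed", with xids compared modulo 2^32 and values below 3 treated as reserved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction id. */
typedef uint32_t xid_t;

/* Assumption: xids below this value are reserved, not normal xids. */
#define FIRST_NORMAL_XID ((xid_t) 3)

/* Wraparound-aware test: is xid a newer than xid b? */
static bool xid_follows(xid_t a, xid_t b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a > b;               /* reserved xids compare plainly */
    return (int32_t) (a - b) > 0;   /* modulo-2^32 comparison */
}

/* Step back by one xid, skipping the reserved values. */
static xid_t xid_retreat(xid_t xid)
{
    do {
        xid--;
    } while (xid < FIRST_NORMAL_XID);
    return xid;
}

/*
 * Assumed conflict rule: the snapshot conflicts unless its xmin is
 * strictly newer than the latest removed xid.
 */
static bool snapshot_conflicts(xid_t snapshot_xmin, xid_t latest_removed_xid)
{
    return !xid_follows(snapshot_xmin, latest_removed_xid);
}
```

Under these assumptions, a standby backend whose xmin equals cutoff_xid is reported as conflicting when cutoff_xid is passed unchanged (`snapshot_conflicts(1000, 1000)` is true), but not when the value is first moved back by one (`snapshot_conflicts(1000, xid_retreat(1000))` is false). Stepping back with a helper instead of a plain `- 1` keeps the result a normal xid near wraparound.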