Re: pgsql: Fast promote mode skips checkpoint at end of recovery.
From | Fujii Masao |
---|---|
Subject | Re: pgsql: Fast promote mode skips checkpoint at end of recovery. |
Date | |
Msg-id | CAHGQGwHcgrkO54M2VvzZFTmkQpJMb=aqB_1UhVnGVq_Uzn_Rkg@mail.gmail.com |
In reply to | Re: pgsql: Fast promote mode skips checkpoint at end of recovery. (Fujii Masao <masao.fujii@gmail.com>) |
Replies | Re: pgsql: Fast promote mode skips checkpoint at end of recovery. |
List | pgsql-committers |
On Wed, Jan 30, 2013 at 1:27 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Jan 29, 2013 at 9:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Fast promote mode skips checkpoint at end of recovery.
>> pg_ctl promote -m fast will skip the checkpoint at end of recovery so that we
>> can achieve very fast failover when the apply delay is low. Write new WAL record
>> XLOG_END_OF_RECOVERY to allow us to switch timeline correctly for downstream log
>> readers. If we skip synchronous end of recovery checkpoint we request a normal
>> spread checkpoint so that the window of re-recovery is low.
>
> When I tested this feature, I encountered the following FATAL message.
>
> FATAL: highest timeline 1 of the primary is behind recovery timeline 2
>
> Is this intentional behavior or a bug? What I did in my test is:
>
> 1. Set up one master (A), one standby (B), and one cascading standby (C).
> 2. After running pgbench -i -s 10, I promoted the standby (B) with fast mode.
> 3. I then shut down the server (B) with immediate mode after it had been
>    promoted to master but before the end-of-recovery checkpoint had completed.
> 4. Restart the server (B).
> 5. After the standby (C) established the replication connection with (B),
>    I got the above FATAL message repeatedly.
>
> Promoting (B) increments the timeline ID to 2 and generates the timeline
> history file. But after restarting (B), its timeline ID is unexpectedly
> reset to 1. This seems to be the cause of the problem.
>
> To address this problem, should we switch to the new timeline ID whenever
> we read XLOG_END_OF_RECOVERY, even during crash recovery?

On second thought, we don't need such a complicated test case. A simpler
procedure reproduces a problem that stems from the same cause as the one
reported above:

1. Set up one master (A) and one standby (B).
2. Promote (B) with fast mode after running pgbench -i -s 10.
3. Execute a write transaction on the new master (B).
4. Shut down (B) with immediate mode before the end-of-recovery checkpoint
   has completed.
5. Restart (B).

You can then confirm that the write transaction executed in step 3 has been
lost.

Regards,

--
Fujii Masao
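The suspected cause (crash recovery after a fast promotion falling back to timeline 1) can be illustrated with a toy model. This is a hypothetical sketch, not PostgreSQL's recovery code: the `Record` type, the `crash_recover()` function, and the rule that replay stops at the first record written on a timeline it is not following are all simplifying assumptions. It compares replay that ignores XLOG_END_OF_RECOVERY during crash recovery with replay that switches timeline when it reads that record, as suggested in the quoted message.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical WAL record in a toy model (not PostgreSQL's on-disk format)."""
    kind: str                        # "CHECKPOINT", "END_OF_RECOVERY" or "WRITE"
    tli: int                         # timeline the record was written on
    prev_tli: Optional[int] = None   # END_OF_RECOVERY only: timeline being left


def crash_recover(wal: List[Record], start_tli: int,
                  honor_end_of_recovery: bool) -> Tuple[int, List[str]]:
    """Replay `wal` starting on `start_tli` (the last checkpoint's timeline).

    Toy rule (assumption): replay stops at the first record written on a
    timeline other than the one currently being followed.
    """
    tli = start_tli
    replayed: List[str] = []
    for rec in wal:
        if (honor_end_of_recovery and rec.kind == "END_OF_RECOVERY"
                and rec.prev_tli == tli):
            tli = rec.tli  # suggested behavior: switch timeline even in crash recovery
        if rec.tli != tli:
            break          # record is on a timeline we are not following
        replayed.append(rec.kind)
    return tli, replayed


# Fast promotion skipped the end-of-recovery checkpoint, so the last
# checkpoint is still on timeline 1; the promotion record and the write
# from step 3 are on timeline 2.
WAL = [
    Record("CHECKPOINT", tli=1),
    Record("END_OF_RECOVERY", tli=2, prev_tli=1),
    Record("WRITE", tli=2),
]

if __name__ == "__main__":
    print(crash_recover(WAL, 1, honor_end_of_recovery=False))
    # (1, ['CHECKPOINT'])
    print(crash_recover(WAL, 1, honor_end_of_recovery=True))
    # (2, ['CHECKPOINT', 'END_OF_RECOVERY', 'WRITE'])
```

When the switch is ignored, the toy replay ends on timeline 1 and never reaches the WRITE record, which mirrors the timeline reset and the lost transaction described in the message; honoring the record keeps both the new timeline and the write.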