Re: Race condition in recovery?
From | Dilip Kumar |
---|---|
Subject | Re: Race condition in recovery? |
Date | |
Msg-id | CAFiTN-tJ8gKs0+f7wsybdd3dUX73ZxiSEKN9vjso2=GnhgTJjw@mail.gmail.com |
In response to | Re: Race condition in recovery? (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
List | pgsql-hackers |
On Tue, May 18, 2021 at 12:22 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

> And finally I think I could reach the situation the commit wanted to fix.
>
> I took a basebackup from a standby just before replaying the first
> checkpoint of the new timeline (by using debugger), without copying
> pg_wal. In this backup, the control file contains checkPointCopy of
> the previous timeline.
>
> I modified StartXLOG so that expectedTLEs is set just after first
> determining recoveryTargetTLI, then started the grandchild node. I
> have the following error and the server fails to continue replication.
>
> [postmaster] LOG: starting PostgreSQL 14beta1 on x86_64-pc-linux-gnu...
> [startup] LOG: database system was interrupted while in recovery at log...
> [startup] LOG: set expectedtles tli=6, length=1
> [startup] LOG: Probing history file for TLI=7
> [startup] LOG: entering standby mode
> [startup] LOG: scanning segment 3 TLI 6, source 0
> [startup] LOG: Trying fetching history file for TLI=6
> [walreceiver] LOG: fetching timeline history file for timeline 5 from pri...
> [walreceiver] LOG: fetching timeline history file for timeline 6 from pri...
> [walreceiver] LOG: started streaming ... primary at 0/3000000 on timeline 5
> [walreceiver] DETAIL: End of WAL reached on timeline 5 at 0/30006E0.
> [startup] LOG: unexpected timeline ID 1 in log segment 000000050000000000000003, offset 0
> [startup] LOG: Probing history file for TLI=7
> [startup] LOG: scanning segment 3 TLI 6, source 0
> (repeats forever)

So IIUC, this log shows that "ControlFile->checkPointCopy.ThisTimeLineID"
is 6 but the "ControlFile->checkPoint" record is on TL 5? I think if you
had the old version of the code (before the commit), or the code below [1]
right after initializing expectedTLEs, then you would have hit the FATAL
that the patch fixed.

While debugging, did you check what the "ControlFile->checkPoint" LSN was
vs the first LSN of the first segment with TL6?
expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
[1]
if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) !=
    ControlFile->checkPointCopy.ThisTimeLineID)
{
    report(FATAL..
}

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
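
To make the check in [1] concrete, here is a minimal standalone sketch of how an LSN is mapped to a timeline through a history list. It is a simplified model and not the PostgreSQL source: `HistoryEntry`, `tli_of_point`, `full_history`, `short_history`, and the switch point `0x3000500` are all hypothetical names and values chosen for illustration, and the sketch returns 0 when no entry matches, which is an assumption of this model only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* simplified stand-in for the server's LSN type */
typedef uint32_t TimeLineID;

#define INVALID_LSN ((XLogRecPtr) 0)

/* One timeline's validity range [begin, end); INVALID_LSN means unbounded. */
typedef struct
{
    TimeLineID tli;
    XLogRecPtr begin;
    XLogRecPtr end;
} HistoryEntry;

/* Return the timeline that owns ptr, or 0 if no entry covers it (model only). */
static TimeLineID
tli_of_point(XLogRecPtr ptr, const HistoryEntry *history, size_t n)
{
    assert(history != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
    {
        int after_begin = history[i].begin == INVALID_LSN || history[i].begin <= ptr;
        int before_end = history[i].end == INVALID_LSN || ptr < history[i].end;

        if (after_begin && before_end)
            return history[i].tli;
    }
    return 0;
}

/* Hypothetical switch point from TLI 5 to TLI 6, for illustration only. */
#define SWITCH_5_TO_6 ((XLogRecPtr) 0x3000500)

/* Model of a history list built with the history file available (newest first). */
static const HistoryEntry full_history[] = {
    {6, SWITCH_5_TO_6, INVALID_LSN},
    {5, INVALID_LSN, SWITCH_5_TO_6},
};

/* Model of a one-entry list, as in "set expectedtles tli=6, length=1". */
static const HistoryEntry short_history[] = {
    {6, INVALID_LSN, INVALID_LSN},
};
```

In this model, `tli_of_point(0x3000100, full_history, 2)` yields 5, while the same LSN looked up in `short_history` yields 6, because a one-entry list attributes every LSN to the target timeline. The timeline that a check like [1] compares against the control file therefore depends on whether the history file was available when expectedTLEs was built.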