Re: Race condition in recovery?
From | Dilip Kumar |
---|---|
Subject | Re: Race condition in recovery? |
Date | |
Msg-id | CAFiTN-sc+81KjM+ecpnd4jvPv0WQNdNpVZ+uyk2PEYJZpSLthQ@mail.gmail.com |
In response to | Re: Race condition in recovery? (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
List | pgsql-hackers |
On Fri, May 21, 2021 at 7:51 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> > https://www.postgresql.org/message-id/50E43C57.5050101%40vmware.com
>
> > That leaves one case not covered: If you take a backup with plain
> > "pg_basebackup" from a standby, without -X, and the first WAL segment
> > contains a timeline switch (ie. you take the backup right after a
> > failover), and you try to recover from it without a WAL archive, it
> > doesn't work. This is the original issue that started this thread,
> > except that I used "-x" in my original test case. The problem here is
> > that even though streaming replication will fetch the timeline history
> > file when it connects, at the very beginning of recovery, we expect that
> > we already have the timeline history file corresponding the initial
> > timeline available, either in pg_xlog or the WAL archive. By the time
> > streaming replication has connected and fetched the history file, we've
> > already initialized expectedTLEs to contain just the latest TLI. To fix
> > that, we should delay calling readTimeLineHistoryFile() until streaming
> > replication has connected and fetched the file.
>
> > If the first segment read by recovery contains a timeline switch, the first
> > pages have older timeline than segment timeline. However, if
> > expectedTLEs contained only the segment timeline, we cannot know if
> > we can use the record. In that case the following error is issued.
>
> If expectedTLEs is initialized with the pseudo list,
> tliOfPointInHistory always return the recoveryTargetTLI regardless of
> the given LSN so the checking about timelines later doesn't work. And
> later ReadRecord is surprised to see a page of an unknown timeline.

From this whole discussion (on the thread you linked), IIUC the issue
arises when the checkpoint LSN does not exist on the timeline
"ControlFile->checkPointCopy.ThisTimeLineID".
If that is true, then I agree that we will just initialize expectedTLEs
with a single entry (ControlFile->checkPointCopy.ThisTimeLineID), and
later we will fail to find the checkpoint record on this timeline because
the checkpoint LSN is smaller than the start LSN of this timeline. Right?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
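
To make the failure mode discussed above concrete, here is a minimal, self-contained sketch of how a timeline lookup behaves with a full history versus a one-entry "pseudo" list. This is a simplified model written for illustration, not the PostgreSQL source: the type `HistoryEntry`, the functions `tli_of_point`, `tli_with_full_history` and `tli_with_pseudo_list`, and the LSN values are all hypothetical. It assumes a history list ordered newest first, where each entry covers the range [begin, end) and 0 means "unbounded".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

/* 0 stands for "no bound" in this sketch (hypothetical convention). */
#define INVALID_LSN ((XLogRecPtr) 0)

/* Hypothetical LSN at which timeline 1 ended and timeline 2 began. */
#define SWITCH_LSN ((XLogRecPtr) 0x3000060)

/* One timeline and the LSN range [begin, end) that it covers. */
typedef struct HistoryEntry
{
    TimeLineID  tli;
    XLogRecPtr  begin;          /* inclusive; INVALID_LSN = from the start */
    XLogRecPtr  end;            /* exclusive; INVALID_LSN = still open */
} HistoryEntry;

/*
 * Return the timeline whose range contains ptr, scanning newest first.
 * Returns 0 if no entry covers ptr.
 */
TimeLineID
tli_of_point(XLogRecPtr ptr, const HistoryEntry *history, size_t n)
{
    assert(history != NULL && n > 0);

    for (size_t i = 0; i < n; i++)
    {
        int after_begin = history[i].begin == INVALID_LSN ||
                          history[i].begin <= ptr;
        int before_end = history[i].end == INVALID_LSN ||
                         ptr < history[i].end;

        if (after_begin && before_end)
            return history[i].tli;
    }
    return 0;
}

/* Lookup against a complete history: TLI 1 up to SWITCH_LSN, then TLI 2. */
TimeLineID
tli_with_full_history(XLogRecPtr ptr)
{
    const HistoryEntry full[] = {
        {2, SWITCH_LSN, INVALID_LSN},
        {1, INVALID_LSN, SWITCH_LSN},
    };

    return tli_of_point(ptr, full, sizeof(full) / sizeof(full[0]));
}

/* Lookup against a one-entry list that knows only the latest timeline. */
TimeLineID
tli_with_pseudo_list(XLogRecPtr ptr)
{
    const HistoryEntry pseudo[] = {
        {2, INVALID_LSN, INVALID_LSN},
    };

    return tli_of_point(ptr, pseudo, sizeof(pseudo) / sizeof(pseudo[0]));
}
```

With a hypothetical checkpoint LSN of 0x3000028, which lies before SWITCH_LSN, `tli_with_full_history` reports timeline 1 while `tli_with_pseudo_list` reports timeline 2. The one-entry list has no range information, so it claims every LSN for the latest timeline, which is why a later timeline check cannot catch the mismatch.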