Re: Fix primary crash continually with invalid checkpoint after promote
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Fix primary crash continually with invalid checkpoint after promote |
Date | |
Msg-id | 20220427.112411.551209151727752749.horikyota.ntt@gmail.com |
In response to | Re: Fix primary crash continually with invalid checkpoint after promote (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Fix primary crash continually with invalid checkpoint after promote |
List | pgsql-hackers |

At Tue, 26 Apr 2022 15:47:13 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
> "Zhao Rui" <875941708@qq.com> writes:
> > Newly promoted primary may leave an invalid checkpoint.
> > In function CreateRestartPoint, control file is updated and old wals are removed. But in some situations, control file is not updated, old wals are still removed. Thus produces an invalid checkpoint with nonexistent wal. Crucial log: "invalid primary checkpoint record", "could not locate a valid checkpoint record".
>
> I believe this is the same issue being discussed here:
>
> https://www.postgresql.org/message-id/flat/20220316.102444.2193181487576617583.horikyota.ntt%40gmail.com
>
> but Horiguchi-san's proposed fix looks quite different from yours.

The root cause is that CreateRestartPoint omits to update the last checkpoint in the control file if archive recovery exits at an unfortunate time. So my proposal fixes the root cause.

Zhao Rui's proposal is retention of WAL files according to (the wrong content of) the control file. Aside from the fact that it may let slots be invalidated earlier, it's not great that an actually performed restartpoint is forgotten, which may cause the next crash recovery to start from an already performed checkpoint.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
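
The failure mode discussed in this message can be illustrated with a toy model. This is only a sketch under loud assumptions, not PostgreSQL code: WAL segments are modeled as plain integers, the control file as a single field, and the names `ToyCluster`, `create_restartpoint`, `can_start_recovery`, and both boolean flags are hypothetical. It shows why removing old WAL without updating the control file leaves an unusable checkpoint, and how retaining WAL according to the stale control file avoids the crash but forgets the restartpoint.

```python
from dataclasses import dataclass


@dataclass
class ToyCluster:
    """Hypothetical stand-in for a data directory (not a PostgreSQL structure)."""
    wal_segments: set   # segment numbers still present on disk
    control_redo: int   # segment the control file says recovery starts from


def create_restartpoint(cluster, new_redo, update_control_file,
                        retain_per_control_file=False):
    """Toy restartpoint: maybe record it in the control file, then remove old WAL.

    update_control_file=False models the unlucky timing described in the
    message. retain_per_control_file=True models keeping WAL according to
    whatever the control file currently says, even if that is stale.
    """
    if update_control_file:
        cluster.control_redo = new_redo
    # Decide which segment is the oldest one worth keeping.
    cutoff = cluster.control_redo if retain_per_control_file else new_redo
    cluster.wal_segments = {s for s in cluster.wal_segments if s >= cutoff}


def can_start_recovery(cluster):
    """Recovery needs the segment that the control file points at."""
    return cluster.control_redo in cluster.wal_segments


# Normal case: control file updated, then older WAL removed.
ok = ToyCluster(set(range(1, 6)), control_redo=1)
create_restartpoint(ok, new_redo=4, update_control_file=True)

# Reported bug: control file not updated, older WAL removed anyway.
broken = ToyCluster(set(range(1, 6)), control_redo=1)
create_restartpoint(broken, new_redo=4, update_control_file=False)

# Retention per stale control file: startup works, restartpoint is forgotten.
retained = ToyCluster(set(range(1, 6)), control_redo=1)
create_restartpoint(retained, new_redo=4, update_control_file=False,
                    retain_per_control_file=True)

print(can_start_recovery(ok), can_start_recovery(broken),
      can_start_recovery(retained), retained.control_redo)
```

In this model the `broken` cluster cannot start recovery because its control file still points at a removed segment, while the `retained` cluster can start but would replay from segment 1 even though a restartpoint at segment 4 was actually performed.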