Re: Infinite loop in XLogPageRead() on standby

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Infinite loop in XLogPageRead() on standby
Дата
Msg-id 20240301.131637.563635299005880728.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: Infinite loop in XLogPageRead() on standby  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы Re: Infinite loop in XLogPageRead() on standby  (Alexander Kukushkin <cyberdemn@gmail.com>)
Список pgsql-hackers
At Fri, 01 Mar 2024 12:37:55 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> Anyway, our current policy here is to avoid record-rereads beyond
> source switches. However, fixing this seems to require that source
> switches cause record rereads unless some additional information is
> available to know if replay LSN needs to back up.

It seems to me that the error messages are related to commit 0668719801.

XLogPageRead:
> * Check the page header immediately, so that we can retry immediately if
> * it's not valid. This may seem unnecessary, because ReadPageInternal()
> * validates the page header anyway, and would propagate the failure up to
> * ReadRecord(), which would retry. However, there's a corner case with
> * continuation records, if a record is split across two pages such that
> * we would need to read the two pages from different sources. For
...
>    if (StandbyMode &&
>        !XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
>    {
>        /*
>         * Emit this error right now then retry this page immediately. Use
>         * errmsg_internal() because the message was already translated.
>         */
>        if (xlogreader->errormsg_buf[0])
>            ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
>                    (errmsg_internal("%s", xlogreader->errormsg_buf)));

This code intends to prevent a page header error from causing a record
reread, when a record is required to be read from multiple sources. We
could restrict this to only fire at segment boundaries. At segment
boundaries, we won't let LSN back up by using XLP_FIRST_IS_CONTRECORD.

Having thought up to this point, I now believe that we should
completely prevent LSN from going back in any case. One drawback is
that the fix cannot be back-patched.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: John Naylor
Дата:
Сообщение: Re: [PoC] Improve dead tuple storage for lazy vacuum
Следующее
От: Peter Smith
Дата:
Сообщение: Re: Synchronizing slots from primary to standby