Re: WAL record CRC calculated incorrectly because of underlying buffer modification

From Thomas Munro
Subject Re: WAL record CRC calculated incorrectly because of underlying buffer modification
Msg-id CA+hUKG+=cb86CYa4W42z4wFBMwjQE2=O9RFC+i4QZuCB+d2p0A@mail.gmail.com
In reply to Re: WAL record CRC calculated incorrectly because of underlying buffer modification  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: WAL record CRC calculated incorrectly because of underlying buffer modification
List pgsql-hackers
On Sat, May 11, 2024 at 5:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> 11.05.2024 07:25, Thomas Munro wrote:
> > On Sat, May 11, 2024 at 4:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> >> 11.05.2024 06:26, Thomas Munro wrote:
> >>> Perhaps a no-image, no-change registered buffer should not be
> >>> including an image, even for XLR_CHECK_CONSISTENCY?  It's actually
> >>> useless for consistency checking too I guess, this issue aside,
> >>> because it doesn't change anything so there is nothing to check.

> >> Yes, I think something is wrong here. I've reduced the reproducer to:

> > Does it reproduce if you do this?
> >
> > -               include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0;
> > +               include_image = needs_backup ||
> > +                       ((info & XLR_CHECK_CONSISTENCY) != 0 &&
> > +                        (regbuf->flags & REGBUF_NO_CHANGE) == 0);
>
> No, it doesn't (at least with the latter, more targeted reproducer).

OK so that seems like a candidate fix, but ...

> > Unfortunately the back branches don't have that new flag from 00d7fb5e
> > so, even if this is the right direction (not sure, I don't understand
> > this clean registered buffer trick) then ... but wait, why are there
> > no failures like this in the back branches (yet at least)?  Does
> > your reproducer work for 16?  I wonder if something relevant changed
> > recently, like f56a9def.  CC'ing Michael and Amit K for info.
>
> Maybe it's hard to hit (autovacuum needs to process the index page in a
> narrow time frame), but locally I could reproduce the issue even on
> ac27c74de (~1 too) from 2018-09-06 (I tried the last several commits touching
> hash indexes, didn't dig deeper).

... we'd need to figure out how to fix this in the back-branches too.
One idea would be to back-patch REGBUF_NO_CHANGE, and another might be
to deduce that case from other variables.  Let me CC a couple more
people from this thread, who most recently hacked on this stuff, to
see if they have insights:

https://www.postgresql.org/message-id/flat/d2c31606e6bb9b83a02ed4835d65191b38d4ba12.camel%40j-davis.com
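
For reference, the candidate condition quoted above can be looked at in
isolation.  This is only a minimal sketch, not the actual xloginsert.c
code: include_image_for() and check_include_image() are hypothetical
names, and the two flag values are placeholders picked for this example
(the real definitions live in the PostgreSQL headers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only (assumption). */
#define XLR_CHECK_CONSISTENCY 0x02
#define REGBUF_NO_CHANGE      0x20

/*
 * Hypothetical helper mirroring the proposed condition: include an image
 * if a backup is needed, or if consistency checking is requested and the
 * buffer was NOT registered as unchanged.
 */
static bool
include_image_for(bool needs_backup, uint8_t info, uint8_t regbuf_flags)
{
    return needs_backup ||
        ((info & XLR_CHECK_CONSISTENCY) != 0 &&
         (regbuf_flags & REGBUF_NO_CHANGE) == 0);
}

/* Hypothetical self-check covering the four interesting cases. */
static void
check_include_image(void)
{
    /* A needed backup always forces the image. */
    assert(include_image_for(true, 0, REGBUF_NO_CHANGE));
    /* Consistency check on a changed buffer: image included. */
    assert(include_image_for(false, XLR_CHECK_CONSISTENCY, 0));
    /* Consistency check on a no-change buffer: image skipped. */
    assert(!include_image_for(false, XLR_CHECK_CONSISTENCY, REGBUF_NO_CHANGE));
    /* Neither backup nor consistency check: no image. */
    assert(!include_image_for(false, 0, 0));
}
```

The only case whose outcome differs from the old condition is the third
one: a buffer registered with REGBUF_NO_CHANGE no longer gets an image
merely because XLR_CHECK_CONSISTENCY is set.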


