Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Дата
Msg-id CAH2-WzkPb42odWen-HZRYw1zUe=fPAp1_JbSuVvPXQTO2iBMPw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Sat, Feb 19, 2022 at 3:08 PM Peter Geoghegan <pg@bowt.ie> wrote:
> It's quite possible that this is nothing more than a bug in my
> adversarial gizmo patch -- since I don't think that
> ConditionalLockBufferForCleanup() can ever fail with a temp buffer
> (though even that's not completely clear right now). Even if the
> behavior that I saw does not indicate a bug on HEAD, it still seems
> informative.

This very much looks like a bug in pg_surgery itself now -- attached
is a draft fix.

The temp table thing was a red herring. I found I could get exactly
the same kind of failure when htab2 was a permanent table (which was
how it originally appeared, before commit 0811f766fd made it into a
temp table due to test flappiness issues). The relevant "vacuum freeze
htab2" happens at a point after the test has already deliberately
corrupted one of its tuples using heap_force_kill().  It's not that we
aren't careful enough about the corruption at some point in
vacuumlazy.c, which was my second theory. But I quickly discarded that
idea, and came up with a third theory: the relevant heap_surgery.c
path does the relevant ItemIdSetDead() to kill items, without also
defragmenting the page to remove the tuples with storage, which is
wrong.

This meant that we depended on pruning happening (in this case during
VACUUM) and defragmenting the page in passing. But there is no reason
to not defragment the page within pg_surgery (at least no obvious
reason), since we have a cleanup lock anyway.

Theoretically you could blame this on lazy_scan_noprune instead, since
it thinks it can collect LP_DEAD items while assuming that they have
no storage, but that doesn't make much sense to me. There has never
been any way of setting a heap item to LP_DEAD without also
defragmenting the page.  Since that's exactly what it means to prune a
heap page. (Actually, the same used to be true about heap vacuuming,
which worked more like heap pruning before Postgres 14, but that
doesn't seem important.)

-- 
Peter Geoghegan

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Joseph Koshakow
Дата:
Сообщение: Re: Fix overflow in DecodeInterval
Следующее
От: Andres Freund
Дата:
Сообщение: Re: initdb / bootstrap design