Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From Peter Geoghegan
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id CAH2-WznUEgHVGuMp5cRC9YnrNKGzfORZOUmmu1ijaNEn-sQyYg@mail.gmail.com
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-bugs
On Thu, Nov 11, 2021 at 8:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Nov 11, 2021 at 4:58 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > What prevents the scenario that some other backend e.g. has a snapshot with
> > > xmin=xmax=RECENTLY_DEAD-row. If the RECENTLY_DEAD row has an xid that is later
> > > than the DEAD row, this afaict would make it perfectly legal to prune the DEAD
> > > row, but *not* the RECENTLY_DEAD one.
> >
> > I'll need to think about this very carefully. I didn't think it was
> > worth blocking v3 on, though naturally it's a big concern.
>
> If we're to traverse HOT chains right to the end in
> heap_prune_chain(), reading even LIVE tuples (per the approach
> proposed in my bugfix patch), we probably need to be more careful
> about concurrently aborted xacts -- relying on the usual
> !HeapTupleHeaderIsHotUpdated(htup) test doesn't seem safe.

I wonder if we're approaching this business with "RECENTLY_DEAD can be
upgraded to DEAD" in entirely the wrong way. Why not just not do that
at all anymore, on the off chance that it's unsafe? Why even take a
small chance? Our decision has to work at the level of the entire
HOT chain, and it seems to me that we should make that as simple as
possible.

I'm pretty sure that we're giving up nothing this way. We can just
take the conservative position that it's never okay for
heap_prune_chain() to consider a heap-only tuple from a validated HOT
chain DEAD, unless A.) HTSV says it is DEAD when asked, and B.) HTSV
said the same thing (tuple DEAD) about any and all earlier heap-only
tuples in the chain. Moreover, if HTSV says
LIVE/RECENTLY_DEAD/whatever about one tuple in the chain, then we
refuse to treat *ALL* successor tuples in the same chain as DEAD --
even when HTSV says that they're DEAD directly. As long as these
DEAD-to-HTSV heap-only tuples appear to be from the same original HOT
chain, we don't need to change our mind about the HOT chain.
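
To make the rule above concrete, here is a minimal standalone sketch.
It is not code from the patch and it does not use the real
heap_prune_chain() data structures: the SketchHtsv enum and the
sketch_dead_prefix_len() helper are hypothetical names, and the input
array simply stands for the HTSV answers for the heap-only tuples of
one validated HOT chain, in chain order. Under those assumptions, only
the unbroken run of DEAD answers at the start of the chain counts as
DEAD:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the HTSV result codes; not the real enum. */
typedef enum
{
    SK_DEAD,
    SK_RECENTLY_DEAD,
    SK_LIVE,
    SK_IN_PROGRESS
} SketchHtsv;

/*
 * Given the HTSV answers for the tuples of one validated HOT chain, in
 * chain order, return how many leading tuples may be treated as DEAD.
 *
 * A tuple counts only if HTSV says DEAD for it (condition A) and for
 * every earlier tuple in the chain (condition B).  The first non-DEAD
 * answer ends the run, so later tuples are never counted, even when
 * HTSV reports them DEAD directly.
 */
static size_t
sketch_dead_prefix_len(const SketchHtsv *chain, size_t nchain)
{
    size_t      i;

    for (i = 0; i < nchain; i++)
    {
        if (chain[i] != SK_DEAD)
            break;
    }

    return i;
}
```

With answers DEAD, DEAD, RECENTLY_DEAD, DEAD the helper reports a
prefix of two tuples. The trailing DEAD tuple is left alone because it
follows a RECENTLY_DEAD one.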

The original structure of heap_prune_chain() from the HOT commit in
2007 had more or less the same code and structure as it has now, but
almost everything was in a critical section -- the state arrays and
so on only came about a year later, in commit 6f10eb2111. The proposed
bug fix more or less finishes the work of the second commit, which
didn't go far enough. As long as we are starting by building a
consistent picture of valid HOT chains on the page, and only later
handle disconnected heap-only tuples, everything works out.

-- 
Peter Geoghegan


