Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Дата
Msg-id CAH2-Wz=nJ93NtbqT966_gmpVRekd4=Y9Lngv1fMNzfeVwdM+5g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Ответы Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Список pgsql-hackers
On Thu, Jun 10, 2021 at 8:49 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> Could you elaborate on what this "matches what we expect" entails?
>
> Apart from this, I'm also quite certain that the goto-branch that
> created this infinite loop should have been dead code: In a correctly
> working system, the GlobalVis*Rels should always be at least as strict
> as the vacrel->OldestXmin, but at the same time only GlobalVis*Rels
> can be updated (i.e. move their horizon forward) during the vacuum. As
> such, heap_prune_satisfies_vacuum should never fail to vacuum a tuple
> that also satisifies the condition of HeapTupleSatisfiesVacuum.

It's true that these two similar functions should be in perfect
agreement in general (given the same OldestXmin). That in itself
doesn't mean that they must always agree about a tuple in practice,
when they're called in turn inside lazy_scan_prune(). In particular,
nothing stops a transaction that was in progress to
heap_prune_satisfies_vacuum (when it saw some tuples it inserted)
concurrently aborting. That will render the same tuples fully DEAD
inside HeapTupleSatisfiesVacuum(). So we need to restart using the
goto purely to cover that case. See the commit message of commit
8523492d4e3.

By "matches what we expect", I meant "involves a just-aborted
transaction". We could defensively verify that the inserting
transaction concurrently aborted at the point of retrying/calling
heap_page_prune() a second time. If there is no aborted transaction
involved (as was the case with this bug), then we can be confident
that something is seriously broken.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: "an SQL" vs. "a SQL"
Следующее
От: Tom Lane
Дата:
Сообщение: Re: "an SQL" vs. "a SQL"