Tom Lane wrote:
> Aside from the "how did this happen" puzzle, the real point of any
> investigation of course ought to be whether we can make heap_page_prune
> more robust. At the very least it's undesirable to be leaving the page
> in a state where VACUUM FULL will decide it can't shrink.
I'm as puzzled as you are on how it happened.
The fundamental problem here is that we have HOT-updated tuples that are
missing the root tuple. heap_prune_chain doesn't know how to remove the
dead tuples from such chains, and neither does vacuum. What's worse is
that there's no index pointer to the live tuple in the chain, and even
VACUUM FULL doesn't fix that.
Could we modify heap_page_prune so that it detects such orphaned HOT
tuples, and clears the HOT_UPDATED flag for them? That would at least
let you recover. I'm pretty wary of trying to "fix" things after
corruption in general since you can't be sure what's correct and what's
not, but clearing the HOT_UPDATED flag seems safe and more likely to
help than hurt. Clearing the flag would let the tuple become indexed
again in a VACUUM FULL or REINDEX, so it would fix the inconsistency
that otherwise a sequential scan can see the tuple but an index scan can
not. It would also allow the next heap_page_prune operation to remove
the remaining dead tuples in the chain.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com