Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()

From: Andres Freund
Subject: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Msg-id: 20240415173913.4zyyrwaftujxthf2@awork3.anarazel.de
In reply to: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() (Noah Misch <noah@leadboat.com>)
Replies: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() (Melanie Plageman <melanieplageman@gmail.com>)
List: pgsql-bugs
Hi,

I've tried a couple of times to catch up with this thread, but always kinda felt
I must be missing something. This might be one source of the
confusion:

On 2024-01-06 12:24:13 -0800, Noah Misch wrote:
> Fair enough.  While I agree there's a decent chance back-patching would be
> okay, I think there's also a decent chance that 1ccc1e05ae creates the problem
> Matthias theorized.  Something like: we update relfrozenxid based on
> OldestXmin, even though GlobalVisState caused us to retain a tuple older than
> OldestXmin.  Then relfrozenxid disagrees with table contents.

Looking at the state as of 1ccc1e05ae, I don't see how that could happen: in
lazy_scan_prune(), if heap_page_prune() spuriously didn't prune a tuple because
the horizon went backwards, we'd encounter the tuple in the loop below and call
heap_prepare_freeze_tuple(), which would error out with one of

    /*
     * Process xmin, while keeping track of whether it's already frozen, or
     * will become frozen iff our freeze plan is executed by caller (could be
     * neither).
     */
    xid = HeapTupleHeaderGetXmin(tuple);
    if (!TransactionIdIsNormal(xid))
        xmin_already_frozen = true;
    else
    {
        if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg_internal("found xmin %u from before relfrozenxid %u",
                                     xid, cutoffs->relfrozenxid)));

or
        if (TransactionIdPrecedes(update_xact, cutoffs->relfrozenxid))
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg_internal("multixact %u contains update XID %u from before relfrozenxid %u",
                                     multi, update_xact,
                                     cutoffs->relfrozenxid)));
or
        /* Raw xmax is normal XID */
        if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg_internal("found xmax %u from before relfrozenxid %u",
                                     xid, cutoffs->relfrozenxid)));


I'm not saying that spuriously erroring out would be ok. But I guess I just
don't understand the data corruption theory in this subthread, because we'd
error out if we encountered a tuple that should have been frozen but wasn't?

Greetings,

Andres Freund
