Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Поиск
Список
Период
Сортировка
От Matthias van de Meent
Тема Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Дата
Msg-id CAEze2Wh_0XJGQ2o+2OBx-G9rJdV+ZqWv+8GqLzg-Hahc3_gvkQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE  (Andres Freund <andres@anarazel.de>)
Ответы Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Mon, 30 Jan 2023 at 21:19, Andres Freund <andres@anarazel.de> wrote:
> On 2023-01-10 21:32:54 +0100, Matthias van de Meent wrote:
> > On Tue, 10 Jan 2023 at 20:14, Andres Freund <andres@anarazel.de> wrote:
> > > On 2023-01-10 15:03:42 +0100, Matthias van de Meent wrote:
> > > What precisely do you mean with "skew" here? Do you just mean that it'd take a
> > > long time until vacuum_defer_cleanup_age takes effect? Somehow it sounds like
> > > you might mean more than that?
> >
> > h->oldest_considered_running can be extremely old due to the global
> > nature of the value and the potential existence of a snapshot in
> > another database that started in parallel to a very old running
> > transaction.
>
> Here's a version that, I think, does not have that issue.

Thanks!

> In an earlier, not posted, version I had an vacuum_defer_cleanup_age specific
> helper function for this, but it seems likely we'll need it in other places
> too.  So I named it TransactionIdRetreatSafely(). I made it accept the xid by
> pointer, as the line lengths / repetition otherwise end up making it hard to
> read the code.  For now I have TransactionIdRetreatSafely() be private to
> procarray.c, but I expect we'll have to change that eventually.

If TransactionIdRetreatSafely will be exposed outside procarray.c,
then I think the xid pointer should be replaced with normal
arguments/returns; both for parity with TransactionIdRetreatedBy and
to remove this memory store dependency in this hot code path.

> Not sure I like TransactionIdRetreatSafely() as a name. Maybe
> TransactionIdRetreatClamped() is better?

I think the 'safely' version is fine.

> I've been working on a test for vacuum_defer_cleanup_age. It does catch the
> corruption at hand, but not much more.  It's quite painful to write, right
> now.  Some of the reasons:
> https://postgr.es/m/20230130194350.zj5v467x4jgqt3d6%40awork3.anarazel.de
>
>
>
> > > I'm tempted to go with reinterpreting 64bit xids as signed. Except that it
> > > seems like a mighty invasive change to backpatch.
> >
> > I'm not sure either. Protecting against underflow by halving the
> > effective valid value space is quite the intervention, but if it is
> > necessary to make this work in a performant manner, it would be worth
> > it. Maybe someone else with more experience can provide their opinion
> > here.
>
> The attached assertions just removes 1/2**32'ths of the space, by reserving
> the xid range with the upper 32bit set as something that shouldn't be
> reachable.

I think that is acceptible.

> Still requires us to change the input routines to reject that range, but I
> think that's a worthy tradeoff.

Agreed.

> I didn't find the existing limits for the
> type to be documented anywhere.
>
> Obviously something like that could only go into HEAD.

Yeah.

Comments on 0003:

> +        /*
> +         * FIXME, doubtful this is the best fix.
> +         *
> +         * Can't represent the 32bit xid as a 64bit xid, as it's before fxid
> +         * 0. Represent it as an xid from the future instead.
> +         */
> +        if (epoch == 0)
> +            return FullTransactionIdFromEpochAndXid(0, xid);

Shouldn't this be an error condition instead, as this XID should not
be able to appear?

on 0004:

> -       '0xffffffffffffffff'::xid8,
> -       '-1'::xid8;
> +       '0xefffffffffffffff'::xid8,
> +       '0'::xid8;

The 0xFF... usages were replaced with "0xEFFF...". Shouldn't we also
test on 0xffff_fffE_ffff_ffff to test for input of our actual max
value?

> @@ -326,7 +329,11 @@ parse_snapshot(const char *str, Node *escontext)
>     while (*str != '\0')
>     {
>         /* read next value */
> -        val = FullTransactionIdFromU64(strtou64(str, &endp, 10));
> +        raw_fxid = strtou64(str, &endp, 10);
> +
> +        val = FullTransactionIdFromU64(raw_fxid);
> +        if (!InFullTransactionIdRange(raw_fxid))
> +            goto bad_format;

With assertions enabled FullTransactionIdFromU64 will assert the
InFullTransactionIdRange condition, meaning we wouldn't hit the branch
into bad_format.
I think these operations should be swapped, as parsing a snapshot
shouldn't run into assertion failures like this if it can error
normally. Maybe this can be added to tests as well?

Kind regards,

Matthias van de Meent



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Melanie Plageman
Дата:
Сообщение: Re: heapgettup() with NoMovementScanDirection unused in core?
Следующее
От: Amit Langote
Дата:
Сообщение: Re: Speed up transaction completion faster after many relations are accessed in a transaction