ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set

From: Andres Freund
Subject: ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set
Msg-id: 20190724232439.lpxzjw2jg3ukgcqn@alap3.anarazel.de
Replies: Re: ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
Hi,


The scenario is a very plain upsert:

CREATE TABLE upsert(key int primary key);
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT 0 1
INSERT 0 1

postgres[8755][1]=# SELECT page_items.*
  FROM generate_series(0, pg_relation_size('upsert'::regclass::text) / 8192 - 1) blkno,
       get_raw_page('upsert'::regclass::text, blkno::int4) AS raw_page,
       heap_page_items(raw_page) AS page_items;

┌────┬────────┬──────────┬────────┬──────────┬──────────┬──────────┬────────┬─────────────┬────────────┬────────┬────────┬────────┬────────────┐
│ lp │ lp_off │ lp_flags │ lp_len │  t_xmin  │  t_xmax  │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_bits │ t_oid  │   t_data   │
├────┼────────┼──────────┼────────┼──────────┼──────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼────────┼────────────┤
│  1 │   8160 │        1 │     28 │ 19742591 │ 19742592 │        0 │ (0,2)  │       24577 │        256 │     24 │ (null) │ (null) │ \x01000000 │
│  2 │   8128 │        1 │     28 │ 19742592 │ 19742592 │        0 │ (0,2)  │       32769 │       8336 │     24 │ (null) │ (null) │ \x01000000 │
└────┴────────┴──────────┴────────┴──────────┴──────────┴──────────┴────────┴─────────────┴────────────┴────────┴────────┴────────┴────────────┘
(2 rows)

As you can see, the same xmax is set for both row versions, with the new
tuple's infomask being HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY | HEAP_UPDATED.


The reason that happens is that ExecOnConflictUpdate() needs to lock the
row to be able to reliably compute a new tuple version based on that
row. heap_update() then decides it needs to carry that xmax forward, as
it's a valid lock:


        /*
         * If the tuple we're updating is locked, we need to preserve the locking
         * info in the old tuple's Xmax.  Prepare a new Xmax value for this.
         */
        compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
                                                          oldtup.t_data->t_infomask,
                                                          oldtup.t_data->t_infomask2,
                                                          xid, *lockmode, true,
                                                          &xmax_old_tuple, &infomask_old_tuple,
                                                          &infomask2_old_tuple);

        /*
         * And also prepare an Xmax value for the new copy of the tuple.  If there
         * was no xmax previously, or there was one but all lockers are now gone,
         * then use InvalidXid; otherwise, get the xmax from the old tuple.  (In
         * rare cases that might also be InvalidXid and yet not have the
         * HEAP_XMAX_INVALID bit set; that's fine.)
         */
        if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
                HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
                (checked_lockers && !locker_remains))
                xmax_new_tuple = InvalidTransactionId;
        else
                xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);

but we really don't need to do any of that in this case - the only
locker is the current backend, after all.

I think this isn't great, because it'll later cause unnecessary hint bit
writes (although ones somewhat likely to be combined with setting
XMIN_COMMITTED), and even extra work for freezing.

Based on a quick look, this wasn't the case before the finer-grained
tuple locking - which makes sense, as there were no cases where locks
would need to be carried forward.


It's worthwhile to note that this *nearly* already works, because of the
following codepath in compute_new_xmax_infomask():

        /*
         * If the lock to be acquired is for the same TransactionId as the
         * existing lock, there's an optimization possible: consider only the
         * strongest of both locks as the only one present, and restart.
         */
        if (xmax == add_to_xmax)
        {
            /*
             * Note that it's not possible for the original tuple to be updated:
             * we wouldn't be here because the tuple would have been invisible and
             * we wouldn't try to update it.  As a subtlety, this code can also
             * run when traversing an update chain to lock future versions of a
             * tuple.  But we wouldn't be here either, because the add_to_xmax
             * would be different from the original updater.
             */
            Assert(HEAP_XMAX_IS_LOCKED_ONLY(old_infomask));

            /* acquire the strongest of both */
            if (mode < old_mode)
                mode = old_mode;
            /* mustn't touch is_update */

            old_infomask |= HEAP_XMAX_INVALID;
            goto l5;
        }

which sets HEAP_XMAX_INVALID in old_infomask, which would then cause the
code above not to carry xmax forward to the new tuple - but
compute_new_xmax_infomask() operates on a copy of old_infomask.


Note that this contradicts a comment in heap_update() itself:

         * If we found a valid Xmax for the new tuple, then the infomask bits
         * to use on the new tuple depend on what was there on the old one.
         * Note that since we're doing an update, the only possibility is that
         * the lockers had FOR KEY SHARE lock.
         */

which seems to indicate that this behaviour wasn't foreseen.


I find the separation of concerns (and variable naming) between
computations happening in heap_update() itself, and
compute_new_xmax_infomask() fairly confusing and redundant.

I mean, compute_new_xmax_infomask() expands multis, builds a new one with
the updater added, then re-expands it via GetMultiXactIdHintBits() to
compute the infomask bits for the old tuple, and returns. Just for
heap_update() to re-re-expand the multi to compute the infomask bits for
the new tuple, again with GetMultiXactIdHintBits()? I know that there's
a cache, but still - that's some seriously repetitive work.

Oh, and the things GetMultiXactIdHintBits() returns aren't, imo, really
well described as hint bits.

Greetings,

Andres Freund


