Re: Improving btree performance through specializing by key shape, take 2

Поиск
Список
Период
Сортировка
От Matthias van de Meent
Тема Re: Improving btree performance through specializing by key shape, take 2
Дата
Msg-id CAEze2WiA7SbY4s5Waj4JV7+yO4YD8GhrZrvYY6xhyan0opNCtg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Improving btree performance through specializing by key shape, take 2  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: Improving btree performance through specializing by key shape, take 2  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Tue, 19 Sept 2023 at 22:49, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Tue, Sep 19, 2023 at 6:28 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > > To be clear, page deletion does what I described here (it does an
> > > in-place update of the downlink to the deleted page, so the same pivot
> > > tuple now points to its right sibling, which is our page of concern),
> > > in addition to fully removing the original pivot tuple whose downlink
> > > originally pointed to our page of concern. This is why page deletion
> > > makes the key space "move to the right", very much like a page split
> > > would.
> >
> > I am still aware of this issue, and I think we've discussed it in
> > detail earlier. I think it does not really impact this patchset. Sure,
> > I can't use dynamic prefix compression to its full potential, but I
> > still do get serious performance benefits:
>
> Then why have you linked whatever the first patch does with the high
> key to dynamic prefix compression in the first place? Your commit
> message makes it sound like it's a way to get around the race
> condition that affects dynamic prefix compression, but as far as I can
> tell it has nothing whatsoever to do with that race condition.

We wouldn't have to store the downlink's right separator and compare
it to the highkey if we didn't deviate from L&Y's algorithm for DELETE
operations (which causes the race condition): just the right sibling's
block number would be enough.

(Yes, the right sibling's block number isn't available for the
rightmost downlink of a page. In those cases, we'd have to reuse the
parent page's high key with that of the downlink page, but I suppose
that'll be relatively rare).

> Questions:
>
> 1. Why shouldn't the high key thing be treated as an unrelated piece of work?

Because it was only significant and relatively visible after getting
rid of the other full key compare operations, and it touches
essentially the same areas. Splitting them out in more patches would
be a hassle.

> I guess it's possible that it really should be structured that way,
> but even then it's your responsibility to make it clear why that is.

Sure. But I think I've made that clear upthread too.

> As things stand, this presentation is very confusing.

I'll take a look at improving the presentation.

> 2. Separately, why should dynamic prefix compression be tied to the
> specialization work? I also see no principled reason why it should be
> tied to the other two things.

My performance results show that insert performance degrades by 2-3%
for single-column indexes if only dynamic the prefix truncation patch
is applied [0]. The specialization patches fix that regression on my
machine (5950x) due to having optimized code for the use case. I can't
say for certain that other machines will see the same results, but I
think results will at least be similar.

> I didn't mind this sort of structure so much back when this work was
> very clearly exploratory -- I've certainly structured work in this
> area that way myself, in the past. But if you want this patch set to
> ever go beyond being an exploratory patch set, something has to
> change.

I think it's fairly complete, and mostly waiting for review.

> I don't have time to do a comprehensive (or even a fairly
> cursory) analysis of which parts of the patch are helping, and which
> are marginal or even add no value.

It is a shame that you don't have the time to review this patch.

> > > You'd have
> > > to compare the lower bound separator key from the parent (which might
> > > itself be the page-level low key for the parent) to the page low key.
> > > That's not a serious suggestion; I'm just pointing out that you need
> > > to be able to compare like with like for a canary condition like this
> > > one, and AFAICT there is no lightweight practical way of doing that
> > > that is 100% robust.
> >
> > True, if we had consistent LOWKEYs on pages, that'd make this job much
> > easier: the prefix could indeed be carried over in full. But that's
> > not currently the case for the nbtree code, and this is the next best
> > thing, as it also has the benefit of working with all currently
> > supported physical formats of btree indexes.
>
> I went over the low key thing again because I had to struggle to
> understand what your high key optimization had to do with dynamic
> prefix compression. I'm still struggling. I think that your commit
> message very much led me astray. Quoting it here:
>
> """
> Although this limits [...] relatively expensive _bt_compare.
> """
>
> You're directly tying the high key optimization to the dynamic prefix
> compression optimization. But why?

The value of skipping the _bt_compare call on the highkey is
relatively much higher in the prefix-skip case than it is on master,
as on master it's only one of the log(n) _bt_compare calls on the
page, while in the patch it's one of (on average) 3 full key
_bt_compare calls. This makes it much easier to prove the performance
gain, which made me integrate it into that patch instead of keeping it
separate.

> I have long understood that you gave up on the idea of keeping the
> bounds across levels of the tree (which does make sense to me), but
> yesterday the issue became totally muddled by this high key business.
> That's why I rehashed the earlier discussion, which I had previously
> understood to be settled.

Understood. I'll see if I can improve the wording to something that is
more clear about what the optimization entails.

I'm planning to have these documentation changes to be included in the
next revision of the patchset, which will probably also reduce the
number of specialized functions (and with it the size of the binary).
It will take some extra time, because I would need to re-run the
performance suite, but the code changes should be very limited when
compared to the current patch (apart from moving code between .c and
_spec.c).

---

The meat of the changes are probably in 0001 (dynamic prefix skip),
0003 (change attribute iteration code to use specializable macros),
and 0006 (index attribute iteration for variable key offsets). 0002 is
mostly mechanical code movement, 0004 is a relatively easy
implementation of the iteration functionality for single-key-column
indexes, and 0005 adds an instrument for improving the efficiency of
attcacheoff by implementing negatively cached values ("cannot be
cached", instead of just "isn't cached") which are then used in 0006.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0]
https://www.postgresql.org/message-id/CAEze2Wh_3%2B_Q%2BBefaLrpdXXR01vKr3R2R%3Dh5gFxR%2BU4%2B0Z%3D40w%40mail.gmail.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Joe Conway
Дата:
Сообщение: Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
Следующее
От: vignesh C
Дата:
Сообщение: Re: Improve tab completion for ALTER DEFAULT PRIVILEGE and ALTER TABLE