Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

From: Matthias van de Meent
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date:
Msg-id: CAEze2WhjFd8+BvWG-Wx3AEaTjO=V5Yo_aee=wpPe65rFcKdQCg@mail.gmail.com
In reply to: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan  (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Wed, 26 Jul 2023 at 15:42, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Jul 26, 2023 at 5:29 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > Considering that it caches/reuses the page across SAOP operations, can
> > (or does) this also improve performance for index scans on the outer
> > side of a join if the order of join columns matches the order of the
> > index?
>
> It doesn't really cache leaf pages at all. What it does is advance the
> array keys locally, while the original buffer lock is still held on
> that same page.

Hmm, then I had a mistaken understanding of what we do in _bt_readpage
with _bt_saveitem.

> > That is, I believe this caches (leaf) pages across scan keys, but can
> > (or does) it also reuse these already-cached leaf pages across
> > restarts of the index scan/across multiple index lookups in the same
> > plan node, so that retrieval of nearby index values does not need to
> > do an index traversal?
>
> I'm not sure what you mean. There is no reason why you need to do more
> than one single descent of an index to scan many leaf pages using many
> distinct sets of array keys. Obviously, this depends on being able to
> observe that we really don't need to redescend the index to advance
> the array keys, again and again. Note in particular that this
> usually works across leaf pages.

In a NestedLoop(outer=seqscan, inner=indexscan), the index gets
repeatedly scanned from the root, right? It seems that right now, we
copy matching index entries into a local cache (which is discarded on
amrescan), then drop our locks and pins on the buffer, and only then
start returning values from that local cache (in _bt_saveitem).
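
For reference, that copy step is roughly the following (paraphrased
from memory of nbtsearch.c, so details may differ between versions):

static void
_bt_saveitem(BTScanOpaque so, int itemIndex,
             OffsetNumber offnum, IndexTuple itup)
{
    BTScanPosItem *currItem = &so->currPos.items[itemIndex];

    /* remember the heap TID and the item's offset on the leaf page */
    currItem->heapTid = itup->t_tid;
    currItem->indexOffset = offnum;

    /* index-only scans also copy the index tuple into local workspace */
    if (so->currTuples)
    {
        Size    itupsz = IndexTupleSize(itup);

        currItem->tupleOffset = so->currPos.nextTupleOffset;
        memcpy(so->currTuples + so->currPos.nextTupleOffset, itup, itupsz);
        so->currPos.nextTupleOffset += MAXALIGN(itupsz);
    }
}

That is, only the matching items (and their tuples, when needed)
survive past the buffer lock; the leaf page itself is not retained.
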
We could cache the last accessed leaf page across amrescan operations
to reduce the number of index traversals needed when the join key of
the outer (left) side is highly (but not necessarily strictly)
correlated. The worst-case overhead of this would be two _bt_compare
calls (to check whether the sought value should be located entirely on
the cached leaf page) plus one memcpy(..., ..., BLCKSZ) during the
previous scan to stash that page. With some smart heuristics (e.g. page
fill factor, number of distinct values, and whether the previous scan
of this node hit this same leaf page) we can probably keep this
overhead to a minimum when the join keys turn out not to be correlated,
while speeding the query up significantly when they are. A rough sketch
of the rescan check is below.
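
To make that check concrete, here is a rough sketch. Everything that is
not an existing nbtree symbol is hypothetical (the cachedLeafPage field
and the function itself are invented), and the boundary comparisons
would need more care than shown here:

#include "postgres.h"

#include "access/nbtree.h"
#include "access/relscan.h"

/*
 * Hypothetical sketch: at the start of a rescan, test whether the new
 * insertion scan key should land on the leaf page we stashed (with a
 * memcpy of BLCKSZ bytes) at the end of the previous scan.  Whenever in
 * doubt, return false and fall back to a normal descent from the root,
 * which is always correct.
 */
static bool
_bt_rescan_can_reuse_cached_leaf(IndexScanDesc scan, BTScanInsert key)
{
    BTScanOpaque so = (BTScanOpaque) scan->opaque;
    Page         page = so->cachedLeafPage;    /* invented field */
    BTPageOpaque opaque;

    if (page == NULL)
        return false;

    opaque = (BTPageOpaque) PageGetSpecialPointer(page);

    /* key sorts before the first data item: it belongs on an earlier page */
    if (_bt_compare(scan->indexRelation, key, page,
                    P_FIRSTDATAKEY(opaque)) < 0)
        return false;

    /* key sorts at or past the high key: it may belong on a later page */
    if (!P_RIGHTMOST(opaque) &&
        _bt_compare(scan->indexRelation, key, page, P_HIKEY) >= 0)
        return false;

    return true;
}

The heuristics mentioned above would then decide whether stashing the
page (the memcpy) at the end of a scan is worth doing at all.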

Of course, in cases where we'd expect very few distinct join keys, the
planner would likely put a Memoize node above the index scan; but for
mostly-unique join keys I think this could save a significant amount of
time, if only on buffer pinning and locking.

I guess I'll try to code something up when I have the time, as it is
not directly related to your patch but seems like an interesting
improvement nonetheless.


Kind regards,

Matthias van de Meent


