Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Дата
Msg-id CAH2-Wz=6vQyS5YhkZJV1gXjEF=TFEh2gxB0kjxDuZLgbC98KOg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Ответы Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Список pgsql-hackers
On Wed, Jul 26, 2023 at 12:07 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> We could cache the last accessed leaf page across amrescan operations
> to reduce the number of index traversals needed when the join key of
> the left side is highly (but not necessarily strictly) correllated.

That sounds like block nested loop join. It's possible that that could
reuse some infrastructure from this patch, but I'm not sure.

In general, SAOP execution/MDAM performs "duplicate elimination before
it reads the data" by sorting and deduplicating the arrays up front.
While my patch sometimes elides a primitive index scan, primitive
index scans are already disjuncts that are combined to create what can
be considered one big index scan (that's how the planner and executor
think of them). The patch takes that one step further by recognizing
that it could quite literally be one big index scan in some cases (or
fewer, larger scans, at least). It's a natural incremental
improvement, as opposed to inventing a new kind of index scan. If
anything the patch makes SAOP execution more similar to traditional
index scans, especially when costing them.

Like InnoDB style loose index scan (for DISTINCT and GROUP BY
optimization), block nested loop join would require inventing a new
type of index scan. Both of these other two optimizations involve the
use of semantic information that spans multiple levels of abstraction.
Loose scan requires duplicate elimination (that's the whole point),
while IIUC block nested loop join needs to "simulate multiple inner
index scans" by deliberately returning duplicates for each would-be
inner index scan. These are specialized things.

To be clear, I think that all of these ideas are reasonable. I just
find it useful to classify these sorts of techniques according to
whether or not the index AM API would have to change or not, and the
general nature of any required changes. MDAM can do a lot of cool
things without requiring any revisions to the index AM API, which
should allow it to play nice with everything else (index path clause
safety issues notwithstanding).

--
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Richard Guo
Дата:
Сообщение: Re: Retiring is_pushed_down
Следующее
От: Garrett Thornburg
Дата:
Сообщение: Re: PATCH: Add REINDEX tag to event triggers