Re: POC, WIP: OR-clause support for indexes

Поиск
Список
Период
Сортировка
От Alena Rybakina
Тема Re: POC, WIP: OR-clause support for indexes
Дата
Msg-id cc74aa7d-0911-4873-87d2-9405ece0ec32@postgrespro.ru
обсуждение исходный текст
Ответ на Re: POC, WIP: OR-clause support for indexes  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers

Hi!

On 27.10.2023 00:04, Peter Geoghegan wrote:
On Thu, Oct 26, 2023 at 12:59 PM Robert Haas <robertmhaas@gmail.com> wrote:
Alexander's example seems to show that it's not that simple. If I'm
reading his example correctly, with things like aid = 1, the
transformation usually wins even if the number of things in the OR
expression is large, but with things like aid + 1 * bid = 1, the
transformation seems to lose at least with larger numbers of items. So
it's not JUST the number of OR elements but also what they contain,
unless I'm misunderstanding his point.
Alexander said "Generally, I don't see why ANY could be executed
slower than the equivalent OR clause". I understood that this was his
way of expressing the following idea:

"In principle, there is no reason to expect execution of ANY() to be
slower than execution of an equivalent OR clause (except for
noise-level differences). While it might not actually look that way
for every single type of plan you can imagine right now, that doesn't
argue for making a cost-based decision. It actually argues for fixing
the underlying issue, which can't possibly be due to some kind of
fundamental advantage enjoyed by expression evaluation with ORs".

This is also what I think of all this.

Alexander's partial index example had this quality to it. Obviously,
the planner *could* be taught to do the right thing with such a case,
with a little more work. The fact that it doesn't right now is
definitely a problem, and should probably be treated as a blocker for
this patch. But that doesn't really argue against the general idea
behind the patch -- it just argues for fixing that one problem.

There may also be a separate problem that comes from the added planner
cycles required to do the transformation -- particularly in extreme or
adversarial cases. We should worry about that, too. But, again, it
doesn't change the basic fact, which is that having a
standard/normalized representation of OR lists/DNF transformation is
extremely useful in general, and *shouldn't* result in any real
slowdowns at execution time if done well.
I think it would be more correct to finalize the current approach to converting "OR" expressions to "ANY", since quite a few problems related to this patch have already been found here, I think you can solve them first, and then you can move on.

To me, this sort of output suggests that perhaps the transformation is
being done in the wrong place. I expect that we have to decide whether
to convert from OR to = ANY(...) at a very early stage of the planner,
before we have any idea what the selected path will ultimately be. But
this output suggests that we want the answer to depend on which kind
of path is going to be faster, which would seem to argue for doing
this sort of transformation as part of path generation for only those
paths that will benefit from it, rather than during earlier phases of
expression processing/simplification.
I don't think that that's the right direction. They're semantically
equivalent things. But a SAOP-based plan can be fundamentally better,
since SAOPs enable passing down useful context to index AMs (at least
nbtree). And because we can use a hash table for SAOP expression
evaluation. It's a higher level, standardized, well optimized way of
expressing exactly the same concept.

I can come up with a case that'll be orders of magnitude more
efficient with this patch, despite the transformation process only
affecting a small OR list of 3 or 5 elements -- a 100x reduction in
heap page accesses is quite possible. This is particularly likely to
come up if you assume that the nbtree patch that I'm currently working
on is also available. In general, I think that we totally over-rely on
bitmap index scans, especially BitmapOrs.


Regarding the application of the transformation at an early stage, the patch is almost ready, except for solving cases related to queries that work slower. I haven't figured out how to exclude such requests without comparing the cost or parameter by the number of OR elements yet. The simplest option is not to process Expr types (already mentioned earlier) in the queries that Alexander gave as an example, but as I already said, I don't like this approach very much.
-- 
Regards,
Alena Rybakina
Postgres Professional

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dean Rasheed
Дата:
Сообщение: Re: Infinite Interval
Следующее
От: Tom Lane
Дата:
Сообщение: Re: pg_dump not dumping the run_as_owner setting from version 16?