Re: BUG #17964: Missed query planner optimization

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: BUG #17964: Missed query planner optimization
Дата
Msg-id CAApHDvq_5v=G=9aS3QQwRUH-h1nf09Ycnzh=GQeDJtsYsQvd_g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #17964: Missed query planner optimization  (Mathias Kunter <mathiaskunter@gmail.com>)
Ответы Re: BUG #17964: Missed query planner optimization  (Mathias Kunter <mathiaskunter@gmail.com>)
Список pgsql-bugs
On Wed, 7 Jun 2023 at 23:48, Mathias Kunter <mathiaskunter@gmail.com> wrote:
> Yes, it's not a bug, but it's something which can be improved. If I
> simply change the original query from this:
>
> > SELECT ... WHERE id IN (SELECT ...);
>
> into this:
>
> > SELECT ... WHERE id = ANY(ARRAY(SELECT ...));
>
> then Postgres uses an index scan, and the query is orders of magnitude
> faster. Note that the planner actually correctly computes the estimated
> costs for both variants, since I get:

What's going on here is that there is code which will convert
supported IN clauses into semi-joins. The first of your queries has
this done, but the 2nd query does not.  The 2nd query, since the
semi-join conversion is not done, the qual later becomes eligible to
be pushed down into the union subquery meaning the non-matching rows
get filtered before the UNION is evaluated.  We'll never attempt to
push joins (semi or otherwise) down into subqueries, so this is not
done for the first query.

The main problem here is that in some cases the first of your queries
will be faster and in other cases the 2nd will be faster. It'll depend
on how many rows are in each table. So, really to find the fastest
plan, the planner would have to consider both options.  That would
unfortunately mean that we'd have to perform the join search once
without the semi-join pushed down and again with the semi-join pushed
down.  The join search is going to be the slowest part of planning
when there are a few tables to join, so doing that twice could add
quite a bit of overhead to the planner.   You might also then consider
how many times you'd need to perform the join search if there were,
say, 5 IN clauses.  To exhaustively find the best plan we'd need to
try all possible combinations of converting each IN clause to a
semi-join vs leaving the qual alone.  If the main query already had,
say 5 tables to join then that suddenly becomes a hugely costly query
to plan.  Given that, I'm not all that sure you're likely to see us
making any improvements here.  I suggest just rewriting the query in a
way that it executes more quickly for you.

David



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Mathias Kunter
Дата:
Сообщение: Re: BUG #17964: Missed query planner optimization
Следующее
От: David Rowley
Дата:
Сообщение: Re: BUG #17967: [msvc-x86] Failed to build with error LNK2019: unresolved external symbol __BitScanReverse64 referen