Re: BUG #17964: Missed query planner optimization
От | David Rowley |
---|---|
Тема | Re: BUG #17964: Missed query planner optimization |
Дата | |
Msg-id | CAApHDvq_5v=G=9aS3QQwRUH-h1nf09Ycnzh=GQeDJtsYsQvd_g@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #17964: Missed query planner optimization (Mathias Kunter <mathiaskunter@gmail.com>) |
Ответы |
Re: BUG #17964: Missed query planner optimization
|
Список | pgsql-bugs |
On Wed, 7 Jun 2023 at 23:48, Mathias Kunter <mathiaskunter@gmail.com> wrote: > Yes, it's not a bug, but it's something which can be improved. If I > simply change the original query from this: > > > SELECT ... WHERE id IN (SELECT ...); > > into this: > > > SELECT ... WHERE id = ANY(ARRAY(SELECT ...)); > > then Postgres uses an index scan, and the query is orders of magnitude > faster. Note that the planner actually correctly computes the estimated > costs for both variants, since I get: What's going on here is that there is code which will convert supported IN clauses into semi-joins. The first of your queries has this done, but the 2nd query does not. The 2nd query, since the semi-join conversion is not done, the qual later becomes eligible to be pushed down into the union subquery meaning the non-matching rows get filtered before the UNION is evaluated. We'll never attempt to push joins (semi or otherwise) down into subqueries, so this is not done for the first query. The main problem here is that in some cases the first of your queries will be faster and in other cases the 2nd will be faster. It'll depend on how many rows are in each table. So, really to find the fastest plan, the planner would have to consider both options. That would unfortunately mean that we'd have to perform the join search once without the semi-join pushed down and again with the semi-join pushed down. The join search is going to be the slowest part of planning when there are a few tables to join, so doing that twice could add quite a bit of overhead to the planner. You might also then consider how many times you'd need to perform the join search if there were, say, 5 IN clauses. To exhaustively find the best plan we'd need to try all possible combinations of converting each IN clause to a semi-join vs leaving the qual alone. If the main query already had, say 5 tables to join then that suddenly becomes a hugely costly query to plan. Given that, I'm not all that sure you're likely to see us making any improvements here. I suggest just rewriting the query in a way that it executes more quickly for you. David
В списке pgsql-bugs по дате отправления: