Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs
От | David Rowley |
---|---|
Тема | Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs |
Дата | |
Msg-id | CAApHDvri53PPF76c3M94_QNWbJfXjyCnjXuj_2=LYM-0m8WZtw@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs
|
Список | pgsql-hackers |
In [1] there was a report that set operations didn't correctly detect when inputs were provably empty sets. While this is not the bug that the report claimed it to be, as it's just a missing optimisation, I did decide to look at it to check if there was much performance to gain from doing this. The short of it is, Yes, there are cases when this can help query performance. Primarily, this seems to come from when the code detects that an EXCEPT ALL has an empty right-hand input. In this case, we can scan the left-hand input and forego the SetOp node completely. With EXCEPT (without ALL), deduplication is still required, however that can be done with an Aggregate node on the left input rather than using the slightly less efficient SetOp node. If I create two tables with a single int column containing 1 million rows each, ANALYZE them and run some queries with and without the patch, I see: (work_mem=256MB, pgbench -M simple -T 10) master: EXCEPT ALL left dummy : tps = 8466.587802 EXCEPT ALL right dummy : tps = 3.160083 EXCEPT left dummy : tps = 8433.607519 EXCEPT right dummy : tps = 3.178104 INTERSECT (all types) : tps = 8392.695606 UNION left dummy : tps = 3.406355 patched: EXCEPT ALL left dummy : tps = 8973.958896 (+5.99%) EXCEPT ALL right dummy : tps = 53.583312 (+1595.63%) EXCEPT left dummy : tps = 8736.716176 (+3.59%) EXCEPT right dummy : tps = 3.385520 (+6.53%) INTERSECT (all types) : tps = 8759.123942 (+4.37%) UNION left dummy : tps = 3.590264 (+5.40%) You can see EXCEPT ALL with the empty right-hand input became ~15x faster, and all the others became ~5% faster. There are some additional benefits aside from the performance as it's possible to provide better row estimates in certain cases. For example, if a UNION query removes all apart from 1 input, we can do estimate_num_groups() on that input. Otherwise, we're left to the assumption that all rows are unique, which certainly could cause some trouble later in planning for queries consuming the results of set operations in subqueries. EXCEPT with an empty right-hand input also benefits from improved row estimates for the same reason. For me, this seems worth doing. Set operations have been drawn out of the dark ages with the last few releases, and I feel this makes them more aligned to the set of optimisations we've come to expect in other parts of the planner. I'm happy to hear other opinions. Patch attached. David [1] https://postgr.es/m/18904-c5fea7892f4d26ed@postgresql.org
Вложения
В списке pgsql-hackers по дате отправления: