Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs

Поиск
Список
Период
Сортировка
От David Rowley
Тема Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs
Дата
Msg-id CAApHDvri53PPF76c3M94_QNWbJfXjyCnjXuj_2=LYM-0m8WZtw@mail.gmail.com
обсуждение исходный текст
Ответы Re: Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs
Список pgsql-hackers
In [1] there was a report that set operations didn't correctly detect
when inputs were provably empty sets.  While this is not the bug that
the report claimed it to be, as it's just a missing optimisation, I
did decide to look at it to check if there was much performance to
gain from doing this.

The short of it is, Yes, there are cases when this can help query
performance. Primarily, this seems to come from when the code detects
that an EXCEPT ALL has an empty right-hand input. In this case, we can
scan the left-hand input and forego the SetOp node completely.

With EXCEPT (without ALL), deduplication is still required, however
that can be done with an Aggregate node on the left input rather than
using the slightly less efficient SetOp node.

If I create two tables with a single int column containing 1 million
rows each, ANALYZE them and run some queries with and without the
patch, I see:

(work_mem=256MB, pgbench -M simple -T 10)

master:
EXCEPT ALL left dummy : tps = 8466.587802
EXCEPT ALL right dummy : tps = 3.160083
EXCEPT left dummy : tps = 8433.607519
EXCEPT right dummy : tps = 3.178104
INTERSECT (all types) : tps = 8392.695606
UNION left dummy : tps = 3.406355

patched:
EXCEPT ALL left dummy : tps = 8973.958896 (+5.99%)
EXCEPT ALL right dummy : tps = 53.583312 (+1595.63%)
EXCEPT left dummy : tps = 8736.716176 (+3.59%)
EXCEPT right dummy : tps = 3.385520 (+6.53%)
INTERSECT (all types) : tps = 8759.123942 (+4.37%)
UNION left dummy : tps = 3.590264 (+5.40%)

You can see EXCEPT ALL with the empty right-hand input became ~15x
faster, and all the others became ~5% faster.

There are some additional benefits aside from the performance as it's
possible to provide better row estimates in certain cases.  For
example, if a UNION query removes all apart from 1 input, we can do
estimate_num_groups() on that input.  Otherwise, we're left to the
assumption that all rows are unique, which certainly could cause some
trouble later in planning for queries consuming the results of set
operations in subqueries. EXCEPT with an empty right-hand input also
benefits from improved row estimates for the same reason.

For me, this seems worth doing. Set operations have been drawn out of
the dark ages with the last few releases, and I feel this makes them
more aligned to the set of optimisations we've come to expect in other
parts of the planner.

I'm happy to hear other opinions.

Patch attached.

David

[1] https://postgr.es/m/18904-c5fea7892f4d26ed@postgresql.org

Вложения

В списке pgsql-hackers по дате отправления: