Обсуждение: Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs

Поиск
Список
Период
Сортировка

Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs

От
David Rowley
Дата:
In [1] there was a report that set operations didn't correctly detect
when inputs were provably empty sets.  While this is not the bug that
the report claimed it to be, as it's just a missing optimisation, I
did decide to look at it to check if there was much performance to
gain from doing this.

The short of it is, Yes, there are cases when this can help query
performance. Primarily, this seems to come from when the code detects
that an EXCEPT ALL has an empty right-hand input. In this case, we can
scan the left-hand input and forego the SetOp node completely.

With EXCEPT (without ALL), deduplication is still required, however
that can be done with an Aggregate node on the left input rather than
using the slightly less efficient SetOp node.

If I create two tables with a single int column containing 1 million
rows each, ANALYZE them and run some queries with and without the
patch, I see:

(work_mem=256MB, pgbench -M simple -T 10)

master:
EXCEPT ALL left dummy : tps = 8466.587802
EXCEPT ALL right dummy : tps = 3.160083
EXCEPT left dummy : tps = 8433.607519
EXCEPT right dummy : tps = 3.178104
INTERSECT (all types) : tps = 8392.695606
UNION left dummy : tps = 3.406355

patched:
EXCEPT ALL left dummy : tps = 8973.958896 (+5.99%)
EXCEPT ALL right dummy : tps = 53.583312 (+1595.63%)
EXCEPT left dummy : tps = 8736.716176 (+3.59%)
EXCEPT right dummy : tps = 3.385520 (+6.53%)
INTERSECT (all types) : tps = 8759.123942 (+4.37%)
UNION left dummy : tps = 3.590264 (+5.40%)

You can see EXCEPT ALL with the empty right-hand input became ~15x
faster, and all the others became ~5% faster.

There are some additional benefits aside from the performance as it's
possible to provide better row estimates in certain cases.  For
example, if a UNION query removes all apart from 1 input, we can do
estimate_num_groups() on that input.  Otherwise, we're left to the
assumption that all rows are unique, which certainly could cause some
trouble later in planning for queries consuming the results of set
operations in subqueries. EXCEPT with an empty right-hand input also
benefits from improved row estimates for the same reason.

For me, this seems worth doing. Set operations have been drawn out of
the dark ages with the last few releases, and I feel this makes them
more aligned to the set of optimisations we've come to expect in other
parts of the planner.

I'm happy to hear other opinions.

Patch attached.

David

[1] https://postgr.es/m/18904-c5fea7892f4d26ed@postgresql.org

Вложения

Re: Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> In [1] there was a report that set operations didn't correctly detect
> when inputs were provably empty sets.  While this is not the bug that
> the report claimed it to be, as it's just a missing optimisation, I
> did decide to look at it to check if there was much performance to
> gain from doing this.

I'm kind of resistant to the amount of code this patch adds in
comparison to the likely benefit.  Sure, a badly written query can
profit, but is it worth debugging and maintaining a couple hundred
lines of code for that?

The first few hunks of changes seem fine by this light, but I think
you're expending too much effort on the EXCEPT-with-dummy-inputs
cases.  I'm wondering if it could be shortened a great deal by
handling left-input-dummy and EXCEPT-ALL-with-right-input-dummy
but leaving the EXCEPT-with-right-input-dummy case unimproved.

            regards, tom lane



Re: Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs

От
David Rowley
Дата:
On Thu, 2 Oct 2025 at 16:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm wondering if it could be shortened a great deal by
> handling left-input-dummy and EXCEPT-ALL-with-right-input-dummy
> but leaving the EXCEPT-with-right-input-dummy case unimproved.

Good idea. Less code and still get to keep the one that did well in
the benchmark. See attached.

I ended up splitting the patch in two. 0001 for UNION only, then 0002
for the INTERSECT and EXCEPT.

David

Вложения

Re: Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs

От
Mingli Zhang
Дата:
Hi, 

David Rowley <dgrowleyml@gmail.com>于2025年10月2日 周四20:09写道:
On Thu, 2 Oct 2025 at 16:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm wondering if it could be shortened a great deal by
> handling left-input-dummy and EXCEPT-ALL-with-right-input-dummy
> but leaving the EXCEPT-with-right-input-dummy case unimproved.

Good idea. Less code and still get to keep the one that did well in
the benchmark. See attached.

I ended up splitting the patch in two. 0001 for UNION only, then 0002
for the INTERSECT and EXCEPT.

David

It seems that the optimization for `UNION ALL` is already implemented in the patch: it removes empty sub-paths and preserves the remaining ones. 
Should we add a test case to formally validate this behavior like Union cases?


Re: Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> Good idea. Less code and still get to keep the one that did well in
> the benchmark. See attached.

0001's change in is_dummy_rel() seems like a complete hack, especially
since you didn't bother to change the adjacent comment.  Why is that
necessary?

v2 LGTM otherwise.

            regards, tom lane



Re: Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs

От
David Rowley
Дата:
On Fri, 3 Oct 2025 at 04:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 0001's change in is_dummy_rel() seems like a complete hack, especially
> since you didn't bother to change the adjacent comment.  Why is that
> necessary?

build_setop_child_paths() wraps the child inputs in SubqueryScanPaths,
so we need to see through those.

An alternative way would be to propagate those during build_setop_child_paths()

--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -523,6 +523,13 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
  bool is_sorted;
  int presorted_keys;

+     /* If the input rel is dummy, propagate that to this query level */
+     if (is_dummy_rel(final_rel))
+     {
+         mark_dummy_rel(rel);
+         continue;
+     }
+

As attached.

> v2 LGTM otherwise.

Thanks

David

Вложения

Re: Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 3 Oct 2025 at 04:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 0001's change in is_dummy_rel() seems like a complete hack, especially
>> since you didn't bother to change the adjacent comment.  Why is that
>> necessary?

> build_setop_child_paths() wraps the child inputs in SubqueryScanPaths,
> so we need to see through those.

Ah.

> An alternative way would be to propagate those during build_setop_child_paths()

That answer works for me.  I was expecting you to just document the
need for the extra check in is_dummy_rel ;-) ... but this way is
perhaps better.

            regards, tom lane



Re: Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs

От
David Rowley
Дата:
On Fri, 3 Oct 2025 at 02:55, Mingli Zhang <zmlpostgres@gmail.com> wrote:
> It seems that the optimization for `UNION ALL` is already implemented in the patch: it removes empty sub-paths and
preservesthe remaining ones.
 
> Should we add a test case to formally validate this behavior like Union cases?

If I were to do that, I'd have to come up with something that's
flatten_simple_union_all() proof. Maybe something like varying types
in the targetlist. I think it's probably not really worthwhile since
it's not testing any new code that is not already being tested by the
tests that I did add.

David