Обсуждение: Check each of base restriction clauses for constant-FALSE-or-NULL

Поиск
Список
Период
Сортировка

Check each of base restriction clauses for constant-FALSE-or-NULL

От
Richard Guo
Дата:
In relation_excluded_by_constraints() when we're trying to figure out
whether the relation need not be scanned, one of the checks we do is to
detect constant-FALSE-or-NULL restriction clauses.  Currently we perform
this check only when there is exactly one baserestrictinfo entry, and
the comment explains this as below.

 * Regardless of the setting of constraint_exclusion, detect
 * constant-FALSE-or-NULL restriction clauses.  Because const-folding will
 * reduce "anything AND FALSE" to just "FALSE", any such case should
 * result in exactly one baserestrictinfo entry.

This doesn't seem entirely correct, because equivclass.c may generate
constant-FALSE baserestrictinfo entry on the fly.  In addition, other
quals could get pushed down to the baserel.  All these cases would
result in that the baserestrictinfo list might possibly have other
members besides the FALSE constant.

So I'm wondering if we should check each of base restriction clauses for
constant-FALSE-or-NULL quals, like attached.

Here are some examples.

-- #1 constant-FALSE generated by ECs

-- unpatched (in all branches)
explain (costs off) select * from t t1 where a = 1 and a = 2;
        QUERY PLAN
--------------------------
 Result
   One-Time Filter: false
   ->  Seq Scan on t t1
         Filter: (a = 1)
(4 rows)

-- patched
explain (costs off) select * from t t1 where a = 1 and a = 2;
        QUERY PLAN
--------------------------
 Result
   One-Time Filter: false
(2 rows)


-- #2 other quals get pushed down to the baserel

-- unpatched (in 15 and earlier)
explain (costs off)
select * from t t1 left join (select * from t t2 where false) s on s.a = 1;
              QUERY PLAN
--------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on t t1
   ->  Materialize
         ->  Result
               One-Time Filter: false
               ->  Seq Scan on t t2
                     Filter: (a = 1)
(7 rows)

-- patched
explain (costs off)
select * from t t1 left join (select * from t t2 where false) s on s.a = 1;
           QUERY PLAN
--------------------------------
 Nested Loop Left Join
   ->  Seq Scan on t t1
   ->  Result
         One-Time Filter: false
(4 rows)

I'm a little concerned that it will bring some overhead to loop through
the baserestrictinfo list.  But considering that other codes in the same
function also loops through the list, maybe I'm worrying over nothing.

Any thoughts?

Thanks
Richard
Вложения

Re: Check each of base restriction clauses for constant-FALSE-or-NULL

От
Ashutosh Bapat
Дата:
On Sat, Oct 7, 2023 at 3:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
>
> In relation_excluded_by_constraints() when we're trying to figure out
> whether the relation need not be scanned, one of the checks we do is to
> detect constant-FALSE-or-NULL restriction clauses.  Currently we perform
> this check only when there is exactly one baserestrictinfo entry, and
> the comment explains this as below.
>
>  * Regardless of the setting of constraint_exclusion, detect
>  * constant-FALSE-or-NULL restriction clauses.  Because const-folding will
>  * reduce "anything AND FALSE" to just "FALSE", any such case should
>  * result in exactly one baserestrictinfo entry.
>
> This doesn't seem entirely correct, because equivclass.c may generate
> constant-FALSE baserestrictinfo entry on the fly.  In addition, other
> quals could get pushed down to the baserel.  All these cases would
> result in that the baserestrictinfo list might possibly have other
> members besides the FALSE constant.
>
> So I'm wondering if we should check each of base restriction clauses for
> constant-FALSE-or-NULL quals, like attached.
>
> Here are some examples.
>
> -- #1 constant-FALSE generated by ECs
>
> -- unpatched (in all branches)
>
>         QUERY PLAN
> --------------------------
>  Result
>    One-Time Filter: false
>    ->  Seq Scan on t t1
>          Filter: (a = 1)
> (4 rows)
>

I used a slightly modified query as below

# explain (costs off) select * from pg_class t1 where oid = 1 and oid = 2;
                        QUERY PLAN
----------------------------------------------------------
 Result
   One-Time Filter: false
   ->  Index Scan using pg_class_oid_index on pg_class t1
         Index Cond: (oid = '1'::oid)
(4 rows)

postgres@312571=# explain (analyze, costs off) select * from pg_class
t1 where oid = 1 and oid = 2;
                                QUERY PLAN
---------------------------------------------------------------------------
 Result (actual time=0.002..0.003 rows=0 loops=1)
   One-Time Filter: false
   ->  Index Scan using pg_class_oid_index on pg_class t1 (never executed)
         Index Cond: (oid = '1'::oid)
 Planning Time: 0.176 ms
 Execution Time: 0.052 ms
(6 rows)

You will see that the scan node was never executed. Hence there's no
execution time benefit if we remove the scan plan.

Where do we produce the single baserestrictinfo mentioned in the
comments? Is it before the planning proper starts?

get_gating_quals does what you are doing much earlier in the query
processing. Your code would just duplicate that.

>
> -- patched
> explain (costs off)
> select * from t t1 left join (select * from t t2 where false) s on s.a = 1;
>            QUERY PLAN
> --------------------------------
>  Nested Loop Left Join
>    ->  Seq Scan on t t1
>    ->  Result
>          One-Time Filter: false
> (4 rows)

Does your code have any other benefits like deeming an inner join as empty?

--
Best Wishes,
Ashutosh Bapat



Re: Check each of base restriction clauses for constant-FALSE-or-NULL

От
Richard Guo
Дата:

On Mon, Oct 9, 2023 at 5:48 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
postgres@312571=# explain (analyze, costs off) select * from pg_class
t1 where oid = 1 and oid = 2;
                                QUERY PLAN
---------------------------------------------------------------------------
 Result (actual time=0.002..0.003 rows=0 loops=1)
   One-Time Filter: false
   ->  Index Scan using pg_class_oid_index on pg_class t1 (never executed)
         Index Cond: (oid = '1'::oid)
 Planning Time: 0.176 ms
 Execution Time: 0.052 ms
(6 rows)

You will see that the scan node was never executed. Hence there's no
execution time benefit if we remove the scan plan.

Yeah, the constant-FALSE is a pseudoconstant qual and will result in a
gating Result node atop the scan node.  So this optimization about
detecting constant-FALSE restriction clauses and marking the rel as
dummy if there is one is not likely to benefit execution time.  AFAICS
it can help save some planning efforts, because once a base rel is
marked dummy, we won't bother building access paths for it.  Also a
dummy input rel can save efforts when we generate paths for joinrel, see
how we cope with dummy rels in populate_joinrel_with_paths().
 
Where do we produce the single baserestrictinfo mentioned in the
comments? Is it before the planning proper starts?

Do you mean the const-folding?  It happens in the preprocessing phase,
specifically in eval_const_expressions().
 
get_gating_quals does what you are doing much earlier in the query
processing. Your code would just duplicate that.

Hm, I don't think so.  get_gating_quals is called in createplan.c, where
we've selected the best path, while the optimization with my code
happens much earlier, when we set size estimates for a base relation.
Neither of these two is a duplicate of the other.  I think the theory
here is that it's always a win to mark a rel as dummy if possible as
early as we can.

Thanks
Richard

Re: Check each of base restriction clauses for constant-FALSE-or-NULL

От
Ashutosh Bapat
Дата:
On Tue, Oct 10, 2023 at 11:09 AM Richard Guo <guofenglinux@gmail.com> wrote:
> Do you mean the const-folding?  It happens in the preprocessing phase,
> specifically in eval_const_expressions().

Thanks.

> Hm, I don't think so.  get_gating_quals is called in createplan.c, where
> we've selected the best path, while the optimization with my code
> happens much earlier, when we set size estimates for a base relation.
> Neither of these two is a duplicate of the other.  I think the theory
> here is that it's always a win to mark a rel as dummy if possible as
> early as we can.

Right. Do you have an example where this could be demonstrated?

--
Best Wishes,
Ashutosh Bapat



Re: Check each of base restriction clauses for constant-FALSE-or-NULL

От
Richard Guo
Дата:

On Tue, Oct 10, 2023 at 1:45 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Tue, Oct 10, 2023 at 11:09 AM Richard Guo <guofenglinux@gmail.com> wrote:
> Hm, I don't think so.  get_gating_quals is called in createplan.c, where
> we've selected the best path, while the optimization with my code
> happens much earlier, when we set size estimates for a base relation.
> Neither of these two is a duplicate of the other.  I think the theory
> here is that it's always a win to mark a rel as dummy if possible as
> early as we can.

Right. Do you have an example where this could be demonstrated?

Hmm, do you think the two examples in the initial email of this thread
can serve the purpose, by observing how we avoid building access paths
for the dummy rel with this optimization?

Thanks
Richard

Re: Check each of base restriction clauses for constant-FALSE-or-NULL

От
David Rowley
Дата:
On Sat, 7 Oct 2023 at 22:44, Richard Guo <guofenglinux@gmail.com> wrote:
>
> In relation_excluded_by_constraints() when we're trying to figure out
> whether the relation need not be scanned, one of the checks we do is to
> detect constant-FALSE-or-NULL restriction clauses.  Currently we perform
> this check only when there is exactly one baserestrictinfo entry, and
> the comment explains this as below.
>
>  * Regardless of the setting of constraint_exclusion, detect
>  * constant-FALSE-or-NULL restriction clauses.  Because const-folding will
>  * reduce "anything AND FALSE" to just "FALSE", any such case should
>  * result in exactly one baserestrictinfo entry.

Coincidentally (?), I saw the same thing just a few weeks ago while
working on [1].  I made the exact same adjustment to the code in
relation_excluded_by_constraints() as you have.

I wasn't really expecting the baserestrictinfo list to be excessively
long, and if it ever was, I think looking at things like selectivity
estimations would by far drown out looping over the entire list in
relation_excluded_by_constraints() rather than just looking at the
first item in the list.

After making the change, I saw the same regression test change as you
did, but didn't really feel like it was worth tackling separately from
the patch that we were working on.

David

[1] https://postgr.es/m/CAApHDvpkfS1hY3P4DWbOw6WCgRrja=yDLoEz+5g+E2z19Upsrg@mail.gmail.com



Re: Check each of base restriction clauses for constant-FALSE-or-NULL

От
Richard Guo
Дата:

On Tue, Oct 10, 2023 at 5:10 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Sat, 7 Oct 2023 at 22:44, Richard Guo <guofenglinux@gmail.com> wrote:
>
> In relation_excluded_by_constraints() when we're trying to figure out
> whether the relation need not be scanned, one of the checks we do is to
> detect constant-FALSE-or-NULL restriction clauses.  Currently we perform
> this check only when there is exactly one baserestrictinfo entry, and
> the comment explains this as below.
>
>  * Regardless of the setting of constraint_exclusion, detect
>  * constant-FALSE-or-NULL restriction clauses.  Because const-folding will
>  * reduce "anything AND FALSE" to just "FALSE", any such case should
>  * result in exactly one baserestrictinfo entry.

Coincidentally (?), I saw the same thing just a few weeks ago while
working on [1].  I made the exact same adjustment to the code in
relation_excluded_by_constraints() as you have.

Haha, I noticed the need of this change while writing v5 patch [1] for
that same thread.  That patch generates a new constant-FALSE
RestrictInfo for an IS NULL qual that can be reduced to FALSE, and this
makes the comment in relation_excluded_by_constraints() about 'any such
case should result in exactly one baserestrictinfo entry' not true any
more.  Without this change in relation_excluded_by_constraints(), a
query like below would not be able to be marked as dummy.

    select * from t where a is null and 'otherquals';

And then the regression test diff after applying this change reminds me
that equivclass.c may also generate new constant-FALSE RestrictInfos on
the fly, so it seems to me that this change may benefit some queries
even without the 'reduce-NullTest' patch.
 
I wasn't really expecting the baserestrictinfo list to be excessively
long, and if it ever was, I think looking at things like selectivity
estimations would by far drown out looping over the entire list in
relation_excluded_by_constraints() rather than just looking at the
first item in the list.

Agreed.
 
After making the change, I saw the same regression test change as you
did, but didn't really feel like it was worth tackling separately from
the patch that we were working on.

I was thinking that this change may be worthwhile by itself even without
the 'reduce-NullTest' patch, because it can benefit some cases, such as
where EC generates constant-FALSE on the fly.  So maybe it's worth a
separate patch?  I'm not quite sure.

[1] https://www.postgresql.org/message-id/CAMbWs4-eNVNTNc94eF%2BO_UwHYKv43vyMurhcdqMV%3DHt5fehcOg%40mail.gmail.com

Thanks
Richard

Re: Check each of base restriction clauses for constant-FALSE-or-NULL

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> On Tue, Oct 10, 2023 at 5:10 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> After making the change, I saw the same regression test change as you
>> did, but didn't really feel like it was worth tackling separately from
>> the patch that we were working on.

> I was thinking that this change may be worthwhile by itself even without
> the 'reduce-NullTest' patch, because it can benefit some cases, such as
> where EC generates constant-FALSE on the fly.  So maybe it's worth a
> separate patch?  I'm not quite sure.

I think it's worth pushing separately, since it has a positive impact
on existing cases, as shown by the regression test plan change.
Also, if you compare that test case to the one immediately following
it, it's downright weird that we are presently smarter about
optimizing the more complicated case.  (I've not dug into exactly
why that is; maybe worth running it to ground?)

            regards, tom lane



Re: Check each of base restriction clauses for constant-FALSE-or-NULL

От
Tom Lane
Дата:
I wrote:
> Also, if you compare that test case to the one immediately following
> it, it's downright weird that we are presently smarter about
> optimizing the more complicated case.  (I've not dug into exactly
> why that is; maybe worth running it to ground?)

The reason seems to be that joinrels.c's restriction_is_constant_false
knows that it has to check all members of the restrictinfo list, not
just one; and we get to that because some of the originally generated
EC clauses are join clauses in the second case.

So this logic in relation_excluded_by_constraints is just wrong ---
premature optimization on my part, looks like.

            regards, tom lane



Re: Check each of base restriction clauses for constant-FALSE-or-NULL

От
Tom Lane
Дата:
I wrote:
> So this logic in relation_excluded_by_constraints is just wrong ---
> premature optimization on my part, looks like.

Pushed after a bit of fiddling with the comment.

            regards, tom lane



Re: Check each of base restriction clauses for constant-FALSE-or-NULL

От
Richard Guo
Дата:

On Thu, Oct 12, 2023 at 1:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pushed after a bit of fiddling with the comment.

Thanks for pushing!

Thanks
Richard