Обсуждение: Re: Self contradictory examining on rel's baserestrictinfo

Поиск
Список
Период
Сортировка

Re: Self contradictory examining on rel's baserestrictinfo

От
Robert Haas
Дата:
On Mon, Nov 25, 2024 at 3:58 AM ro b <bigbro_wq@hotmail.com> wrote:
> 1. Background
>     A few months ago, when i read source codes of B-tree in routine
>     _bt_preprocess_keys, i found that there are more contradictory
>     checking case we can add. I sent email to pgsql-hackers and
>     then community contributor replied me and told me someone had
>     already proposed this question. Thanks for taking the time
>     to address my question. After serveral conversations, i found
>     that we can do something more. We can place these jobs at planning time.

When you're referring to things that happened in the past, you should
provide links to specific messages and names of specific contributors.
It will be difficult for anyone to find the previous discussion based
on your description of "a few months ago" and a "community
contributor".

I'm a little confused because it seems like you think we don't do any
of this kind of thing already. But:

robert.haas=# explain select * from foo where a < 1 and a > 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a < 1 and a = 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a <> 1 and a = 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a in (1,2,3) and a is null;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

There are cases where we don't already draw the necessary conclusions,
such as a>1 and a>2, which could be simplified to a>2. But those cases
aren't necessarily that common.

>     7) Scalar array comparison expression
>     First we need to deconstruct the const array, figure out the null and non-null
>     elements.
>     If ALL flag is set and the Const contain NULL. we will get nothing (eg. x <=
>     ALL(array[56, null])), it's contradictory.

True, but that already seems to be working:

robert.haas=# explain select * from foo where a <= all(array[56, null]);
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

I'm not saying there is no room for improvement here, but I think you
will need to (1) be more clear about exactly which cases we are
already handling vs. which ones you want to handle, (2) possibly split
the patch into smaller patches each of which handles one specific case
instead of bundling many improvements together, and (3) improve the
comments and commit message in the patch(es).

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Self contradictory examining on rel's baserestrictinfo

От
Peter Geoghegan
Дата:
On Mon, Nov 25, 2024 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
> There are cases where we don't already draw the necessary conclusions,
> such as a>1 and a>2, which could be simplified to a>2. But those cases
> aren't necessarily that common.

Actually, we do use the more restrictive operator with cases like "a>1
and a>2" -- but only in contexts that happen to involve a B-Tree index
scan, where _bt_preprocess_keys gets called. So it's a bit hit or
miss.

Tom has in the past expressed an interesting in moving the stuff in
_bt_preprocess_keys into the planner:

https://www.postgresql.org/message-id/2587523.1647982549@sss.pgh.pa.us

Note that almost nothing in _bt_preprocess_keys is particularly
related to nbtree itself, except perhaps for the stuff that marks scan
keys required. The fact that "WHERE a IN (1, 2, 3) AND a < 2" can
actually be simplified to "WHERE a = 1" has exactly nothing to do with
B-Tree index scans, but we'll only do this simplification in the
context of a B-Tree index scan. There is also some redundancy between
_bt_preprocess_keys and the planner, which is less than ideal.

--
Peter Geoghegan



Re: Self contradictory examining on rel's baserestrictinfo

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> On Mon, Nov 25, 2024 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> There are cases where we don't already draw the necessary conclusions,
>> such as a>1 and a>2, which could be simplified to a>2. But those cases
>> aren't necessarily that common.

> Actually, we do use the more restrictive operator with cases like "a>1
> and a>2" -- but only in contexts that happen to involve a B-Tree index
> scan, where _bt_preprocess_keys gets called. So it's a bit hit or
> miss.

Worth noting also is that _bt_preprocess_keys can detect cases that
the planner cannot because the relevant values are not constants.

I'm a little skeptical that we should expend a lot more effort on
the sorts of cases discussed here.  Basically, this sort of patch
improves matters for people who write crummy queries while penalizing
everybody else.  We need to be very careful about adding cycles to
planner runtime in pursuit of optimizations that are only rarely
successful.

I'm not trying to say that we should reject all of this out-of-hand,
but I want to see some analysis of how much each check costs and
how likely it is to win in real-world queries.

BTW, it's also worth pointing out the constraint_exclusion GUC.
If that's turned up to "on" from its default setting of "partition",
it authorizes the planner to spend more effort looking for
contradictory quals.  It might be reasonable to gate some of these
checks with that.  (I also can't help wondering if any of them are
already implemented but are hidden behind that.  I think Robert
must have had constraint_exclusion = on for his examples a
couple messages back.)

            regards, tom lane



Re: Self contradictory examining on rel's baserestrictinfo

От
Peter Geoghegan
Дата:
On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm a little skeptical that we should expend a lot more effort on
> the sorts of cases discussed here.  Basically, this sort of patch
> improves matters for people who write crummy queries while penalizing
> everybody else.

I think that it's more complicated than that. Rather than explain what
I mean in general terms, I'll give you a specific example of the kind
of thing that seems quite interesting to me:

It would be fairly easy to teach nbtree about a new kind of
ScalarArrayOp that worked just like a conventional SAOP, but also
returned tuples matching "IS NULL" (IS NULL uses the equals strategy
internally already, so it'd be almost the same as treating NULL as
just another array element). This could have significant advantages
over what's even possible right now, particularly in cases involving
ORDER BY ... LIMIT.

I suppose that we'd have to invent some kind of new syntax for this.
But wouldn't it also make sense if it worked with "WHERE a IN (1, 2)
OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"? Theoretically
this would still be a case that amounted to improving matters for
badly written queries...but not really (we can hardly expect many
users to adopt our esoteric non-standard syntax).

In fact, you could make a similar argument for rewriting IN() into a
"= ANY()" SOAP (in the way that we always have).

> We need to be very careful about adding cycles to
> planner runtime in pursuit of optimizations that are only rarely
> successful.

I agree.

--
Peter Geoghegan



Re: Self contradictory examining on rel's baserestrictinfo

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> It would be fairly easy to teach nbtree about a new kind of
> ScalarArrayOp that worked just like a conventional SAOP, but also
> returned tuples matching "IS NULL" (IS NULL uses the equals strategy
> internally already, so it'd be almost the same as treating NULL as
> just another array element). This could have significant advantages
> over what's even possible right now, particularly in cases involving
> ORDER BY ... LIMIT.

> I suppose that we'd have to invent some kind of new syntax for this.
> But wouldn't it also make sense if it worked with "WHERE a IN (1, 2)
> OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"?

I'd be a strong -1 for inventing new syntax for that.  However, if
that's actually a common query pattern (I'm not convinced of it)
we could certainly build something to recognize that usage and
transform it into a suitable executable structure.

However, I don't see the connection between that and the current
thread.  That pattern is not self-contradictory.  My doubts
about its usefulness have more to do with it being evidence of
the SQL anti-pattern of treating NULL as a normal data value.
But once you've made that choice in your data representation,
you're going to end up with queries like this, and there's
nothing you could do to write them "better".

            regards, tom lane



Re: Self contradictory examining on rel's baserestrictinfo

От
Peter Geoghegan
Дата:
On Mon, Nov 25, 2024 at 6:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > I suppose that we'd have to invent some kind of new syntax for this.
> > But wouldn't it also make sense if it worked with "WHERE a IN (1, 2)
> > OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"?
>
> I'd be a strong -1 for inventing new syntax for that.  However, if
> that's actually a common query pattern (I'm not convinced of it)
> we could certainly build something to recognize that usage and
> transform it into a suitable executable structure.

My basic point is this: SQL is supposed to be declarative -- in theory
it isn't supposed to matter how you write the query.

I don't see any hard distinction between the sorts of transformations
that the OP is talking about, and what you're describing here. There's
quite a lot of gray area, at least.

> However, I don't see the connection between that and the current
> thread.  That pattern is not self-contradictory.  My doubts
> about its usefulness have more to do with it being evidence of
> the SQL anti-pattern of treating NULL as a normal data value.

It's just an example, chosen to be easy to explain. I guess you didn't
like that example, so I'll go with another:

What about the "general OR optimization" stuff described by the MDAM
paper? The redundant and contradictory qual stuff is needed there, to
make sure that the final index scan access path (consisting of a
series of disjunctive index scans in key space order) will reliably
avoid returning duplicate rows.

(FWIW I think that the OP took some cues from the MDAM paper, since
they talked about doing things like transforming SQL Standard row
value constructor expressions into disjuncts.)

I don't have a concrete proposal here, or anything like it. I just
want to express that I believe that there's a lack of joined-up
thinking about nbtree preprocessing and the optimizer (not for the
first time). We *are* missing a few interesting tricks here.

--
Peter Geoghegan



Re: Self contradictory examining on rel's baserestrictinfo

От
Robert Haas
Дата:
On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> BTW, it's also worth pointing out the constraint_exclusion GUC.
> If that's turned up to "on" from its default setting of "partition",
> it authorizes the planner to spend more effort looking for
> contradictory quals.  It might be reasonable to gate some of these
> checks with that.  (I also can't help wondering if any of them are
> already implemented but are hidden behind that.  I think Robert
> must have had constraint_exclusion = on for his examples a
> couple messages back.)

I definitely didn't change the value of that setting. It's possible I
was running on an older branch, but I don't think it was anything
ancient.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Self contradictory examining on rel's baserestrictinfo

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ...  I think Robert
>> must have had constraint_exclusion = on for his examples a
>> couple messages back.)

> I definitely didn't change the value of that setting. It's possible I
> was running on an older branch, but I don't think it was anything
> ancient.

Hmm, constraint_exclusion has defaulted to "partition" for many
years now.  But when I tried to reproduce your examples at [1]:

regression=# create table foo (a int);
CREATE TABLE
regression=# explain select * from foo where a < 1 and a > 1;
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on foo  (cost=0.00..48.25 rows=13 width=4)
   Filter: ((a < 1) AND (a > 1))
(2 rows)

regression=# show constraint_exclusion;
 constraint_exclusion
----------------------
 partition
(1 row)

regression=# set constraint_exclusion = on;
SET
regression=# explain select * from foo where a < 1 and a > 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

and similarly for some of the other examples (I didn't try
every one).  Maybe your test table was partitioned??

            regards, tom lane

[1] https://www.postgresql.org/message-id/CA%2BTgmoZqiCwHbZczXXLjucfuHi%3D7EahSyzEj5yrqYKMQ0QOL9Q%40mail.gmail.com



Re: Self contradictory examining on rel's baserestrictinfo

От
Robert Haas
Дата:
On Tue, Nov 26, 2024 at 12:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maybe your test table was partitioned??

Ah, yes, it was.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Self contradictory examining on rel's baserestrictinfo

От
ro b
Дата:
Thank you for your advice and guidance.
I didn't know the constraint_exclusion switch existed.
As your advice i need to make what i am doing to be clear.

> There are cases where we don't already draw the necessary conclusions,
> such as a>1 and a>2, which could be simplified to a>2. But those cases
> aren't necessarily that common.

The path i committed not just test contradictory but also do the 
simplification.  The simplification is limited in the BTREE.
Could you interpret the case in a little more detail.

Best regards


From: Robert Haas <robertmhaas@gmail.com>
Sent: Tuesday, November 26, 2024 04:55
To: ro b <bigbro_wq@hotmail.com>
Cc: pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
 
On Mon, Nov 25, 2024 at 3:58 AM ro b <bigbro_wq@hotmail.com> wrote:
> 1. Background
>     A few months ago, when i read source codes of B-tree in routine
>     _bt_preprocess_keys, i found that there are more contradictory
>     checking case we can add. I sent email to pgsql-hackers and
>     then community contributor replied me and told me someone had
>     already proposed this question. Thanks for taking the time
>     to address my question. After serveral conversations, i found
>     that we can do something more. We can place these jobs at planning time.

When you're referring to things that happened in the past, you should
provide links to specific messages and names of specific contributors.
It will be difficult for anyone to find the previous discussion based
on your description of "a few months ago" and a "community
contributor".

I'm a little confused because it seems like you think we don't do any
of this kind of thing already. But:

robert.haas=# explain select * from foo where a < 1 and a > 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a < 1 and a = 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a <> 1 and a = 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a in (1,2,3) and a is null;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

There are cases where we don't already draw the necessary conclusions,
such as a>1 and a>2, which could be simplified to a>2. But those cases
aren't necessarily that common.

>     7) Scalar array comparison expression
>     First we need to deconstruct the const array, figure out the null and non-null
>     elements.
>     If ALL flag is set and the Const contain NULL. we will get nothing (eg. x <=
>     ALL(array[56, null])), it's contradictory.

True, but that already seems to be working:

robert.haas=# explain select * from foo where a <= all(array[56, null]);
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

I'm not saying there is no room for improvement here, but I think you
will need to (1) be more clear about exactly which cases we are
already handling vs. which ones you want to handle, (2) possibly split
the patch into smaller patches each of which handles one specific case
instead of bundling many improvements together, and (3) improve the
comments and commit message in the patch(es).

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Self contradictory examining on rel's baserestrictinfo

От
Robert Haas
Дата:
On Wed, Nov 27, 2024 at 9:30 AM ro b <bigbro_wq@hotmail.com> wrote:
> The path i committed not just test contradictory but also do the
> simplification.  The simplification is limited in the BTREE.
> Could you interpret the case in a little more detail.

I am not able to understand this paragraph.

Regretfully,

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Self contradictory examining on rel's baserestrictinfo

От
ro b
Дата:
I mean why can't draw the conclusions 
like the case a>1 and a>2 which can be simplified 
to a>2 if the operator > is btree opclass. 


Best regards


From: Robert Haas <robertmhaas@gmail.com>
Sent: Wednesday, November 27, 2024 22:52
To: ro b <bigbro_wq@hotmail.com>
Cc: Peter Geoghegan <pg@bowt.ie>; pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
 
On Wed, Nov 27, 2024 at 9:30 AM ro b <bigbro_wq@hotmail.com> wrote:
> The path i committed not just test contradictory but also do the
> simplification.  The simplification is limited in the BTREE.
> Could you interpret the case in a little more detail.

I am not able to understand this paragraph.

Regretfully,

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Self contradictory examining on rel's baserestrictinfo

От
Robert Haas
Дата:
On Wed, Nov 27, 2024 at 10:28 AM ro b <bigbro_wq@hotmail.com> wrote:
> I mean why can't draw the conclusions
> like the case a>1 and a>2 which can be simplified
> to a>2 if the operator > is btree opclass.

This question has already been discussed in some detail on this email
thread. First, we sometimes do detect it, as discussed by Peter.
Second, it is an unusual case and so not worth spending a lot of CPU
cycles to detect, as discussed by Tom.

We might still accept a patch to improve things in this area. For
example, if somebody could find a way to change the code so that we
are able to detect more cases without needing to spend more CPU
cycles, I think everyone would like that. However, that may be a
tricky goal to accomplish.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Self contradictory examining on rel's baserestrictinfo

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Nov 27, 2024 at 10:28 AM ro b <bigbro_wq@hotmail.com> wrote:
>> I mean why can't draw the conclusions
>> like the case a>1 and a>2 which can be simplified
>> to a>2 if the operator > is btree opclass.

> This question has already been discussed in some detail on this email
> thread. First, we sometimes do detect it, as discussed by Peter.
> Second, it is an unusual case and so not worth spending a lot of CPU
> cycles to detect, as discussed by Tom.

> We might still accept a patch to improve things in this area. For
> example, if somebody could find a way to change the code so that we
> are able to detect more cases without needing to spend more CPU
> cycles, I think everyone would like that. However, that may be a
> tricky goal to accomplish.

This discussion trailed off without any final resolution about what
to do with the patch submission.  I think we owe the submitter at
least a minimal actual review, so ...

I took a quick look at the "self_contradictory.sql" regression test
included in the patch, with an eye to seeing how much of it we pass
already when constraint_exclusion is set to "on".  The answer is
"a fair amount", in particular we successfully detect many of the
cases where the clauses can be proved contradictory.  The ones
where we fail to do so are mostly cases like
 
 explain (costs off)
 SELECT * FROM self_a WHERE z IS UNKNOWN AND z IS NOT UNKNOWN;
-        QUERY PLAN        
---------------------------
- Result
-   One-Time Filter: false
+                    QUERY PLAN                     
+---------------------------------------------------
+ Seq Scan on self_a
+   Filter: ((z IS UNKNOWN) AND (z IS NOT UNKNOWN))
 (2 rows)

This is not a fundamental shortcoming in the system, it's just
that optimizer/util/predtest.c has not been built out very far
with respect to BooleanTest nodes.  We could certainly do that,
and it wouldn't add much runtime for queries that contain no
BooleanTest conditions, but I have to question whether it's
worth the trouble.  Is a query like the above really common?

There are also a bunch of cases that are not contradictory
but that the patch manages to simplify, for example
 
 explain (costs off)
 SELECT * FROM self_a WHERE a <= 10 AND a = 10;
-     QUERY PLAN     
---------------------
+             QUERY PLAN             
+------------------------------------
  Seq Scan on self_a
-   Filter: (a = 10)
+   Filter: ((a <= 10) AND (a = 10))
 (2 rows)

The constraint_exclusion framework can't make this sort of
transformation because it is only focused on proving a relation's
quals to be contradictory or not.  So this part is new work ...
but again I have to wonder if this query pattern is common enough
to be worth spending planner cycles to check for.

As Peter mentioned, we do actually have logic that performs the
equivalent transformation within btree indexqual setup.  However, the
context is pretty different: first, the work of identifying clauses
that bear on the same variable has already been done, and second we
really have to do at least some of this work on the way to identifying
the appropriate initial search condition for btree descent.  It's not
apparent to me that it's worth having a whole new planner pass that
is searching for clauses involving the same variable.  That is not
especially cheap if there are a lot of clauses.  If we were to pursue
this, I'd want to think about somehow integrating it with the work
that indxpath.c already does to collect index-relevant conditions.

So in short, my main review comment is that you should be looking
to integrate with existing planner functionality such as predtest.c
and indxpath.c, rather than writing a few thousand lines of
not-connected-to-anything-else code.  It's too hard to make the
case that these optimizations are worth the cost if they require
an entire separate analysis pass.

Some other comments:

* You need to work harder on the comments in whatever you write.
contradictory.c is drastically undercommented, and what there is
is frequently unhelpful, like the obviously copied-and-pasted
file header.  You also did things like adding entirely-undocumented
fields to common structures such as RestrictInfo.  This certainly
wouldn't be committable as-is; but it also makes the patch painful
to review in any detail, since the reviewer must reverse-engineer
a lot of information that should have been provided by comments.

* The side-effects on existing regression test cases are pretty
concerning: it's not clear that those tests still exercise what
they were meant to.  (You can be sure that a test case intentionally
written with redundant conditions was not written that way just to
provide a case you could optimize later.)  We'd have to analyze
all those test cases in detail and figure out whether the
simplification is OK or we need to provide some way to defeat it.


I'm going to mark this submission Returned With Feedback, because
I don't foresee anything very close to this getting committed.
But I'd encourage you to pursue the ideas in other forms as
suggested by this discussion.

            regards, tom lane



Re: Self contradictory examining on rel's baserestrictinfo

От
Tom Lane
Дата:
I wrote:
> This is not a fundamental shortcoming in the system, it's just
> that optimizer/util/predtest.c has not been built out very far
> with respect to BooleanTest nodes.

Oh ... I'd momentarily forgotten that there's already a patch
in the queue to attack that:

https://commitfest.postgresql.org/51/4690/

That's been stalled for lack of review, so if you were interested
in helping out, that'd be great.

            regards, tom lane