Обсуждение: Re: Self contradictory examining on rel's baserestrictinfo
On Mon, Nov 25, 2024 at 3:58 AM ro b <bigbro_wq@hotmail.com> wrote: > 1. Background > A few months ago, when i read source codes of B-tree in routine > _bt_preprocess_keys, i found that there are more contradictory > checking case we can add. I sent email to pgsql-hackers and > then community contributor replied me and told me someone had > already proposed this question. Thanks for taking the time > to address my question. After serveral conversations, i found > that we can do something more. We can place these jobs at planning time. When you're referring to things that happened in the past, you should provide links to specific messages and names of specific contributors. It will be difficult for anyone to find the previous discussion based on your description of "a few months ago" and a "community contributor". I'm a little confused because it seems like you think we don't do any of this kind of thing already. But: robert.haas=# explain select * from foo where a < 1 and a > 1; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) robert.haas=# explain select * from foo where a < 1 and a = 1; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) robert.haas=# explain select * from foo where a <> 1 and a = 1; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) robert.haas=# explain select * from foo where a in (1,2,3) and a is null; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) There are cases where we don't already draw the necessary conclusions, such as a>1 and a>2, which could be simplified to a>2. But those cases aren't necessarily that common. > 7) Scalar array comparison expression > First we need to deconstruct the const array, figure out the null and non-null > elements. > If ALL flag is set and the Const contain NULL. we will get nothing (eg. x <= > ALL(array[56, null])), it's contradictory. True, but that already seems to be working: robert.haas=# explain select * from foo where a <= all(array[56, null]); QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) I'm not saying there is no room for improvement here, but I think you will need to (1) be more clear about exactly which cases we are already handling vs. which ones you want to handle, (2) possibly split the patch into smaller patches each of which handles one specific case instead of bundling many improvements together, and (3) improve the comments and commit message in the patch(es). -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Nov 25, 2024 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote: > There are cases where we don't already draw the necessary conclusions, > such as a>1 and a>2, which could be simplified to a>2. But those cases > aren't necessarily that common. Actually, we do use the more restrictive operator with cases like "a>1 and a>2" -- but only in contexts that happen to involve a B-Tree index scan, where _bt_preprocess_keys gets called. So it's a bit hit or miss. Tom has in the past expressed an interesting in moving the stuff in _bt_preprocess_keys into the planner: https://www.postgresql.org/message-id/2587523.1647982549@sss.pgh.pa.us Note that almost nothing in _bt_preprocess_keys is particularly related to nbtree itself, except perhaps for the stuff that marks scan keys required. The fact that "WHERE a IN (1, 2, 3) AND a < 2" can actually be simplified to "WHERE a = 1" has exactly nothing to do with B-Tree index scans, but we'll only do this simplification in the context of a B-Tree index scan. There is also some redundancy between _bt_preprocess_keys and the planner, which is less than ideal. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Mon, Nov 25, 2024 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote: >> There are cases where we don't already draw the necessary conclusions, >> such as a>1 and a>2, which could be simplified to a>2. But those cases >> aren't necessarily that common. > Actually, we do use the more restrictive operator with cases like "a>1 > and a>2" -- but only in contexts that happen to involve a B-Tree index > scan, where _bt_preprocess_keys gets called. So it's a bit hit or > miss. Worth noting also is that _bt_preprocess_keys can detect cases that the planner cannot because the relevant values are not constants. I'm a little skeptical that we should expend a lot more effort on the sorts of cases discussed here. Basically, this sort of patch improves matters for people who write crummy queries while penalizing everybody else. We need to be very careful about adding cycles to planner runtime in pursuit of optimizations that are only rarely successful. I'm not trying to say that we should reject all of this out-of-hand, but I want to see some analysis of how much each check costs and how likely it is to win in real-world queries. BTW, it's also worth pointing out the constraint_exclusion GUC. If that's turned up to "on" from its default setting of "partition", it authorizes the planner to spend more effort looking for contradictory quals. It might be reasonable to gate some of these checks with that. (I also can't help wondering if any of them are already implemented but are hidden behind that. I think Robert must have had constraint_exclusion = on for his examples a couple messages back.) regards, tom lane
On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm a little skeptical that we should expend a lot more effort on > the sorts of cases discussed here. Basically, this sort of patch > improves matters for people who write crummy queries while penalizing > everybody else. I think that it's more complicated than that. Rather than explain what I mean in general terms, I'll give you a specific example of the kind of thing that seems quite interesting to me: It would be fairly easy to teach nbtree about a new kind of ScalarArrayOp that worked just like a conventional SAOP, but also returned tuples matching "IS NULL" (IS NULL uses the equals strategy internally already, so it'd be almost the same as treating NULL as just another array element). This could have significant advantages over what's even possible right now, particularly in cases involving ORDER BY ... LIMIT. I suppose that we'd have to invent some kind of new syntax for this. But wouldn't it also make sense if it worked with "WHERE a IN (1, 2) OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"? Theoretically this would still be a case that amounted to improving matters for badly written queries...but not really (we can hardly expect many users to adopt our esoteric non-standard syntax). In fact, you could make a similar argument for rewriting IN() into a "= ANY()" SOAP (in the way that we always have). > We need to be very careful about adding cycles to > planner runtime in pursuit of optimizations that are only rarely > successful. I agree. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > It would be fairly easy to teach nbtree about a new kind of > ScalarArrayOp that worked just like a conventional SAOP, but also > returned tuples matching "IS NULL" (IS NULL uses the equals strategy > internally already, so it'd be almost the same as treating NULL as > just another array element). This could have significant advantages > over what's even possible right now, particularly in cases involving > ORDER BY ... LIMIT. > I suppose that we'd have to invent some kind of new syntax for this. > But wouldn't it also make sense if it worked with "WHERE a IN (1, 2) > OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"? I'd be a strong -1 for inventing new syntax for that. However, if that's actually a common query pattern (I'm not convinced of it) we could certainly build something to recognize that usage and transform it into a suitable executable structure. However, I don't see the connection between that and the current thread. That pattern is not self-contradictory. My doubts about its usefulness have more to do with it being evidence of the SQL anti-pattern of treating NULL as a normal data value. But once you've made that choice in your data representation, you're going to end up with queries like this, and there's nothing you could do to write them "better". regards, tom lane
On Mon, Nov 25, 2024 at 6:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <pg@bowt.ie> writes: > > I suppose that we'd have to invent some kind of new syntax for this. > > But wouldn't it also make sense if it worked with "WHERE a IN (1, 2) > > OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"? > > I'd be a strong -1 for inventing new syntax for that. However, if > that's actually a common query pattern (I'm not convinced of it) > we could certainly build something to recognize that usage and > transform it into a suitable executable structure. My basic point is this: SQL is supposed to be declarative -- in theory it isn't supposed to matter how you write the query. I don't see any hard distinction between the sorts of transformations that the OP is talking about, and what you're describing here. There's quite a lot of gray area, at least. > However, I don't see the connection between that and the current > thread. That pattern is not self-contradictory. My doubts > about its usefulness have more to do with it being evidence of > the SQL anti-pattern of treating NULL as a normal data value. It's just an example, chosen to be easy to explain. I guess you didn't like that example, so I'll go with another: What about the "general OR optimization" stuff described by the MDAM paper? The redundant and contradictory qual stuff is needed there, to make sure that the final index scan access path (consisting of a series of disjunctive index scans in key space order) will reliably avoid returning duplicate rows. (FWIW I think that the OP took some cues from the MDAM paper, since they talked about doing things like transforming SQL Standard row value constructor expressions into disjuncts.) I don't have a concrete proposal here, or anything like it. I just want to express that I believe that there's a lack of joined-up thinking about nbtree preprocessing and the optimizer (not for the first time). We *are* missing a few interesting tricks here. -- Peter Geoghegan
On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > BTW, it's also worth pointing out the constraint_exclusion GUC. > If that's turned up to "on" from its default setting of "partition", > it authorizes the planner to spend more effort looking for > contradictory quals. It might be reasonable to gate some of these > checks with that. (I also can't help wondering if any of them are > already implemented but are hidden behind that. I think Robert > must have had constraint_exclusion = on for his examples a > couple messages back.) I definitely didn't change the value of that setting. It's possible I was running on an older branch, but I don't think it was anything ancient. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... I think Robert >> must have had constraint_exclusion = on for his examples a >> couple messages back.) > I definitely didn't change the value of that setting. It's possible I > was running on an older branch, but I don't think it was anything > ancient. Hmm, constraint_exclusion has defaulted to "partition" for many years now. But when I tried to reproduce your examples at [1]: regression=# create table foo (a int); CREATE TABLE regression=# explain select * from foo where a < 1 and a > 1; QUERY PLAN ----------------------------------------------------- Seq Scan on foo (cost=0.00..48.25 rows=13 width=4) Filter: ((a < 1) AND (a > 1)) (2 rows) regression=# show constraint_exclusion; constraint_exclusion ---------------------- partition (1 row) regression=# set constraint_exclusion = on; SET regression=# explain select * from foo where a < 1 and a > 1; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) and similarly for some of the other examples (I didn't try every one). Maybe your test table was partitioned?? regards, tom lane [1] https://www.postgresql.org/message-id/CA%2BTgmoZqiCwHbZczXXLjucfuHi%3D7EahSyzEj5yrqYKMQ0QOL9Q%40mail.gmail.com
On Tue, Nov 26, 2024 at 12:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Maybe your test table was partitioned?? Ah, yes, it was. -- Robert Haas EDB: http://www.enterprisedb.com
Thank you for your advice and guidance.
I didn't know the constraint_exclusion switch existed.
As your advice i need to make what i am doing to be clear.
> There are cases where we don't already draw the necessary conclusions,
> such as a>1 and a>2, which could be simplified to a>2. But those cases
> aren't necessarily that common.
The path i committed not just test contradictory but also do the
simplification. The simplification is limited in the BTREE.
simplification. The simplification is limited in the BTREE.
Could you interpret the case in a little more detail.
Best regards
From: Robert Haas <robertmhaas@gmail.com>
Sent: Tuesday, November 26, 2024 04:55
To: ro b <bigbro_wq@hotmail.com>
Cc: pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
Sent: Tuesday, November 26, 2024 04:55
To: ro b <bigbro_wq@hotmail.com>
Cc: pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
On Mon, Nov 25, 2024 at 3:58 AM ro b <bigbro_wq@hotmail.com> wrote:
> 1. Background
> A few months ago, when i read source codes of B-tree in routine
> _bt_preprocess_keys, i found that there are more contradictory
> checking case we can add. I sent email to pgsql-hackers and
> then community contributor replied me and told me someone had
> already proposed this question. Thanks for taking the time
> to address my question. After serveral conversations, i found
> that we can do something more. We can place these jobs at planning time.
When you're referring to things that happened in the past, you should
provide links to specific messages and names of specific contributors.
It will be difficult for anyone to find the previous discussion based
on your description of "a few months ago" and a "community
contributor".
I'm a little confused because it seems like you think we don't do any
of this kind of thing already. But:
robert.haas=# explain select * from foo where a < 1 and a > 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
robert.haas=# explain select * from foo where a < 1 and a = 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
robert.haas=# explain select * from foo where a <> 1 and a = 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
robert.haas=# explain select * from foo where a in (1,2,3) and a is null;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
There are cases where we don't already draw the necessary conclusions,
such as a>1 and a>2, which could be simplified to a>2. But those cases
aren't necessarily that common.
> 7) Scalar array comparison expression
> First we need to deconstruct the const array, figure out the null and non-null
> elements.
> If ALL flag is set and the Const contain NULL. we will get nothing (eg. x <=
> ALL(array[56, null])), it's contradictory.
True, but that already seems to be working:
robert.haas=# explain select * from foo where a <= all(array[56, null]);
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
I'm not saying there is no room for improvement here, but I think you
will need to (1) be more clear about exactly which cases we are
already handling vs. which ones you want to handle, (2) possibly split
the patch into smaller patches each of which handles one specific case
instead of bundling many improvements together, and (3) improve the
comments and commit message in the patch(es).
--
Robert Haas
EDB: http://www.enterprisedb.com
> 1. Background
> A few months ago, when i read source codes of B-tree in routine
> _bt_preprocess_keys, i found that there are more contradictory
> checking case we can add. I sent email to pgsql-hackers and
> then community contributor replied me and told me someone had
> already proposed this question. Thanks for taking the time
> to address my question. After serveral conversations, i found
> that we can do something more. We can place these jobs at planning time.
When you're referring to things that happened in the past, you should
provide links to specific messages and names of specific contributors.
It will be difficult for anyone to find the previous discussion based
on your description of "a few months ago" and a "community
contributor".
I'm a little confused because it seems like you think we don't do any
of this kind of thing already. But:
robert.haas=# explain select * from foo where a < 1 and a > 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
robert.haas=# explain select * from foo where a < 1 and a = 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
robert.haas=# explain select * from foo where a <> 1 and a = 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
robert.haas=# explain select * from foo where a in (1,2,3) and a is null;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
There are cases where we don't already draw the necessary conclusions,
such as a>1 and a>2, which could be simplified to a>2. But those cases
aren't necessarily that common.
> 7) Scalar array comparison expression
> First we need to deconstruct the const array, figure out the null and non-null
> elements.
> If ALL flag is set and the Const contain NULL. we will get nothing (eg. x <=
> ALL(array[56, null])), it's contradictory.
True, but that already seems to be working:
robert.haas=# explain select * from foo where a <= all(array[56, null]);
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
I'm not saying there is no room for improvement here, but I think you
will need to (1) be more clear about exactly which cases we are
already handling vs. which ones you want to handle, (2) possibly split
the patch into smaller patches each of which handles one specific case
instead of bundling many improvements together, and (3) improve the
comments and commit message in the patch(es).
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Nov 27, 2024 at 9:30 AM ro b <bigbro_wq@hotmail.com> wrote: > The path i committed not just test contradictory but also do the > simplification. The simplification is limited in the BTREE. > Could you interpret the case in a little more detail. I am not able to understand this paragraph. Regretfully, -- Robert Haas EDB: http://www.enterprisedb.com
I mean why can't draw the conclusions
like the case a>1 and a>2 which can be simplified
to a>2 if the operator > is btree opclass.
Best regards
From: Robert Haas <robertmhaas@gmail.com>
Sent: Wednesday, November 27, 2024 22:52
To: ro b <bigbro_wq@hotmail.com>
Cc: Peter Geoghegan <pg@bowt.ie>; pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
Sent: Wednesday, November 27, 2024 22:52
To: ro b <bigbro_wq@hotmail.com>
Cc: Peter Geoghegan <pg@bowt.ie>; pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
On Wed, Nov 27, 2024 at 9:30 AM ro b <bigbro_wq@hotmail.com> wrote:
> The path i committed not just test contradictory but also do the
> simplification. The simplification is limited in the BTREE.
> Could you interpret the case in a little more detail.
I am not able to understand this paragraph.
Regretfully,
--
Robert Haas
EDB: http://www.enterprisedb.com
> The path i committed not just test contradictory but also do the
> simplification. The simplification is limited in the BTREE.
> Could you interpret the case in a little more detail.
I am not able to understand this paragraph.
Regretfully,
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Nov 27, 2024 at 10:28 AM ro b <bigbro_wq@hotmail.com> wrote: > I mean why can't draw the conclusions > like the case a>1 and a>2 which can be simplified > to a>2 if the operator > is btree opclass. This question has already been discussed in some detail on this email thread. First, we sometimes do detect it, as discussed by Peter. Second, it is an unusual case and so not worth spending a lot of CPU cycles to detect, as discussed by Tom. We might still accept a patch to improve things in this area. For example, if somebody could find a way to change the code so that we are able to detect more cases without needing to spend more CPU cycles, I think everyone would like that. However, that may be a tricky goal to accomplish. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Nov 27, 2024 at 10:28 AM ro b <bigbro_wq@hotmail.com> wrote: >> I mean why can't draw the conclusions >> like the case a>1 and a>2 which can be simplified >> to a>2 if the operator > is btree opclass. > This question has already been discussed in some detail on this email > thread. First, we sometimes do detect it, as discussed by Peter. > Second, it is an unusual case and so not worth spending a lot of CPU > cycles to detect, as discussed by Tom. > We might still accept a patch to improve things in this area. For > example, if somebody could find a way to change the code so that we > are able to detect more cases without needing to spend more CPU > cycles, I think everyone would like that. However, that may be a > tricky goal to accomplish. This discussion trailed off without any final resolution about what to do with the patch submission. I think we owe the submitter at least a minimal actual review, so ... I took a quick look at the "self_contradictory.sql" regression test included in the patch, with an eye to seeing how much of it we pass already when constraint_exclusion is set to "on". The answer is "a fair amount", in particular we successfully detect many of the cases where the clauses can be proved contradictory. The ones where we fail to do so are mostly cases like explain (costs off) SELECT * FROM self_a WHERE z IS UNKNOWN AND z IS NOT UNKNOWN; - QUERY PLAN --------------------------- - Result - One-Time Filter: false + QUERY PLAN +--------------------------------------------------- + Seq Scan on self_a + Filter: ((z IS UNKNOWN) AND (z IS NOT UNKNOWN)) (2 rows) This is not a fundamental shortcoming in the system, it's just that optimizer/util/predtest.c has not been built out very far with respect to BooleanTest nodes. We could certainly do that, and it wouldn't add much runtime for queries that contain no BooleanTest conditions, but I have to question whether it's worth the trouble. Is a query like the above really common? There are also a bunch of cases that are not contradictory but that the patch manages to simplify, for example explain (costs off) SELECT * FROM self_a WHERE a <= 10 AND a = 10; - QUERY PLAN --------------------- + QUERY PLAN +------------------------------------ Seq Scan on self_a - Filter: (a = 10) + Filter: ((a <= 10) AND (a = 10)) (2 rows) The constraint_exclusion framework can't make this sort of transformation because it is only focused on proving a relation's quals to be contradictory or not. So this part is new work ... but again I have to wonder if this query pattern is common enough to be worth spending planner cycles to check for. As Peter mentioned, we do actually have logic that performs the equivalent transformation within btree indexqual setup. However, the context is pretty different: first, the work of identifying clauses that bear on the same variable has already been done, and second we really have to do at least some of this work on the way to identifying the appropriate initial search condition for btree descent. It's not apparent to me that it's worth having a whole new planner pass that is searching for clauses involving the same variable. That is not especially cheap if there are a lot of clauses. If we were to pursue this, I'd want to think about somehow integrating it with the work that indxpath.c already does to collect index-relevant conditions. So in short, my main review comment is that you should be looking to integrate with existing planner functionality such as predtest.c and indxpath.c, rather than writing a few thousand lines of not-connected-to-anything-else code. It's too hard to make the case that these optimizations are worth the cost if they require an entire separate analysis pass. Some other comments: * You need to work harder on the comments in whatever you write. contradictory.c is drastically undercommented, and what there is is frequently unhelpful, like the obviously copied-and-pasted file header. You also did things like adding entirely-undocumented fields to common structures such as RestrictInfo. This certainly wouldn't be committable as-is; but it also makes the patch painful to review in any detail, since the reviewer must reverse-engineer a lot of information that should have been provided by comments. * The side-effects on existing regression test cases are pretty concerning: it's not clear that those tests still exercise what they were meant to. (You can be sure that a test case intentionally written with redundant conditions was not written that way just to provide a case you could optimize later.) We'd have to analyze all those test cases in detail and figure out whether the simplification is OK or we need to provide some way to defeat it. I'm going to mark this submission Returned With Feedback, because I don't foresee anything very close to this getting committed. But I'd encourage you to pursue the ideas in other forms as suggested by this discussion. regards, tom lane
I wrote: > This is not a fundamental shortcoming in the system, it's just > that optimizer/util/predtest.c has not been built out very far > with respect to BooleanTest nodes. Oh ... I'd momentarily forgotten that there's already a patch in the queue to attack that: https://commitfest.postgresql.org/51/4690/ That's been stalled for lack of review, so if you were interested in helping out, that'd be great. regards, tom lane