Обсуждение: Index range search optimization
Hi hackers.
_bt_readpage performs key check for each item on the page trying to locate upper boundary.
While comparison of simple integer keys are very fast, comparison of long strings can be quite expensive.
We can first make check for the largest key on the page and if it is not larger than upper boundary, then skip checks for all elements.
At this quite artificial example such optimization gives 3x time speed-up:
_bt_readpage performs key check for each item on the page trying to locate upper boundary.
While comparison of simple integer keys are very fast, comparison of long strings can be quite expensive.
We can first make check for the largest key on the page and if it is not larger than upper boundary, then skip checks for all elements.
At this quite artificial example such optimization gives 3x time speed-up:
create table t(t text primary key); insert into t values ('primary key-'||generate_series(1,10000000)::text); select count(*) from t where t between 'primary key-1000000' and 'primary key-2000000';
At my notebook with large enough shared buffers and disabled concurrency the difference is 83 vs. 247 msec
For integer keys the difference is much smaller: 69 vs. 82 msec
Certainly I realized that this example is quite exotic: most of DBAs prefer integer keys and such large ranges are quite rare.
But still such large range queries are used.
And I have checked that the proposed patch doesn't cause slowdown of exact search.
For integer keys the difference is much smaller: 69 vs. 82 msec
Certainly I realized that this example is quite exotic: most of DBAs prefer integer keys and such large ranges are quite rare.
But still such large range queries are used.
And I have checked that the proposed patch doesn't cause slowdown of exact search.
Вложения
Hi!
On Fri, Jun 23, 2023 at 10:36 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
_bt_readpage performs key check for each item on the page trying to locate upper boundary.
While comparison of simple integer keys are very fast, comparison of long strings can be quite expensive.
We can first make check for the largest key on the page and if it is not larger than upper boundary, then skip checks for all elements.
At this quite artificial example such optimization gives 3x time speed-up:create table t(t text primary key); insert into t values ('primary key-'||generate_series(1,10000000)::text); select count(*) from t where t between 'primary key-1000000' and 'primary key-2000000';At my notebook with large enough shared buffers and disabled concurrency the difference is 83 vs. 247 msec
For integer keys the difference is much smaller: 69 vs. 82 msec
Certainly I realized that this example is quite exotic: most of DBAs prefer integer keys and such large ranges are quite rare.
But still such large range queries are used.
And I have checked that the proposed patch doesn't cause slowdown of exact search.
Neat optimization! But I wonder if we could do even better. The attached patch allows Postgres to skip scan keys required for directional scans (even when other keys are present in the scan). I'll soon post the testing results and a more polished version of this patch.
Regards,
Alexander Korotkov
Вложения
On Thu, Sep 14, 2023 at 3:23 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > The attached patch allows Postgres to skip scan keys required for directional scans (even when other keys are present inthe scan). I'll soon post the testing results and a more polished version of this patch. This is very interesting to me, partly because it seems related to my ongoing work on SAOP execution within nbtree. My patch gives _bt_readpage and particularly _bt_checkkeys more high-level context, which they use to intelligently control the scan. That enables us to dynamically decide whether we should now perform another descent of the index via another call to _bt_first, or if we should prefer to continue on the leaf level for now. Maybe we will match many distinct sets of array keys on the same leaf page, in the same call to _bt_readpage. We don't want to miss out on such opportunities, but we also want to quickly notice when we're on a page where matching any more array keys is just hopeless. There is a need to keep these two things in balance. We need to notice the hopeless cases before wasting too many cycles on it. That creates a practical need to do an early precheck of the high key (roughly the same check that we do already). If the high key indicates that continuing on this page is truly hopeless, then we should give up and do another primitive index scan -- _bt_first will reposition us onto the leaf page that we need to go to next, which is (hopefully) far away from the leaf page we started on. Your patch therefore has the potential to help my own patch. But, it also has some potential to conflict with it, because my patch makes the meaning of SK_BT_REQFWD and SK_BT_REQBKWD more complicated (though only in cases where we have SK_SEARCHARRAY scan keys). I'm sure that this can be managed sensibly, though. Some feedback on your patch: * I notice that you're not using the high key for this, even in a forward scan -- you're using the last non-pivot tuple on the page. Why is that? (I have some idea why, actually, but I'd like to hear your thoughts first.) * Separately, I don't think that it makes sense to use the same requiredDirMatched value (which came from the last non-pivot tuple on the page) when the special _bt_checkkeys call for the high key takes place. I don't think that this will lead to wrong answers, but it's weird, and is likely to defeat the existing optimization in some important cases. Due to the influence of suffix truncation, it's relatively likely that the most significant column in the high key will be different to the corresponding value from the last few non-pivot tuples on the page -- the high key tends to be "aligned with natural boundaries in the key space", and so "gives us a good preview of the right sibling page". We don't want to treat it the same as non-pivot tuples here, because it's quite different, in ways that are subtle but still important. * I would avoid using the terminology "preprocess scan keys" for this. That exact terminology is already used to describe what _bt_preprocess_keys() does. That function is actually involved in Konstantin's patch, so that could be very confusing. When we "preprocess" a scan key, it outputs a processed scan key with markings such as the required markings that you're using in the patch -- it's something that acts on/changes the scan keys themselves. Whereas your patch is exploiting information from already-processed scan keys, by applying it to the key space of a page. I suggest calling it "prechecking the page", or something like that. I don't feel very strongly about what you call it, provided it isn't confusing or ambiguous. -- Peter Geoghegan
Hi, Peter! Thank you for your interest in this patch. On Tue, Sep 19, 2023 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote: > > On Thu, Sep 14, 2023 at 3:23 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > The attached patch allows Postgres to skip scan keys required for directional scans (even when other keys are presentin the scan). I'll soon post the testing results and a more polished version of this patch. > > This is very interesting to me, partly because it seems related to my > ongoing work on SAOP execution within nbtree. > > My patch gives _bt_readpage and particularly _bt_checkkeys more > high-level context, which they use to intelligently control the scan. > That enables us to dynamically decide whether we should now perform > another descent of the index via another call to _bt_first, or if we > should prefer to continue on the leaf level for now. Maybe we will > match many distinct sets of array keys on the same leaf page, in the > same call to _bt_readpage. We don't want to miss out on such > opportunities, but we also want to quickly notice when we're on a page > where matching any more array keys is just hopeless. > > There is a need to keep these two things in balance. We need to notice > the hopeless cases before wasting too many cycles on it. That creates > a practical need to do an early precheck of the high key (roughly the > same check that we do already). If the high key indicates that > continuing on this page is truly hopeless, then we should give up and > do another primitive index scan -- _bt_first will reposition us onto > the leaf page that we need to go to next, which is (hopefully) far > away from the leaf page we started on. This is a pretty neat optimization indeed! > Your patch therefore has the potential to help my own patch. But, it > also has some potential to conflict with it, because my patch makes > the meaning of SK_BT_REQFWD and SK_BT_REQBKWD more complicated (though > only in cases where we have SK_SEARCHARRAY scan keys). I'm sure that > this can be managed sensibly, though. OK! Let me know if you feel that I need to change something in this patch to lower the potential conflict. > Some feedback on your patch: > > * I notice that you're not using the high key for this, even in a > forward scan -- you're using the last non-pivot tuple on the page. Why > is that? (I have some idea why, actually, but I'd like to hear your > thoughts first.) I'm using the last non-pivot tuple on the page instead of hikey because it's lower than hikey. As you stated below, the most significant column in the hikey is likely different from that of the last non-pivot tuple. So, it's more likely to use the optimization with the last non-pivot tuple. > * Separately, I don't think that it makes sense to use the same > requiredDirMatched value (which came from the last non-pivot tuple on > the page) when the special _bt_checkkeys call for the high key takes > place. I don't think that this will lead to wrong answers, but it's > weird, and is likely to defeat the existing optimization in some > important cases. > > Due to the influence of suffix truncation, it's relatively likely that > the most significant column in the high key will be different to the > corresponding value from the last few non-pivot tuples on the page -- > the high key tends to be "aligned with natural boundaries in the key > space", and so "gives us a good preview of the right sibling page". We > don't want to treat it the same as non-pivot tuples here, because it's > quite different, in ways that are subtle but still important. This definitely makes sense. I've removed the usage of requiredDirMatched from this _bt_checkkeys() call. > * I would avoid using the terminology "preprocess scan keys" for this. > That exact terminology is already used to describe what > _bt_preprocess_keys() does. > > That function is actually involved in Konstantin's patch, so that > could be very confusing. When we "preprocess" a scan key, it outputs a > processed scan key with markings such as the required markings that > you're using in the patch -- it's something that acts on/changes the > scan keys themselves. Whereas your patch is exploiting information > from already-processed scan keys, by applying it to the key space of a > page. > > I suggest calling it "prechecking the page", or something like that. I > don't feel very strongly about what you call it, provided it isn't > confusing or ambiguous. This also makes sense. I've rephrased the comment. ------ Regards, Alexander Korotkov
Вложения
On Wed, Sep 20, 2023 at 5:07 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Tue, Sep 19, 2023 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote: > This also makes sense. I've rephrased the comment. The revised patch is attached. It contains better comments and the commit message. Peter, could you please check if you're OK with this? ------ Regards, Alexander Korotkov
Вложения
On Thu, 21 Sept 2023 at 15:17, Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Wed, Sep 20, 2023 at 5:07 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Tue, Sep 19, 2023 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote: > > This also makes sense. I've rephrased the comment. > > The revised patch is attached. It contains better comments and the > commit message. Peter, could you please check if you're OK with this? Hi, Alexander! I looked at the patch code and I agree with this optimization. Implementation also looks good to me except change : + if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) && + !(key->sk_flags & SK_ROW_HEADER)) + requiredDir = true; ... - if ((key->sk_flags & SK_BT_REQFWD) && - ScanDirectionIsForward(dir)) - *continuescan = false; - else if ((key->sk_flags & SK_BT_REQBKWD) && - ScanDirectionIsBackward(dir)) + if (requiredDir) *continuescan = false; looks like changing behavior in the case when key->sk_flags & SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) && (!requiredDirMatched) Originally it doesn't set *continuescan = false; and with the patch it will set. This may be relevant for the first page when requiredDirMatched is intentionally skipped to be set and for call _bt_checkkeys(scan, itup, truncatt, dir, &continuescan, false); Maybe I missed something and this can not appear for some reason? Also naming of requiredDirMatched and requiredDir seems semantically hard to understand the meaning without looking at the patch commit message. But I don't have better proposals yet, so maybe it's acceptable. Kind regards, Pavel Borisov Supabase.
On Thu, Sep 21, 2023 at 5:11 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > I looked at the patch code and I agree with this optimization. > Implementation also looks good to me except change : > + if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) && > + !(key->sk_flags & SK_ROW_HEADER)) > + requiredDir = true; > ... > - if ((key->sk_flags & SK_BT_REQFWD) && > - ScanDirectionIsForward(dir)) > - *continuescan = false; > - else if ((key->sk_flags & SK_BT_REQBKWD) && > - ScanDirectionIsBackward(dir)) > + if (requiredDir) > *continuescan = false; > > looks like changing behavior in the case when key->sk_flags & > SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) && > (!requiredDirMatched) > Originally it doesn't set *continuescan = false; and with the patch it will set. I agree that this is a problem. Inequality strategy scan keys are used when the initial positioning strategy used by _bt_first (for its _bt_search call) is based on an operator other than the "=" operator for the opclass. These scan keys are required in one direction only (Konstantin's original patch just focussed on these cases, actually). Obviously, that difference matters. I don't think that this patch should do anything that even looks like it might be revising the formal definition of "required in the current scan direction". Why is SK_ROW_HEADER treated as a special case by the patch? Could it be related to the issues with required-ness and scan direction? Note that we never use BTEqualStrategyNumber for SK_ROW_HEADER scan key row comparisons, so they're only ever required for one scan direction. (Equality-type row constructor syntax can of course be used without preventing the system from using an index scan, but the nbtree code will not see that case as a row comparison in the first place. This is due to preprocessing by the planner -- nbtree just sees conventional scan keys with multiple simple equality scan keys with = row comparisons.) Also, what about NULLs? While "key IS NULL" is classified as an equality check (see _bt_preprocess_keys comments), the same isn't true with "key IS NOT NULL". The latter case usually has scan key flags "SK_ISNULL | SK_SEARCHNOTNULL | SK_BT_REQFWD" -- there is no SK_BT_REQBKWD here. > This may be relevant for the first page when requiredDirMatched is > intentionally skipped to be set and for call > _bt_checkkeys(scan, itup, truncatt, dir, &continuescan, false); Also, requiredDirMatched isn't initialized by _bt_readpage() when "so->firstPage". Shouldn't it be initialized to false? Also, don't we need to take more care with a fully empty page? The "if (!so->firstPage) ... " block should be gated using a condition such as "if (!so->firstPage && minoff < maxoff)". (Adding a "minoff <= maxoff" test would also work, but then the optimization will get applied on pages with only one non-pivot tuple. That would be harmless, but a waste of cycles.) > Also naming of requiredDirMatched and requiredDir seems semantically > hard to understand the meaning without looking at the patch commit > message. But I don't have better proposals yet, so maybe it's > acceptable. I agree. How about "requiredMatchedByPrecheck" instead of "requiredDirMatched", and "required" instead of "requiredDir"? It would be nice if this patch worked in a way that could be verified by an assertion. Under this scheme, the optimization would only really be used in release builds (builds without assertions enabled, really). We'd only verify that the optimized case agreed with the slow path in assert-enabled builds. It might also make sense to always "apply the optimization" on assert-enabled builds, even for the first page seen by _bt_readpage by any _bt_first-wise scan. Maybe this sort of approach is impractical here for some reason, but I don't see why it should be. Obviously, the optimization should lower the amount of work in some calls to _bt_checkkeys, without ever changing the answer _bt_checkkeys gives. Ideally, it should be done in a way that makes that very obvious. There are some very subtle interactions between _bt_checkkeys and other, distant code -- which makes me feel paranoid. Notably, with required equality strategy scan keys, we're crucially dependent on _bt_first using an equality strategy for its initial positioning call to _bt_search. This is described by comments in both _bt_checkkeys and in _bt_first. Note, in particular, that it is essential that the initial offnum passed to _bt_readpage doesn't allow a call to _bt_checkkeys to take place that could cause it to become confused by a required equality strategy scan key, leading to _bt_checkkeys terminating the whole scan "early" -- causing wrong answers. For a query "WHERE foo = 5" (and a forward scan), we had better not pass _bt_readpage an offset number for a tuple with "foo" value 4. If that is ever allowed then _bt_checkkeys will terminate the scan immediately, leading to wrong answers. All because _bt_checkkeys can't tell if 4 comes before 5 or comes after 5 -- it only has an "=" operator to work with, so it can't actually make this distinction, so it likes to assume that anything != 5 must come after 5 (or before 5 during a backwards scan). I added a very similar _bt_compare()-based assertion in _bt_check_unique(), which went on to catch a very subtle bug in the Postgres 12 nbtree work -- the bug fixed by commit 74eb2176bf. So I have put this particular idea about asserting agreement between a fast path and a slow comparison path into practice already. -- Peter Geoghegan
On Fri, 22 Sept 2023 at 00:48, Peter Geoghegan <pg@bowt.ie> wrote: > > On Thu, Sep 21, 2023 at 5:11 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > > I looked at the patch code and I agree with this optimization. > > Implementation also looks good to me except change : > > + if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) && > > + !(key->sk_flags & SK_ROW_HEADER)) > > + requiredDir = true; > > ... > > - if ((key->sk_flags & SK_BT_REQFWD) && > > - ScanDirectionIsForward(dir)) > > - *continuescan = false; > > - else if ((key->sk_flags & SK_BT_REQBKWD) && > > - ScanDirectionIsBackward(dir)) > > + if (requiredDir) > > *continuescan = false; > > > > looks like changing behavior in the case when key->sk_flags & > > SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) && > > (!requiredDirMatched) > > Originally it doesn't set *continuescan = false; and with the patch it will set. > > I agree that this is a problem. Inequality strategy scan keys are used > when the initial positioning strategy used by _bt_first (for its > _bt_search call) is based on an operator other than the "=" operator > for the opclass. These scan keys are required in one direction only > (Konstantin's original patch just focussed on these cases, actually). > Obviously, that difference matters. I don't think that this patch > should do anything that even looks like it might be revising the > formal definition of "required in the current scan direction". I think it's the simplification that changed code behavior - just an overlook and this could be fixed easily. > Also, requiredDirMatched isn't initialized by _bt_readpage() when > "so->firstPage". Shouldn't it be initialized to false? True. > > Also naming of requiredDirMatched and requiredDir seems semantically > > hard to understand the meaning without looking at the patch commit > > message. But I don't have better proposals yet, so maybe it's > > acceptable. > > I agree. How about "requiredMatchedByPrecheck" instead of > "requiredDirMatched", and "required" instead of "requiredDir"? For me, the main semantic meaning is omitted and even more unclear, i.e. what exactly required and matched. I'd suppose scanDirRequired, scanDirMatched, but feel it's not ideal either. Or maybe trySkipRange, canSkipRange etc. Regards, Pavel Borisov, Supabase.
Hi Peter, Hi Pavel, The v4 of the patch is attached. On Thu, Sep 21, 2023 at 11:48 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Thu, Sep 21, 2023 at 5:11 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > > I looked at the patch code and I agree with this optimization. > > Implementation also looks good to me except change : > > + if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) && > > + !(key->sk_flags & SK_ROW_HEADER)) > > + requiredDir = true; > > ... > > - if ((key->sk_flags & SK_BT_REQFWD) && > > - ScanDirectionIsForward(dir)) > > - *continuescan = false; > > - else if ((key->sk_flags & SK_BT_REQBKWD) && > > - ScanDirectionIsBackward(dir)) > > + if (requiredDir) > > *continuescan = false; > > > > looks like changing behavior in the case when key->sk_flags & > > SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) && > > (!requiredDirMatched) > > Originally it doesn't set *continuescan = false; and with the patch it will set. > > I agree that this is a problem. Inequality strategy scan keys are used > when the initial positioning strategy used by _bt_first (for its > _bt_search call) is based on an operator other than the "=" operator > for the opclass. These scan keys are required in one direction only > (Konstantin's original patch just focussed on these cases, actually). > Obviously, that difference matters. I don't think that this patch > should do anything that even looks like it might be revising the > formal definition of "required in the current scan direction". Sorry, that was messed up from various attempts to write the patch. Actually, I end up with two boolean variables indicating whether the current key is required for the same direction or opposite direction scan. I believe that the key required for the opposite direction scan should be already satisfied by _bt_first() except for NULLs case. I've implemented a skip of calling the key function for this case (with assert that result is the same). > Why is SK_ROW_HEADER treated as a special case by the patch? Could it > be related to the issues with required-ness and scan direction? Note > that we never use BTEqualStrategyNumber for SK_ROW_HEADER scan key row > comparisons, so they're only ever required for one scan direction. > (Equality-type row constructor syntax can of course be used without > preventing the system from using an index scan, but the nbtree code > will not see that case as a row comparison in the first place. This is > due to preprocessing by the planner -- nbtree just sees conventional > scan keys with multiple simple equality scan keys with = row > comparisons.) The thing is that NULLs could appear in the middle of matching values. # WITH t (a, b) AS (VALUES ('a', 'b'), ('a', NULL), ('b', 'a')) SELECT a, b, (a, b) > ('a', 'a') FROM t ORDER BY (a, b); a | b | ?column? ---+------+---------- a | b | t a | NULL | NULL b | a | t (3 rows) So we can't just skip the row comparison operator, because we can meet NULL at any place. > > This may be relevant for the first page when requiredDirMatched is > > intentionally skipped to be set and for call > > _bt_checkkeys(scan, itup, truncatt, dir, &continuescan, false); > > Also, requiredDirMatched isn't initialized by _bt_readpage() when > "so->firstPage". Shouldn't it be initialized to false? > > Also, don't we need to take more care with a fully empty page? The "if > (!so->firstPage) ... " block should be gated using a condition such as > "if (!so->firstPage && minoff < maxoff)". (Adding a "minoff <= maxoff" > test would also work, but then the optimization will get applied on > pages with only one non-pivot tuple. That would be harmless, but a > waste of cycles.) This makes sense. I've added (minoff < maxoff) to the condition. > > Also naming of requiredDirMatched and requiredDir seems semantically > > hard to understand the meaning without looking at the patch commit > > message. But I don't have better proposals yet, so maybe it's > > acceptable. > > I agree. How about "requiredMatchedByPrecheck" instead of > "requiredDirMatched", and "required" instead of "requiredDir"? > > It would be nice if this patch worked in a way that could be verified > by an assertion. Under this scheme, the optimization would only really > be used in release builds (builds without assertions enabled, really). > We'd only verify that the optimized case agreed with the slow path in > assert-enabled builds. It might also make sense to always "apply the > optimization" on assert-enabled builds, even for the first page seen > by _bt_readpage by any _bt_first-wise scan. Maybe this sort of > approach is impractical here for some reason, but I don't see why it > should be. Yes, this makes sense. I've added an assert check that results are the same as with requiredMatchedByPrecheck == false. > Obviously, the optimization should lower the amount of work in some > calls to _bt_checkkeys, without ever changing the answer _bt_checkkeys > gives. Ideally, it should be done in a way that makes that very > obvious. There are some very subtle interactions between _bt_checkkeys > and other, distant code -- which makes me feel paranoid. Notably, with > required equality strategy scan keys, we're crucially dependent on > _bt_first using an equality strategy for its initial positioning call > to _bt_search. This is described by comments in both _bt_checkkeys and > in _bt_first. > > Note, in particular, that it is essential that the initial offnum > passed to _bt_readpage doesn't allow a call to _bt_checkkeys to take > place that could cause it to become confused by a required equality > strategy scan key, leading to _bt_checkkeys terminating the whole scan > "early" -- causing wrong answers. For a query "WHERE foo = 5" (and a > forward scan), we had better not pass _bt_readpage an offset number > for a tuple with "foo" value 4. If that is ever allowed then > _bt_checkkeys will terminate the scan immediately, leading to wrong > answers. All because _bt_checkkeys can't tell if 4 comes before 5 or > comes after 5 -- it only has an "=" operator to work with, so it can't > actually make this distinction, so it likes to assume that anything != > 5 must come after 5 (or before 5 during a backwards scan). > > I added a very similar _bt_compare()-based assertion in > _bt_check_unique(), which went on to catch a very subtle bug in the > Postgres 12 nbtree work -- the bug fixed by commit 74eb2176bf. So I > have put this particular idea about asserting agreement between a fast > path and a slow comparison path into practice already. Good, thank you for the detailed clarification. ------ Regards, Alexander Korotkov
Вложения
Hi, Alexander! I found and fixed a couple of naming issues that came to v4 from earlier patches. Also, I added initialization of requiredMatchedByPrecheck in case of first page. Please see patch v5. One more doubt about naming. Calling function _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts, ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck) as (void) _bt_checkkeys(scan, itup, indnatts, dir, &requiredMatchedByPrecheck, false); looks little bit misleading because of coincidence of names of 5 and 6 arguments.
Вложения
Sorry, I've mistaken with attached version previously. Correct v5 attached. On Mon, 25 Sept 2023 at 13:58, Pavel Borisov <pashkin.elfe@gmail.com> wrote: > > Hi, Alexander! > > I found and fixed a couple of naming issues that came to v4 from > earlier patches. > Also, I added initialization of requiredMatchedByPrecheck in case of first page. > > Please see patch v5. > > One more doubt about naming. Calling function > _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts, > ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck) > as > (void) _bt_checkkeys(scan, itup, indnatts, dir, > &requiredMatchedByPrecheck, false); > looks little bit misleading because of coincidence of names of 5 and 6 > arguments.
Вложения
On Mon, Sep 25, 2023 at 12:58 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > One more doubt about naming. Calling function > _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts, > ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck) > as > (void) _bt_checkkeys(scan, itup, indnatts, dir, > &requiredMatchedByPrecheck, false); > looks little bit misleading because of coincidence of names of 5 and 6 > arguments. I've added the comment clarifying this argument usage. ------ Regards, Alexander Korotkov
Вложения
On Mon, Sep 25, 2023 at 1:18 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Sep 25, 2023 at 12:58 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> One more doubt about naming. Calling function
> _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
> ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck)
> as
> (void) _bt_checkkeys(scan, itup, indnatts, dir,
> &requiredMatchedByPrecheck, false);
> looks little bit misleading because of coincidence of names of 5 and 6
> arguments.
I've added the comment clarifying this argument usage.
Fixed typo inficating => indicating as pointed by Pavel.
Peter, what do you think about the current shape of the patch?
------
Regards,
Alexander Korotkov
Вложения
On Wed, Sep 27, 2023 at 9:41 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > Fixed typo inficating => indicating as pointed by Pavel. > Peter, what do you think about the current shape of the patch? I'll try to get to this tomorrow. I'm rather busy with moving home at the moment, unfortunately. -- Peter Geoghegan
On Thu, Sep 28, 2023 at 5:21 AM Peter Geoghegan <pg@bowt.ie> wrote: > On Wed, Sep 27, 2023 at 9:41 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > Fixed typo inficating => indicating as pointed by Pavel. > > Peter, what do you think about the current shape of the patch? > > I'll try to get to this tomorrow. I'm rather busy with moving home at > the moment, unfortunately. No problem, thank you! ------ Regards, Alexander Korotkov
On Fri, Sep 22, 2023 at 7:24 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > The thing is that NULLs could appear in the middle of matching values. > > # WITH t (a, b) AS (VALUES ('a', 'b'), ('a', NULL), ('b', 'a')) > SELECT a, b, (a, b) > ('a', 'a') FROM t ORDER BY (a, b); > a | b | ?column? > ---+------+---------- > a | b | t > a | NULL | NULL > b | a | t > (3 rows) > > So we can't just skip the row comparison operator, because we can meet > NULL at any place. But why would SK_ROW_HEADER be any different? Is it related to this existing case inside _bt_check_rowcompare()?: if (subkey->sk_flags & SK_ISNULL) { /* * Unlike the simple-scankey case, this isn't a disallowed case. * But it can never match. If all the earlier row comparison * columns are required for the scan direction, we can stop the * scan, because there can't be another tuple that will succeed. */ if (subkey != (ScanKey) DatumGetPointer(skey->sk_argument)) subkey--; if ((subkey->sk_flags & SK_BT_REQFWD) && ScanDirectionIsForward(dir)) *continuescan = false; else if ((subkey->sk_flags & SK_BT_REQBKWD) && ScanDirectionIsBackward(dir)) *continuescan = false; return false; } I noticed that you're not initializing so->firstPage correctly for the _bt_endpoint() path, which is used when the initial position of the scan is either the leftmost or rightmost page. That is, it's possible to reach _bt_readpage() without having reached the point in _bt_first() where you initialize so->firstPage to "true". It would probably make sense if the flag was initialized to "false" in the same way as most other scan state is already, somewhere in nbtree.c. Probably in btrescan(). -- Peter Geoghegan
Hi, Peter. On Fri, Sep 29, 2023 at 4:57 AM Peter Geoghegan <pg@bowt.ie> wrote: > On Fri, Sep 22, 2023 at 7:24 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > The thing is that NULLs could appear in the middle of matching values. > > > > # WITH t (a, b) AS (VALUES ('a', 'b'), ('a', NULL), ('b', 'a')) > > SELECT a, b, (a, b) > ('a', 'a') FROM t ORDER BY (a, b); > > a | b | ?column? > > ---+------+---------- > > a | b | t > > a | NULL | NULL > > b | a | t > > (3 rows) > > > > So we can't just skip the row comparison operator, because we can meet > > NULL at any place. > > But why would SK_ROW_HEADER be any different? Is it related to this > existing case inside _bt_check_rowcompare()?: > > if (subkey->sk_flags & SK_ISNULL) > { > /* > * Unlike the simple-scankey case, this isn't a disallowed case. > * But it can never match. If all the earlier row comparison > * columns are required for the scan direction, we can stop the > * scan, because there can't be another tuple that will succeed. > */ > if (subkey != (ScanKey) DatumGetPointer(skey->sk_argument)) > subkey--; > if ((subkey->sk_flags & SK_BT_REQFWD) && > ScanDirectionIsForward(dir)) > *continuescan = false; > else if ((subkey->sk_flags & SK_BT_REQBKWD) && > ScanDirectionIsBackward(dir)) > *continuescan = false; > return false; > } Yes, exactly. Our row comparison operators don't match if there is any null inside the row. But you can find these rows within the matching range. > I noticed that you're not initializing so->firstPage correctly for the > _bt_endpoint() path, which is used when the initial position of the > scan is either the leftmost or rightmost page. That is, it's possible > to reach _bt_readpage() without having reached the point in > _bt_first() where you initialize so->firstPage to "true". Good catch, thank you! > It would probably make sense if the flag was initialized to "false" in > the same way as most other scan state is already, somewhere in > nbtree.c. Probably in btrescan(). Makes sense, initialisation is added. ------ Regards, Alexander Korotkov
Вложения
Hi! On Fri, 29 Sept 2023 at 10:35, Alexander Korotkov <aekorotkov@gmail.com> wrote: > > Hi, Peter. > > On Fri, Sep 29, 2023 at 4:57 AM Peter Geoghegan <pg@bowt.ie> wrote: > > On Fri, Sep 22, 2023 at 7:24 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > > The thing is that NULLs could appear in the middle of matching values. > > > > > > # WITH t (a, b) AS (VALUES ('a', 'b'), ('a', NULL), ('b', 'a')) > > > SELECT a, b, (a, b) > ('a', 'a') FROM t ORDER BY (a, b); > > > a | b | ?column? > > > ---+------+---------- > > > a | b | t > > > a | NULL | NULL > > > b | a | t > > > (3 rows) > > > > > > So we can't just skip the row comparison operator, because we can meet > > > NULL at any place. > > > > But why would SK_ROW_HEADER be any different? Is it related to this > > existing case inside _bt_check_rowcompare()?: > > > > if (subkey->sk_flags & SK_ISNULL) > > { > > /* > > * Unlike the simple-scankey case, this isn't a disallowed case. > > * But it can never match. If all the earlier row comparison > > * columns are required for the scan direction, we can stop the > > * scan, because there can't be another tuple that will succeed. > > */ > > if (subkey != (ScanKey) DatumGetPointer(skey->sk_argument)) > > subkey--; > > if ((subkey->sk_flags & SK_BT_REQFWD) && > > ScanDirectionIsForward(dir)) > > *continuescan = false; > > else if ((subkey->sk_flags & SK_BT_REQBKWD) && > > ScanDirectionIsBackward(dir)) > > *continuescan = false; > > return false; > > } > > Yes, exactly. Our row comparison operators don't match if there is any > null inside the row. But you can find these rows within the matching > range. > > > I noticed that you're not initializing so->firstPage correctly for the > > _bt_endpoint() path, which is used when the initial position of the > > scan is either the leftmost or rightmost page. That is, it's possible > > to reach _bt_readpage() without having reached the point in > > _bt_first() where you initialize so->firstPage to "true". > > Good catch, thank you! > > > It would probably make sense if the flag was initialized to "false" in > > the same way as most other scan state is already, somewhere in > > nbtree.c. Probably in btrescan(). > > Makes sense, initialisation is added. I've looked through the patch v8. I think it's good enough to be pushed if Peter has no objections. Regards, Pavel.
On Wed, Oct 4, 2023 at 12:59 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > I've looked through the patch v8. I think it's good enough to be > pushed if Peter has no objections. Thank you, Pavel. I'll push this if there are no objections. ------ Regards, Alexander Korotkov
On 04/10/2023 3:00 am, Alexander Korotkov wrote: > On Wed, Oct 4, 2023 at 12:59 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: >> I've looked through the patch v8. I think it's good enough to be >> pushed if Peter has no objections. > Thank you, Pavel. > I'll push this if there are no objections. > > ------ > Regards, > Alexander Korotkov Sorry, can you please also mention that original idea of this optimization belongs to Ilya Anfimov (it was discussed in @pgsql Telegram chat).
Hi, Konstantin! On Fri, 6 Oct 2023 at 22:44, Konstantin Knizhnik <knizhnik@garret.ru> wrote: > > > On 04/10/2023 3:00 am, Alexander Korotkov wrote: > > On Wed, Oct 4, 2023 at 12:59 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > >> I've looked through the patch v8. I think it's good enough to be > >> pushed if Peter has no objections. > > Thank you, Pavel. > > I'll push this if there are no objections. > > > > ------ > > Regards, > > Alexander Korotkov > > > Sorry, can you please also mention that original idea of this > optimization belongs to Ilya Anfimov (it was discussed in @pgsql > Telegram chat). While it's no doubt correct to mention all authors of the patch, I looked through the thread and saw no mentions of Ilya's contributions/ideas before the patch became pushed. I'm not up to the current policy for processing these requests, but I suppose it's complicated to introduce back changes into the main branch that is already ahead of patch e0b1ee17dc3a38. Regards, Pavel
On Fri, Oct 6, 2023 at 9:59 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > On Fri, 6 Oct 2023 at 22:44, Konstantin Knizhnik <knizhnik@garret.ru> wrote: > > > > > > On 04/10/2023 3:00 am, Alexander Korotkov wrote: > > > On Wed, Oct 4, 2023 at 12:59 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > > >> I've looked through the patch v8. I think it's good enough to be > > >> pushed if Peter has no objections. > > > Thank you, Pavel. > > > I'll push this if there are no objections. > > > > > > ------ > > > Regards, > > > Alexander Korotkov > > > > > > Sorry, can you please also mention that original idea of this > > optimization belongs to Ilya Anfimov (it was discussed in @pgsql > > Telegram chat). > > While it's no doubt correct to mention all authors of the patch, I > looked through the thread and saw no mentions of Ilya's > contributions/ideas before the patch became pushed. I'm not up to the > current policy for processing these requests, but I suppose it's > complicated to introduce back changes into the main branch that is > already ahead of patch e0b1ee17dc3a38. Yep, that happened before. We don't do a force push to override commit messages and credit missing contributors. I waited more than 48 hours before pushing the final version of the patch, and that was the time to propose changes like this. Now, I think all we can do is credit Ilya on mailing lists. I believe we already did :) ------ Regards, Alexander Korotkov