Обсуждение: Index range search optimization

Поиск

Список

Период

Сортировка

Index range search optimization

От

Konstantin Knizhnik

Дата:

23 июня 2023 г., 10:35:50

Hi hackers.

_bt_readpage performs key check for each item on the page trying to locate upper boundary.
While comparison of simple integer keys are very fast, comparison of long strings can be quite expensive.
We can first make check for the largest key on the page and if it is not larger than upper boundary, then skip checks for all elements.

At this quite artificial example such optimization gives 3x time speed-up:

create table t(t text primary key);
insert into t values ('primary key-'||generate_series(1,10000000)::text);
select count(*) from t where t between 'primary key-1000000' and 'primary key-2000000';

At my notebook with large enough shared buffers and disabled concurrency the difference is 83 vs. 247 msec
For integer keys the difference is much smaller: 69 vs. 82 msec

Certainly I realized that this example is quite exotic: most of DBAs prefer integer keys and such large ranges are quite rare.
But still such large range queries are used.
And I have checked that the proposed patch doesn't cause slowdown of exact search.

Вложения

range-search.patch

Re: Index range search optimization

От

Alexander Korotkov

Дата:

14 сентября 2023 г., 13:22:50

Hi!

On Fri, Jun 23, 2023 at 10:36 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:

_bt_readpage performs key check for each item on the page trying to locate upper boundary.
While comparison of simple integer keys are very fast, comparison of long strings can be quite expensive.
We can first make check for the largest key on the page and if it is not larger than upper boundary, then skip checks for all elements.

At this quite artificial example such optimization gives 3x time speed-up:
create table t(t text primary key);
insert into t values ('primary key-'||generate_series(1,10000000)::text);
select count(*) from t where t between 'primary key-1000000' and 'primary key-2000000';
At my notebook with large enough shared buffers and disabled concurrency the difference is 83 vs. 247 msec
For integer keys the difference is much smaller: 69 vs. 82 msec

Certainly I realized that this example is quite exotic: most of DBAs prefer integer keys and such large ranges are quite rare.
But still such large range queries are used.
And I have checked that the proposed patch doesn't cause slowdown of exact search.

Neat optimization! But I wonder if we could do even better. The attached patch allows Postgres to skip scan keys required for directional scans (even when other keys are present in the scan). I'll soon post the testing results and a more polished version of this patch.

------
Regards,
Alexander Korotkov

Вложения

0001-Skip-checking-of-scan-keys-required-for-direction-v1.patch

Re: Index range search optimization

От

Peter Geoghegan

Дата:

19 сентября 2023 г., 01:48:14

On Thu, Sep 14, 2023 at 3:23 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> The attached patch allows Postgres to skip scan keys required for directional scans (even when other keys are present
inthe scan). I'll soon post the testing results and a more polished version of this patch.

This is very interesting to me, partly because it seems related to my
ongoing work on SAOP execution within nbtree.

My patch gives _bt_readpage and particularly _bt_checkkeys more
high-level context, which they use to intelligently control the scan.
That enables us to dynamically decide whether we should now perform
another descent of the index via another call to _bt_first, or if we
should prefer to continue on the leaf level for now. Maybe we will
match many distinct sets of array keys on the same leaf page, in the
same call to _bt_readpage. We don't want to miss out on such
opportunities, but we also want to quickly notice when we're on a page
where matching any more array keys is just hopeless.

There is a need to keep these two things in balance. We need to notice
the hopeless cases before wasting too many cycles on it. That creates
a practical need to do an early precheck of the high key (roughly the
same check that we do already). If the high key indicates that
continuing on this page is truly hopeless, then we should give up and
do another primitive index scan -- _bt_first will reposition us onto
the leaf page that we need to go to next, which is (hopefully) far
away from the leaf page we started on.

Your patch therefore has the potential to help my own patch. But, it
also has some potential to conflict with it, because my patch makes
the meaning of SK_BT_REQFWD and SK_BT_REQBKWD more complicated (though
only in cases where we have SK_SEARCHARRAY scan keys). I'm sure that
this can be managed sensibly, though.

Some feedback on your patch:

* I notice that you're not using the high key for this, even in a
forward scan -- you're using the last non-pivot tuple on the page. Why
is that? (I have some idea why, actually, but I'd like to hear your
thoughts first.)

* Separately, I don't think that it makes sense to use the same
requiredDirMatched value (which came from the last non-pivot tuple on
the page) when the special _bt_checkkeys call for the high key takes
place. I don't think that this will lead to wrong answers, but it's
weird, and is likely to defeat the existing optimization in some
important cases.

Due to the influence of suffix truncation, it's relatively likely that
the most significant column in the high key will be different to the
corresponding value from the last few non-pivot tuples on the page --
the high key tends to be "aligned with natural boundaries in the key
space", and so "gives us a good preview of the right sibling page". We
don't want to treat it the same as non-pivot tuples here, because it's
quite different, in ways that are subtle but still important.

* I would avoid using the terminology "preprocess scan keys" for this.
That exact terminology is already used to describe what
_bt_preprocess_keys() does.

That function is actually involved in Konstantin's patch, so that
could be very confusing. When we "preprocess" a scan key, it outputs a
processed scan key with markings such as the required markings that
you're using in the patch -- it's something that acts on/changes the
scan keys themselves. Whereas your patch is exploiting information
from already-processed scan keys, by applying it to the key space of a
page.

I suggest calling it "prechecking the page", or something like that. I
don't feel very strongly about what you call it, provided it isn't
confusing or ambiguous.

--
Peter Geoghegan

Re: Index range search optimization

От

Alexander Korotkov

Дата:

20 сентября 2023 г., 17:07:28

Hi, Peter!

Thank you for your interest in this patch.

On Tue, Sep 19, 2023 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Sep 14, 2023 at 3:23 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > The attached patch allows Postgres to skip scan keys required for directional scans (even when other keys are
presentin the scan).  I'll soon post the testing results and a more polished version of this patch. 
>
> This is very interesting to me, partly because it seems related to my
> ongoing work on SAOP execution within nbtree.
>
> My patch gives _bt_readpage and particularly _bt_checkkeys more
> high-level context, which they use to intelligently control the scan.
> That enables us to dynamically decide whether we should now perform
> another descent of the index via another call to _bt_first, or if we
> should prefer to continue on the leaf level for now. Maybe we will
> match many distinct sets of array keys on the same leaf page, in the
> same call to _bt_readpage. We don't want to miss out on such
> opportunities, but we also want to quickly notice when we're on a page
> where matching any more array keys is just hopeless.
>
> There is a need to keep these two things in balance. We need to notice
> the hopeless cases before wasting too many cycles on it. That creates
> a practical need to do an early precheck of the high key (roughly the
> same check that we do already). If the high key indicates that
> continuing on this page is truly hopeless, then we should give up and
> do another primitive index scan -- _bt_first will reposition us onto
> the leaf page that we need to go to next, which is (hopefully) far
> away from the leaf page we started on.

This is a pretty neat optimization indeed!

> Your patch therefore has the potential to help my own patch. But, it
> also has some potential to conflict with it, because my patch makes
> the meaning of SK_BT_REQFWD and SK_BT_REQBKWD more complicated (though
> only in cases where we have SK_SEARCHARRAY scan keys). I'm sure that
> this can be managed sensibly, though.

OK!  Let me know if you feel that I need to change something in this
patch to lower the potential conflict.

> Some feedback on your patch:
>
> * I notice that you're not using the high key for this, even in a
> forward scan -- you're using the last non-pivot tuple on the page. Why
> is that? (I have some idea why, actually, but I'd like to hear your
> thoughts first.)

I'm using the last non-pivot tuple on the page instead of hikey
because it's lower than hikey.  As you stated below, the most
significant column in the hikey is likely different from that of the
last non-pivot tuple.  So, it's more likely to use the optimization
with the last non-pivot tuple.

> * Separately, I don't think that it makes sense to use the same
> requiredDirMatched value (which came from the last non-pivot tuple on
> the page) when the special _bt_checkkeys call for the high key takes
> place. I don't think that this will lead to wrong answers, but it's
> weird, and is likely to defeat the existing optimization in some
> important cases.
>
> Due to the influence of suffix truncation, it's relatively likely that
> the most significant column in the high key will be different to the
> corresponding value from the last few non-pivot tuples on the page --
> the high key tends to be "aligned with natural boundaries in the key
> space", and so "gives us a good preview of the right sibling page". We
> don't want to treat it the same as non-pivot tuples here, because it's
> quite different, in ways that are subtle but still important.

This definitely makes sense. I've removed the usage of
requiredDirMatched from this _bt_checkkeys() call.

> * I would avoid using the terminology "preprocess scan keys" for this.
> That exact terminology is already used to describe what
> _bt_preprocess_keys() does.
>
> That function is actually involved in Konstantin's patch, so that
> could be very confusing. When we "preprocess" a scan key, it outputs a
> processed scan key with markings such as the required markings that
> you're using in the patch -- it's something that acts on/changes the
> scan keys themselves. Whereas your patch is exploiting information
> from already-processed scan keys, by applying it to the key space of a
> page.
>
> I suggest calling it "prechecking the page", or something like that. I
> don't feel very strongly about what you call it, provided it isn't
> confusing or ambiguous.


 This also makes sense. I've rephrased the comment.

------
Regards,
Alexander Korotkov

Вложения

0001-Skip-checking-of-scan-keys-required-for-direction-v2.patch

Re: Index range search optimization

От

Alexander Korotkov

Дата:

21 сентября 2023 г., 12:14:17

On Wed, Sep 20, 2023 at 5:07 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Tue, Sep 19, 2023 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
>  This also makes sense. I've rephrased the comment.

The revised patch is attached.  It contains better comments and the
commit message.  Peter, could you please check if you're OK with this?

------
Regards,
Alexander Korotkov

Вложения

0001-Skip-checking-of-scan-keys-required-for-direction-v3.patch

Re: Index range search optimization

От

Pavel Borisov

Дата:

21 сентября 2023 г., 15:10:58

On Thu, 21 Sept 2023 at 15:17, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Wed, Sep 20, 2023 at 5:07 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Tue, Sep 19, 2023 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> >  This also makes sense. I've rephrased the comment.
>
> The revised patch is attached.  It contains better comments and the
> commit message.  Peter, could you please check if you're OK with this?
Hi, Alexander!

I looked at the patch code and I agree with this optimization.
Implementation also looks good to me except change :
+ if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) &&
+ !(key->sk_flags & SK_ROW_HEADER))
+ requiredDir = true;
...
- if ((key->sk_flags & SK_BT_REQFWD) &&
- ScanDirectionIsForward(dir))
- *continuescan = false;
- else if ((key->sk_flags & SK_BT_REQBKWD) &&
- ScanDirectionIsBackward(dir))
+ if (requiredDir)
  *continuescan = false;

looks like changing behavior in the case when key->sk_flags &
SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) &&
(!requiredDirMatched)
Originally it doesn't set *continuescan = false; and with the patch it will set.

This may be relevant for the first page when requiredDirMatched is
intentionally skipped to be set and for call
_bt_checkkeys(scan, itup, truncatt, dir, &continuescan, false);

Maybe I missed something and this can not appear for some reason?

Also naming of requiredDirMatched and requiredDir seems semantically
hard to understand the meaning without looking at the patch commit
message. But I don't have better proposals yet, so maybe it's
acceptable.

Kind regards,
Pavel Borisov
Supabase.

Re: Index range search optimization

От

Peter Geoghegan

Дата:

21 сентября 2023 г., 23:48:09

On Thu, Sep 21, 2023 at 5:11 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> I looked at the patch code and I agree with this optimization.
> Implementation also looks good to me except change :
> + if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) &&
> + !(key->sk_flags & SK_ROW_HEADER))
> + requiredDir = true;
> ...
> - if ((key->sk_flags & SK_BT_REQFWD) &&
> - ScanDirectionIsForward(dir))
> - *continuescan = false;
> - else if ((key->sk_flags & SK_BT_REQBKWD) &&
> - ScanDirectionIsBackward(dir))
> + if (requiredDir)
>   *continuescan = false;
>
> looks like changing behavior in the case when key->sk_flags &
> SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) &&
> (!requiredDirMatched)
> Originally it doesn't set *continuescan = false; and with the patch it will set.

I agree that this is a problem. Inequality strategy scan keys are used
when the initial positioning strategy used by _bt_first (for its
_bt_search call) is based on an operator other than the "=" operator
for the opclass. These scan keys are required in one direction only
(Konstantin's original patch just focussed on these cases, actually).
Obviously, that difference matters. I don't think that this patch
should do anything that even looks like it might be revising the
formal definition of "required in the current scan direction".

Why is SK_ROW_HEADER treated as a special case by the patch? Could it
be related to the issues with required-ness and scan direction? Note
that we never use BTEqualStrategyNumber for SK_ROW_HEADER scan key row
comparisons, so they're only ever required for one scan direction.
(Equality-type row constructor syntax can of course be used without
preventing the system from using an index scan, but the nbtree code
will not see that case as a row comparison in the first place. This is
due to preprocessing by the planner -- nbtree just sees conventional
scan keys with multiple simple equality scan keys with = row
comparisons.)

Also, what about NULLs? While "key IS NULL" is classified as an
equality check (see _bt_preprocess_keys comments), the same isn't true
with "key IS NOT NULL". The latter case usually has scan key flags
"SK_ISNULL | SK_SEARCHNOTNULL | SK_BT_REQFWD" -- there is no
SK_BT_REQBKWD here.

> This may be relevant for the first page when requiredDirMatched is
> intentionally skipped to be set and for call
> _bt_checkkeys(scan, itup, truncatt, dir, &continuescan, false);

Also, requiredDirMatched isn't initialized by _bt_readpage() when
"so->firstPage". Shouldn't it be initialized to false?

Also, don't we need to take more care with a fully empty page? The "if
(!so->firstPage) ... " block should be gated using a condition such as
"if (!so->firstPage && minoff < maxoff)". (Adding a "minoff <= maxoff"
test would also work, but then the optimization will get applied on
pages with only one non-pivot tuple. That would be harmless, but a
waste of cycles.)

> Also naming of requiredDirMatched and requiredDir seems semantically
> hard to understand the meaning without looking at the patch commit
> message. But I don't have better proposals yet, so maybe it's
> acceptable.

I agree. How about "requiredMatchedByPrecheck" instead of
"requiredDirMatched", and "required" instead of "requiredDir"?

It would be nice if this patch worked in a way that could be verified
by an assertion. Under this scheme, the optimization would only really
be used in release builds (builds without assertions enabled, really).
We'd only verify that the optimized case agreed with the slow path in
assert-enabled builds. It might also make sense to always "apply the
optimization" on assert-enabled builds, even for the first page seen
by _bt_readpage by any _bt_first-wise scan. Maybe this sort of
approach is impractical here for some reason, but I don't see why it
should be.

Obviously, the optimization should lower the amount of work in some
calls to _bt_checkkeys, without ever changing the answer _bt_checkkeys
gives. Ideally, it should be done in a way that makes that very
obvious. There are some very subtle interactions between _bt_checkkeys
and other, distant code -- which makes me feel paranoid. Notably, with
required equality strategy scan keys, we're crucially dependent on
_bt_first using an equality strategy for its initial positioning call
to _bt_search. This is described by comments in both _bt_checkkeys and
in _bt_first.

Note, in particular, that it is essential that the initial offnum
passed to _bt_readpage doesn't allow a call to _bt_checkkeys to take
place that could cause it to become confused by a required equality
strategy scan key, leading to _bt_checkkeys terminating the whole scan
"early" -- causing wrong answers. For a query "WHERE foo = 5" (and a
forward scan), we had better not pass _bt_readpage an offset number
for a tuple with "foo" value 4. If that is ever allowed then
_bt_checkkeys will terminate the scan immediately, leading to wrong
answers. All because _bt_checkkeys can't tell if 4 comes before 5 or
comes after 5 -- it only has an "=" operator to work with, so it can't
actually make this distinction, so it likes to assume that anything !=
5 must come after 5 (or before 5 during a backwards scan).

I added a very similar _bt_compare()-based assertion in
_bt_check_unique(), which went on to catch a very subtle bug in the
Postgres 12 nbtree work -- the bug fixed by commit 74eb2176bf. So I
have put this particular idea about asserting agreement between a fast
path and a slow comparison path into practice already.

--
Peter Geoghegan

Re: Index range search optimization

От

Pavel Borisov

Дата:

22 сентября 2023 г., 17:01:18

On Fri, 22 Sept 2023 at 00:48, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Sep 21, 2023 at 5:11 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > I looked at the patch code and I agree with this optimization.
> > Implementation also looks good to me except change :
> > + if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) &&
> > + !(key->sk_flags & SK_ROW_HEADER))
> > + requiredDir = true;
> > ...
> > - if ((key->sk_flags & SK_BT_REQFWD) &&
> > - ScanDirectionIsForward(dir))
> > - *continuescan = false;
> > - else if ((key->sk_flags & SK_BT_REQBKWD) &&
> > - ScanDirectionIsBackward(dir))
> > + if (requiredDir)
> >   *continuescan = false;
> >
> > looks like changing behavior in the case when key->sk_flags &
> > SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) &&
> > (!requiredDirMatched)
> > Originally it doesn't set *continuescan = false; and with the patch it will set.
>
> I agree that this is a problem. Inequality strategy scan keys are used
> when the initial positioning strategy used by _bt_first (for its
> _bt_search call) is based on an operator other than the "=" operator
> for the opclass. These scan keys are required in one direction only
> (Konstantin's original patch just focussed on these cases, actually).
> Obviously, that difference matters. I don't think that this patch
> should do anything that even looks like it might be revising the
> formal definition of "required in the current scan direction".
I think it's the simplification that changed code behavior - just an
overlook and this could be fixed easily.

> Also, requiredDirMatched isn't initialized by _bt_readpage() when
> "so->firstPage". Shouldn't it be initialized to false?
True.

> > Also naming of requiredDirMatched and requiredDir seems semantically
> > hard to understand the meaning without looking at the patch commit
> > message. But I don't have better proposals yet, so maybe it's
> > acceptable.
>
> I agree. How about "requiredMatchedByPrecheck" instead of
> "requiredDirMatched", and "required" instead of "requiredDir"?
For me, the main semantic meaning is omitted and even more unclear,
i.e. what exactly required and matched. I'd  suppose scanDirRequired,
scanDirMatched, but feel it's not ideal either. Or maybe trySkipRange,
canSkipRange etc.

Regards,
Pavel Borisov,
Supabase.

Re: Index range search optimization

От

Alexander Korotkov

Дата:

22 сентября 2023 г., 17:24:40

Hi Peter,
Hi Pavel,

The v4 of the patch is attached.

On Thu, Sep 21, 2023 at 11:48 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Sep 21, 2023 at 5:11 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > I looked at the patch code and I agree with this optimization.
> > Implementation also looks good to me except change :
> > + if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) &&
> > + !(key->sk_flags & SK_ROW_HEADER))
> > + requiredDir = true;
> > ...
> > - if ((key->sk_flags & SK_BT_REQFWD) &&
> > - ScanDirectionIsForward(dir))
> > - *continuescan = false;
> > - else if ((key->sk_flags & SK_BT_REQBKWD) &&
> > - ScanDirectionIsBackward(dir))
> > + if (requiredDir)
> >   *continuescan = false;
> >
> > looks like changing behavior in the case when key->sk_flags &
> > SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) &&
> > (!requiredDirMatched)
> > Originally it doesn't set *continuescan = false; and with the patch it will set.
>
> I agree that this is a problem. Inequality strategy scan keys are used
> when the initial positioning strategy used by _bt_first (for its
> _bt_search call) is based on an operator other than the "=" operator
> for the opclass. These scan keys are required in one direction only
> (Konstantin's original patch just focussed on these cases, actually).
> Obviously, that difference matters. I don't think that this patch
> should do anything that even looks like it might be revising the
> formal definition of "required in the current scan direction".

Sorry, that was messed up from various attempts to write the patch.
Actually, I end up with two boolean variables indicating whether the
current key is required for the same direction or opposite direction
scan.  I believe that the key required for the opposite direction scan
should be already satisfied by _bt_first() except for NULLs case.
I've implemented a skip of calling the key function for this case
(with assert that result is the same).

> Why is SK_ROW_HEADER treated as a special case by the patch? Could it
> be related to the issues with required-ness and scan direction? Note
> that we never use BTEqualStrategyNumber for SK_ROW_HEADER scan key row
> comparisons, so they're only ever required for one scan direction.
> (Equality-type row constructor syntax can of course be used without
> preventing the system from using an index scan, but the nbtree code
> will not see that case as a row comparison in the first place. This is
> due to preprocessing by the planner -- nbtree just sees conventional
> scan keys with multiple simple equality scan keys with = row
> comparisons.)

The thing is that NULLs could appear in the middle of matching values.

# WITH t (a, b) AS (VALUES ('a', 'b'), ('a', NULL), ('b', 'a'))
SELECT a, b, (a, b) > ('a', 'a') FROM t ORDER BY (a, b);
 a |  b   | ?column?
---+------+----------
 a | b    | t
 a | NULL | NULL
 b | a    | t
(3 rows)

So we can't just skip the row comparison operator, because we can meet
NULL at any place.

> > This may be relevant for the first page when requiredDirMatched is
> > intentionally skipped to be set and for call
> > _bt_checkkeys(scan, itup, truncatt, dir, &continuescan, false);
>
> Also, requiredDirMatched isn't initialized by _bt_readpage() when
> "so->firstPage". Shouldn't it be initialized to false?
>
> Also, don't we need to take more care with a fully empty page? The "if
> (!so->firstPage) ... " block should be gated using a condition such as
> "if (!so->firstPage && minoff < maxoff)". (Adding a "minoff <= maxoff"
> test would also work, but then the optimization will get applied on
> pages with only one non-pivot tuple. That would be harmless, but a
> waste of cycles.)

This makes sense.  I've added (minoff < maxoff) to the condition.

> > Also naming of requiredDirMatched and requiredDir seems semantically
> > hard to understand the meaning without looking at the patch commit
> > message. But I don't have better proposals yet, so maybe it's
> > acceptable.
>
> I agree. How about "requiredMatchedByPrecheck" instead of
> "requiredDirMatched", and "required" instead of "requiredDir"?
>
> It would be nice if this patch worked in a way that could be verified
> by an assertion. Under this scheme, the optimization would only really
> be used in release builds (builds without assertions enabled, really).
> We'd only verify that the optimized case agreed with the slow path in
> assert-enabled builds. It might also make sense to always "apply the
> optimization" on assert-enabled builds, even for the first page seen
> by _bt_readpage by any _bt_first-wise scan. Maybe this sort of
> approach is impractical here for some reason, but I don't see why it
> should be.

Yes, this makes sense.  I've added an assert check that results are
the same as with requiredMatchedByPrecheck == false.

> Obviously, the optimization should lower the amount of work in some
> calls to _bt_checkkeys, without ever changing the answer _bt_checkkeys
> gives. Ideally, it should be done in a way that makes that very
> obvious. There are some very subtle interactions between _bt_checkkeys
> and other, distant code -- which makes me feel paranoid. Notably, with
> required equality strategy scan keys, we're crucially dependent on
> _bt_first using an equality strategy for its initial positioning call
> to _bt_search. This is described by comments in both _bt_checkkeys and
> in _bt_first.
>
> Note, in particular, that it is essential that the initial offnum
> passed to _bt_readpage doesn't allow a call to _bt_checkkeys to take
> place that could cause it to become confused by a required equality
> strategy scan key, leading to _bt_checkkeys terminating the whole scan
> "early" -- causing wrong answers. For a query "WHERE foo = 5" (and a
> forward scan), we had better not pass _bt_readpage an offset number
> for a tuple with "foo" value 4. If that is ever allowed then
> _bt_checkkeys will terminate the scan immediately, leading to wrong
> answers. All because _bt_checkkeys can't tell if 4 comes before 5 or
> comes after 5 -- it only has an "=" operator to work with, so it can't
> actually make this distinction, so it likes to assume that anything !=
> 5 must come after 5 (or before 5 during a backwards scan).
>
> I added a very similar _bt_compare()-based assertion in
> _bt_check_unique(), which went on to catch a very subtle bug in the
> Postgres 12 nbtree work -- the bug fixed by commit 74eb2176bf. So I
> have put this particular idea about asserting agreement between a fast
> path and a slow comparison path into practice already.

Good, thank you for the detailed clarification.

------
Regards,
Alexander Korotkov

Вложения

0001-Skip-checking-of-scan-keys-required-for-direction-v4.patch

Re: Index range search optimization

От

Pavel Borisov

Дата:

25 сентября 2023 г., 12:58:02

Hi, Alexander!

I found and fixed a couple of naming issues that came to v4 from
earlier patches.
Also, I added initialization of requiredMatchedByPrecheck in case of first page.

Please see patch v5.

One more doubt about naming. Calling function
_bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck)
as
(void) _bt_checkkeys(scan, itup, indnatts, dir,
&requiredMatchedByPrecheck, false);
looks little bit misleading because of coincidence of names of 5 and 6
arguments.

Вложения

0001-Skip-checking-of-scan-keys-required-for-direction-v4.patch

Re: Index range search optimization

От

Pavel Borisov

Дата:

25 сентября 2023 г., 13:11:53

Sorry, I've mistaken with attached version previously. Correct v5 attached.

On Mon, 25 Sept 2023 at 13:58, Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>
> Hi, Alexander!
>
> I found and fixed a couple of naming issues that came to v4 from
> earlier patches.
> Also, I added initialization of requiredMatchedByPrecheck in case of first page.
>
> Please see patch v5.
>
> One more doubt about naming. Calling function
> _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
> ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck)
> as
> (void) _bt_checkkeys(scan, itup, indnatts, dir,
> &requiredMatchedByPrecheck, false);
> looks little bit misleading because of coincidence of names of 5 and 6
> arguments.

Вложения

v5-0001-PATCH-Skip-checking-of-scan-keys-required-for-dir.patch

Re: Index range search optimization

От

Alexander Korotkov

Дата:

25 сентября 2023 г., 13:18:04

On Mon, Sep 25, 2023 at 12:58 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> One more doubt about naming. Calling function
> _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
> ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck)
> as
> (void) _bt_checkkeys(scan, itup, indnatts, dir,
> &requiredMatchedByPrecheck, false);
> looks little bit misleading because of coincidence of names of 5 and 6
> arguments.

I've added the comment clarifying this argument usage.

------
Regards,
Alexander Korotkov

Вложения

0001-Skip-checking-of-scan-keys-required-for-direction-v6.patch

Re: Index range search optimization

От

Alexander Korotkov

Дата:

27 сентября 2023 г., 19:41:31

On Mon, Sep 25, 2023 at 1:18 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Mon, Sep 25, 2023 at 12:58 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> One more doubt about naming. Calling function
> _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
> ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck)
> as
> (void) _bt_checkkeys(scan, itup, indnatts, dir,
> &requiredMatchedByPrecheck, false);
> looks little bit misleading because of coincidence of names of 5 and 6
> arguments.

I've added the comment clarifying this argument usage.

Fixed typo inficating => indicating as pointed by Pavel.

Peter, what do you think about the current shape of the patch?

------
Regards,

Alexander Korotkov

Вложения

0001-Skip-checking-of-scan-keys-required-for-direction-v7.patch

Re: Index range search optimization

От

Peter Geoghegan

Дата:

28 сентября 2023 г., 05:21:00

On Wed, Sep 27, 2023 at 9:41 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Fixed typo inficating => indicating as pointed by Pavel.
> Peter, what do you think about the current shape of the patch?

I'll try to get to this tomorrow. I'm rather busy with moving home at
the moment, unfortunately.


--
Peter Geoghegan

Re: Index range search optimization

От

Alexander Korotkov

Дата:

28 сентября 2023 г., 22:03:41

On Thu, Sep 28, 2023 at 5:21 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Wed, Sep 27, 2023 at 9:41 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > Fixed typo inficating => indicating as pointed by Pavel.
> > Peter, what do you think about the current shape of the patch?
>
> I'll try to get to this tomorrow. I'm rather busy with moving home at
> the moment, unfortunately.

No problem, thank you!

------
Regards,
Alexander Korotkov

Re: Index range search optimization

От

Peter Geoghegan

Дата:

29 сентября 2023 г., 04:57:07

On Fri, Sep 22, 2023 at 7:24 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> The thing is that NULLs could appear in the middle of matching values.
>
> # WITH t (a, b) AS (VALUES ('a', 'b'), ('a', NULL), ('b', 'a'))
> SELECT a, b, (a, b) > ('a', 'a') FROM t ORDER BY (a, b);
>  a |  b   | ?column?
> ---+------+----------
>  a | b    | t
>  a | NULL | NULL
>  b | a    | t
> (3 rows)
>
> So we can't just skip the row comparison operator, because we can meet
> NULL at any place.

But why would SK_ROW_HEADER be any different? Is it related to this
existing case inside _bt_check_rowcompare()?:

        if (subkey->sk_flags & SK_ISNULL)
        {
            /*
             * Unlike the simple-scankey case, this isn't a disallowed case.
             * But it can never match.  If all the earlier row comparison
             * columns are required for the scan direction, we can stop the
             * scan, because there can't be another tuple that will succeed.
             */
            if (subkey != (ScanKey) DatumGetPointer(skey->sk_argument))
                subkey--;
            if ((subkey->sk_flags & SK_BT_REQFWD) &&
                ScanDirectionIsForward(dir))
                *continuescan = false;
            else if ((subkey->sk_flags & SK_BT_REQBKWD) &&
                     ScanDirectionIsBackward(dir))
                *continuescan = false;
            return false;
        }

I noticed that you're not initializing so->firstPage correctly for the
_bt_endpoint() path, which is used when the initial position of the
scan is either the leftmost or rightmost page. That is, it's possible
to reach _bt_readpage() without having reached the point in
_bt_first() where you initialize so->firstPage to "true".

It would probably make sense if the flag was initialized to "false" in
the same way as most other scan state is already, somewhere in
nbtree.c. Probably in btrescan().

--
Peter Geoghegan

Re: Index range search optimization

От

Alexander Korotkov

Дата:

29 сентября 2023 г., 09:35:24

Hi, Peter.

On Fri, Sep 29, 2023 at 4:57 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Fri, Sep 22, 2023 at 7:24 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > The thing is that NULLs could appear in the middle of matching values.
> >
> > # WITH t (a, b) AS (VALUES ('a', 'b'), ('a', NULL), ('b', 'a'))
> > SELECT a, b, (a, b) > ('a', 'a') FROM t ORDER BY (a, b);
> >  a |  b   | ?column?
> > ---+------+----------
> >  a | b    | t
> >  a | NULL | NULL
> >  b | a    | t
> > (3 rows)
> >
> > So we can't just skip the row comparison operator, because we can meet
> > NULL at any place.
>
> But why would SK_ROW_HEADER be any different? Is it related to this
> existing case inside _bt_check_rowcompare()?:
>
>         if (subkey->sk_flags & SK_ISNULL)
>         {
>             /*
>              * Unlike the simple-scankey case, this isn't a disallowed case.
>              * But it can never match.  If all the earlier row comparison
>              * columns are required for the scan direction, we can stop the
>              * scan, because there can't be another tuple that will succeed.
>              */
>             if (subkey != (ScanKey) DatumGetPointer(skey->sk_argument))
>                 subkey--;
>             if ((subkey->sk_flags & SK_BT_REQFWD) &&
>                 ScanDirectionIsForward(dir))
>                 *continuescan = false;
>             else if ((subkey->sk_flags & SK_BT_REQBKWD) &&
>                      ScanDirectionIsBackward(dir))
>                 *continuescan = false;
>             return false;
>         }

Yes, exactly. Our row comparison operators don't match if there is any
null inside the row.  But you can find these rows within the matching
range.

> I noticed that you're not initializing so->firstPage correctly for the
> _bt_endpoint() path, which is used when the initial position of the
> scan is either the leftmost or rightmost page. That is, it's possible
> to reach _bt_readpage() without having reached the point in
> _bt_first() where you initialize so->firstPage to "true".

Good catch, thank you!

> It would probably make sense if the flag was initialized to "false" in
> the same way as most other scan state is already, somewhere in
> nbtree.c. Probably in btrescan().

Makes sense, initialisation is added.

------
Regards,
Alexander Korotkov

Вложения

0001-Skip-checking-of-scan-keys-required-for-direction-v8.patch

Re: Index range search optimization

От

Pavel Borisov

Дата:

04 октября 2023 г., 00:58:49

Hi!

On Fri, 29 Sept 2023 at 10:35, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hi, Peter.
>
> On Fri, Sep 29, 2023 at 4:57 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > On Fri, Sep 22, 2023 at 7:24 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > The thing is that NULLs could appear in the middle of matching values.
> > >
> > > # WITH t (a, b) AS (VALUES ('a', 'b'), ('a', NULL), ('b', 'a'))
> > > SELECT a, b, (a, b) > ('a', 'a') FROM t ORDER BY (a, b);
> > >  a |  b   | ?column?
> > > ---+------+----------
> > >  a | b    | t
> > >  a | NULL | NULL
> > >  b | a    | t
> > > (3 rows)
> > >
> > > So we can't just skip the row comparison operator, because we can meet
> > > NULL at any place.
> >
> > But why would SK_ROW_HEADER be any different? Is it related to this
> > existing case inside _bt_check_rowcompare()?:
> >
> >         if (subkey->sk_flags & SK_ISNULL)
> >         {
> >             /*
> >              * Unlike the simple-scankey case, this isn't a disallowed case.
> >              * But it can never match.  If all the earlier row comparison
> >              * columns are required for the scan direction, we can stop the
> >              * scan, because there can't be another tuple that will succeed.
> >              */
> >             if (subkey != (ScanKey) DatumGetPointer(skey->sk_argument))
> >                 subkey--;
> >             if ((subkey->sk_flags & SK_BT_REQFWD) &&
> >                 ScanDirectionIsForward(dir))
> >                 *continuescan = false;
> >             else if ((subkey->sk_flags & SK_BT_REQBKWD) &&
> >                      ScanDirectionIsBackward(dir))
> >                 *continuescan = false;
> >             return false;
> >         }
>
> Yes, exactly. Our row comparison operators don't match if there is any
> null inside the row.  But you can find these rows within the matching
> range.
>
> > I noticed that you're not initializing so->firstPage correctly for the
> > _bt_endpoint() path, which is used when the initial position of the
> > scan is either the leftmost or rightmost page. That is, it's possible
> > to reach _bt_readpage() without having reached the point in
> > _bt_first() where you initialize so->firstPage to "true".
>
> Good catch, thank you!
>
> > It would probably make sense if the flag was initialized to "false" in
> > the same way as most other scan state is already, somewhere in
> > nbtree.c. Probably in btrescan().
>
> Makes sense, initialisation is added.
I've looked through the patch v8. I think it's good enough to be
pushed if Peter has no objections.

Regards,
Pavel.

Re: Index range search optimization

От

Alexander Korotkov

Дата:

04 октября 2023 г., 03:00:09

On Wed, Oct 4, 2023 at 12:59 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> I've looked through the patch v8. I think it's good enough to be
> pushed if Peter has no objections.

Thank you, Pavel.
I'll push this if there are no objections.

------
Regards,
Alexander Korotkov

Re: Index range search optimization

От

Konstantin Knizhnik

Дата:

06 октября 2023 г., 21:44:05

On 04/10/2023 3:00 am, Alexander Korotkov wrote:
> On Wed, Oct 4, 2023 at 12:59 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>> I've looked through the patch v8. I think it's good enough to be
>> pushed if Peter has no objections.
> Thank you, Pavel.
> I'll push this if there are no objections.
>
> ------
> Regards,
> Alexander Korotkov


Sorry, can you please also mention that original idea of this 
optimization belongs to Ilya Anfimov (it was discussed in @pgsql 
Telegram chat).

Re: Index range search optimization

От

Pavel Borisov

Дата:

06 октября 2023 г., 21:59:16

Hi, Konstantin!

On Fri, 6 Oct 2023 at 22:44, Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>
>
> On 04/10/2023 3:00 am, Alexander Korotkov wrote:
> > On Wed, Oct 4, 2023 at 12:59 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> >> I've looked through the patch v8. I think it's good enough to be
> >> pushed if Peter has no objections.
> > Thank you, Pavel.
> > I'll push this if there are no objections.
> >
> > ------
> > Regards,
> > Alexander Korotkov
>
>
> Sorry, can you please also mention that original idea of this
> optimization belongs to Ilya Anfimov (it was discussed in @pgsql
> Telegram chat).

While it's no doubt correct to mention all authors of the patch, I
looked through the thread and saw no mentions of Ilya's
contributions/ideas before the patch became pushed. I'm not up to the
current policy for processing these requests, but I suppose it's
complicated to introduce back changes into the main branch that is
already ahead of patch e0b1ee17dc3a38.

Regards,
Pavel

Re: Index range search optimization

От

Alexander Korotkov

Дата:

07 октября 2023 г., 20:42:51

On Fri, Oct 6, 2023 at 9:59 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Fri, 6 Oct 2023 at 22:44, Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> >
> >
> > On 04/10/2023 3:00 am, Alexander Korotkov wrote:
> > > On Wed, Oct 4, 2023 at 12:59 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > >> I've looked through the patch v8. I think it's good enough to be
> > >> pushed if Peter has no objections.
> > > Thank you, Pavel.
> > > I'll push this if there are no objections.
> > >
> > > ------
> > > Regards,
> > > Alexander Korotkov
> >
> >
> > Sorry, can you please also mention that original idea of this
> > optimization belongs to Ilya Anfimov (it was discussed in @pgsql
> > Telegram chat).
>
> While it's no doubt correct to mention all authors of the patch, I
> looked through the thread and saw no mentions of Ilya's
> contributions/ideas before the patch became pushed. I'm not up to the
> current policy for processing these requests, but I suppose it's
> complicated to introduce back changes into the main branch that is
> already ahead of patch e0b1ee17dc3a38.

Yep, that happened before.  We don't do a force push to override
commit messages and credit missing contributors.  I waited more than
48 hours before pushing the final version of the patch, and that was
the time to propose changes like this.  Now, I think all we can do is
credit Ilya on mailing lists.  I believe we already did :)

------
Regards,
Alexander Korotkov