Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Дата
Msg-id CAH2-Wz=BuxYEHxpNH0tPvo=+G1WtE1PamRoVU1dEVow1Vy9Y7A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Mon, Dec 4, 2023 at 7:25 PM Peter Geoghegan <pg@bowt.ie> wrote:
> 2. The optimization that has _bt_checkkeys skip non-required scan keys
> that are *only* required in the direction *opposite* the current scan
> direction -- this can work even without any precheck from
> _bt_readpage.
>
> Note that this second optimization relies on various behaviors in
> _bt_first() that make it impossible for _bt_checkkeys() to ever see a
> tuple that could fail to satisfy such a scan key -- we must always
> have passed over non-matching tuples, thanks to _bt_first(). That
> prevents my patch with a problem: the logic for determining whether or
> not we need a new primitive index scan only promises to never require
> the scan to grovel through many leaf pages that _bt_first() could and
> should just skip over instead. This new logic makes no promises about
> skipping over small numbers of tuples. So it's possible that
> _bt_checkkeys() will see a handful of tuples "after the end of the
> _bt_first-wise primitive index scan", but "before the _bt_first-wise
> start of the next would-be primitive index scan".

BTW, I have my doubts about this actually being correct without the
patch. The following comment block appears above _bt_preprocess_keys:

 * Note that one reason we need direction-sensitive required-key flags is
 * precisely that we may not be able to eliminate redundant keys.  Suppose
 * we have "x > 4::int AND x > 10::bigint", and we are unable to determine
 * which key is more restrictive for lack of a suitable cross-type operator.
 * _bt_first will arbitrarily pick one of the keys to do the initial
 * positioning with.  If it picks x > 4, then the x > 10 condition will fail
 * until we reach index entries > 10; but we can't stop the scan just because
 * x > 10 is failing.  On the other hand, if we are scanning backwards, then
 * failure of either key is indeed enough to stop the scan.  (In general, when
 * inequality keys are present, the initial-positioning code only promises to
 * position before the first possible match, not exactly at the first match,
 * for a forward scan; or after the last match for a backward scan.)

As I understand it, this might still be okay, because the optimization
in question from Alexander's commit e0b1ee17 (what I've called
optimization 2) is careful about NULLs, which were the one case that
definitely had problems. Note that IS NOT NULL works kind of like
WHERE foo < NULL here (see old bug fix commit 882368e8, "Fix btree
stop-at-nulls logic properly", for more context on this NULLs
behavior).

In any case, my patch isn't compatible with "optimization 2" (as in my
tests break in a rather obvious way) due to a behavior that these old
comments claim is normal within any scan (or perhaps normal in any
scan with scan keys that couldn't be deemed redundant due to a lack of
cross-type support in the opfamily).

Something has to be wrong here -- could just be the comment, I
suppose. But I find it easy to believe that Alexander's commit
e0b1ee17 might not have been properly tested with opfamilies that lack
a suitable cross-type operator. That's a pretty niche thing. (My patch
doesn't need that niche thing to be present to easily break when
combined with "optimization 2", which could hint at an existing and
far more subtle problem.)

--
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ashutosh Bapat
Дата:
Сообщение: Re: table inheritance versus column compression and storage settings
Следующее
От: shveta malik
Дата:
Сообщение: Re: Synchronizing slots from primary to standby