Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
From: Peter Geoghegan
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date:
Msg-id: CAH2-Wz=BuxYEHxpNH0tPvo=+G1WtE1PamRoVU1dEVow1Vy9Y7A@mail.gmail.com
In reply to: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Mon, Dec 4, 2023 at 7:25 PM Peter Geoghegan <pg@bowt.ie> wrote:
> 2. The optimization that has _bt_checkkeys skip non-required scan keys
> that are *only* required in the direction *opposite* the current scan
> direction -- this can work even without any precheck from
> _bt_readpage.
>
> Note that this second optimization relies on various behaviors in
> _bt_first() that make it impossible for _bt_checkkeys() to ever see a
> tuple that could fail to satisfy such a scan key -- we must always
> have passed over non-matching tuples, thanks to _bt_first(). That
> presents my patch with a problem: the logic for determining whether or
> not we need a new primitive index scan only promises to never require
> the scan to grovel through many leaf pages that _bt_first() could and
> should just skip over instead. This new logic makes no promises about
> skipping over small numbers of tuples. So it's possible that
> _bt_checkkeys() will see a handful of tuples "after the end of the
> _bt_first-wise primitive index scan", but "before the _bt_first-wise
> start of the next would-be primitive index scan".

BTW, I have my doubts about this actually being correct without the
patch. The following comment block appears above _bt_preprocess_keys:

 * Note that one reason we need direction-sensitive required-key flags is
 * precisely that we may not be able to eliminate redundant keys.  Suppose
 * we have "x > 4::int AND x > 10::bigint", and we are unable to determine
 * which key is more restrictive for lack of a suitable cross-type operator.
 * _bt_first will arbitrarily pick one of the keys to do the initial
 * positioning with.  If it picks x > 4, then the x > 10 condition will fail
 * until we reach index entries > 10; but we can't stop the scan just because
 * x > 10 is failing.  On the other hand, if we are scanning backwards, then
 * failure of either key is indeed enough to stop the scan.  (In general, when
 * inequality keys are present, the initial-positioning code only promises to
 * position before the first possible match, not exactly at the first match,
 * for a forward scan; or after the last match for a backward scan.)

As I understand it, this might still be okay, because the optimization
in question from Alexander's commit e0b1ee17 (what I've called
optimization 2) is careful about NULLs, which were the one case that
definitely had problems. Note that IS NOT NULL works kind of like
WHERE foo < NULL here (see old bug fix commit 882368e8, "Fix btree
stop-at-nulls logic properly", for more context on this NULLs
behavior).

In any case, my patch isn't compatible with "optimization 2" (as in my
tests break in a rather obvious way) due to a behavior that these old
comments claim is normal within any scan (or perhaps normal in any
scan with scan keys that couldn't be deemed redundant due to a lack of
cross-type support in the opfamily). Something has to be wrong here --
it could just be the comment, I suppose. But I find it easy to believe
that Alexander's commit e0b1ee17 might not have been properly tested
with opfamilies that lack a suitable cross-type operator. That's a
pretty niche thing. (My patch doesn't need that niche thing to be
present to easily break when combined with "optimization 2", which
could hint at an existing and far more subtle problem.)

--
Peter Geoghegan
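[Editor's note: the scenario the _bt_preprocess_keys comment describes can be sketched with a toy model. This is not PostgreSQL code; the names and the sorted-list stand-in for an index are illustrative assumptions. It models why a forward scan positioned with the less restrictive of two non-redundant keys must not terminate merely because the other required key fails.]

```python
import bisect

# Toy "index": a sorted sequence of key values, standing in for leaf entries.
index = [1, 3, 5, 7, 9, 11, 13, 15]

# Scan keys "x > 4 AND x > 10", where (as in the comment block) a missing
# cross-type operator means neither key could be proven redundant.
keys = [4, 10]

def forward_scan(positioning_key):
    """Position just past positioning_key (like _bt_first's initial
    positioning), then scan forward. Crucially, the scan only *skips*
    tuples that fail a key -- it never stops early, because matches can
    begin later in the forward direction."""
    pos = bisect.bisect_right(index, positioning_key)
    results = []
    for x in index[pos:]:
        if all(x > k for k in keys):
            results.append(x)
        # No "break" on failure: stopping here when x > 10 fails would
        # wrongly end the scan while still among entries in (4, 10].
    return results

# Whichever key _bt_first arbitrarily picks, the answer is the same --
# but only because the forward scan refuses to stop on a key failure:
assert forward_scan(4) == forward_scan(10) == [11, 13, 15]
```

A backward scan is the asymmetric case the comment notes: its initial positioning lands after the last match, so the first failure of either key really does mean no further matches are possible, and the scan may stop.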