Re: Index Skip Scan
От | Peter Geoghegan |
---|---|
Тема | Re: Index Skip Scan |
Дата | |
Msg-id | CAH2-Wzk6hvXvZ+1q4JBRunZkxmJwdXxnNsWuKJBp+F+mJuZYZw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Index Skip Scan (Jesper Pedersen <jesper.pedersen@redhat.com>) |
Ответы |
Re: Index Skip Scan
(Peter Geoghegan <pg@bowt.ie>)
Re: Index Skip Scan (Dmitry Dolgov <9erthalion6@gmail.com>) |
Список | pgsql-hackers |
On Mon, Jan 20, 2020 at 11:01 AM Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > > - nbtsearch.c _bt_skip line 1440 > > if (BTScanPosIsValid(so->currPos) && > > _bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir)) > > > > Is it allowed to look at the high key / low key of the page without have a read lock on it? > > > > In case of a split the page will still contain a high key and a low key, > so this should be ok. This is definitely not okay. > > - nbtsearch.c in general > > Most of the code seems to rely quite heavily on the fact that xs_want_itup forces _bt_drop_lock_and_maybe_pin to neverrelease the buffer pin. Have you considered that compacting of a page may still happen even if you hold the pin? [1]I've been trying to come up with cases in which this may break the patch, but I haven't able to produce such a scenario- so it may be fine. Try making _bt_findinsertloc() call _bt_vacuum_one_page() whenever the page is P_HAS_GARBAGE(), regardless of whether or not the page is about to split. That will still be correct, while having a much better chance of breaking the patch during stress-testing. Relying on a buffer pin to prevent the B-Tree structure itself from changing in any important way seems likely to be broken already. Even if it isn't, it sounds fragile. A leaf page doesn't really have anything called a low key. It usually has a current first "data item"/non-pivot tuple, which is an inherently unstable thing. Also, it has a very loose relationship with the high key of the left sibling page, which the the closest thing to a low key that exists (often they'll have almost the same key values, but that is not guaranteed at all). While I haven't studied the patch, the logic within _bt_scankey_within_page() seems fishy to me for that reason. > There is a BT_READ lock in place when finding the correct leaf page, or > searching within the leaf page itself. _bt_vacuum_one_page deletes only > LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do > you have some feedback for this ? It sounds like the design of the patch relies on doing something other than stopping a scan "between" pages, in the sense that is outlined in the commit message of commit 09cb5c0e. If so, then that's a serious flaw in its design. -- Peter Geoghegan
В списке pgsql-hackers по дате отправления: