Re: Making all nbtree entries unique by having heap TIDs participatein comparisons
От | Peter Geoghegan |
---|---|
Тема | Re: Making all nbtree entries unique by having heap TIDs participatein comparisons |
Дата | |
Msg-id | CAH2-WzkOmUduME31QnuTFpimejuQoiZ-HOf0pOWeFZNhTMctvA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Making all nbtree entries unique by having heap TIDs participatein comparisons (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: Making all nbtree entries unique by having heap TIDs participatein comparisons
|
Список | pgsql-hackers |
On Sun, Nov 4, 2018 at 10:58 AM Peter Geoghegan <pg@bowt.ie> wrote: > Just the lack of pg_upgrade support. Attached is v7 of the patch series. Changes: * Pre-pg_upgrade indexes (indexes of an earlier BTREE_VERSION) are now supported. Using pg_upgrade will be seamless to users. "Getting tired" returns, for the benefit of old indexes that regularly have lots of duplicates inserted. Notably, the new/proposed version of btree (BTREE_VERSION 4) cannot be upgraded on-the-fly -- we're changing more than the contents of the metapage, so that won't work. Version 2 -> version 3 upgrades can still take place dynamically/on-the-fly. It you want to upgrade to version 4, you'll need to REINDEX. The performance of the patch with pg_upgrade'd indexes has been validated. There doesn't seem to be any regressions. amcheck checks both the old invariants, and the new/stricter/L&Y invariants. Which set are checked depends on the btree version of the index undergoing verification. * ASC heap TID order is now used -- not DESC order, as before. This fixed all performance regressions that I'm aware of, and seems quite a lot more elegant overall. I believe that the patch series is now an unambiguous, across the board win for performance. I could see about a 1% increase in transaction throughput with my own TPC-C tests, while the big drop in the size of indexes was preserved. pgbench testing also showed as much as a 3.5% increase in transaction throughput in some cases with non-uniform distributions. Thanks for the suggestion, Heikki! Unfortunately, and as predicted, this change created a new problem that I need to fix directly: it makes certain diagnostic messages that accidentally depend on a certain pg_depend scan order say something different, and less useful (though still technically correct). I'll tackle that problem over on the dedicated thread I started [1]. (For now, I include a separate patch to paper over questionable regression test changes in a controlled way: v7-0005-Temporarily-paper-over-problematic-regress-output.patch.) * New optimization that has index scans avoid visiting the next page by checking the high key -- this is broken out into its own commit (v7-0002-Weigh-suffix-truncation-when-choosing-a-split-poi.patch). This is related to an optimization that has been around for years -- we're now using the high key, rather than using a normal (non-pivot) index tuple. High keys are much more likely to indicate that the scan doesn't need to visit the next page with the earlier patches in the patch series applied, since the new logic for choosing a split point favors a high key with earlier differences. It's pretty easy to take advantage of that. With a composite index, or a secondary index, it's particularly likely that we can avoid visiting the next leaf page. In other words, now that we're being smarter about future locality of access during page splits, we should take full advantage during index scans. The v7-0001-Make-nbtree-indexes-have-unique-keys-in-tuples.patch commit uses a _bt_lowest_scantid() sentinel value to avoid unnecessarily visiting a page to the left of the page we actually ought to go to directly during a descent of a B-Tree -- that optimization was around in all earlier versions of the patch series. It seems natural to also have this new-to-v7 optimization. It avoids unnecessarily going right once we reach the leaf level, so it "does the same thing on the right side" -- the two optimizations mirror each other. If you don't get what I mean by that, then imagine a secondary index where each value appears a few hundred times. Literally every simple lookup query will either benefit from the first optimization on the way down the tree, or from the second optimization towards the end of the scan. (The page split logic ought to pack large groups of duplicates together, ideally confining them to one leaf page.) Andrey: the BTScanInsert struct still has the restorebinsrch stuff (mutable binary search optimization state) in v7. It seemed to make sense to keep it there, because I think that we'll be able to add similar optimizations in the future, that use similar mutable state. See my remarks on "dynamic prefix truncation" [2]. I think that that could be very helpful with skip scans, for example, so we'll probably end up adding it before too long. I hope you don't feel too strongly about it. [1] https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com [2] https://postgr.es/m/CAH2-WzkpKeZJrXvR_p7VSY1b-s85E3gHyTbZQzR0BkJ5LrWF_A@mail.gmail.com -- Peter Geoghegan
Вложения
- v7-0005-Temporarily-paper-over-problematic-regress-output.patch
- v7-0006-DEBUG-Add-pageinspect-instrumentation.patch
- v7-0004-Add-high-key-continuescan-optimization.patch
- v7-0002-Weigh-suffix-truncation-when-choosing-a-split-poi.patch
- v7-0003-Add-split-at-new-tuple-page-split-optimization.patch
- v7-0001-Make-nbtree-indexes-have-unique-keys-in-tuples.patch
В списке pgsql-hackers по дате отправления: