Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
From | Peter Geoghegan
---|---
Subject | Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date |
Msg-id | CAH2-Wz=yTWnVu+HeHGKb2AGiADL9eprn-cKYAto4MkKOuiGtRQ@mail.gmail.com
In reply to | Re: Making all nbtree entries unique by having heap TIDs participate in comparisons (Peter Geoghegan <pg@bowt.ie>)
Responses | Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
List | pgsql-hackers
On Tue, Feb 5, 2019 at 4:50 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Heikki and I had the opportunity to talk about this recently. We found
> an easy way forward. I believe that the nbtsplitloc.c algorithm itself
> is fine -- the code will need to be refactored, though.

Attached v12 does not include this change, though I have every intention of doing the refactoring described for v13. The nbtsplitloc.c/split algorithm refactoring would necessitate revalidating the patch's performance, though, which didn't seem worth blocking on. Besides, there was bit rot that needed to be fixed.

Notable improvements in v12:

* No more papering over regression test differences caused by pg_depend issues, thanks to recent work by Tom (today's commit 1d92a0c9).

* I simplified the code added to _bt_binsrch() to deal with saving and restoring binary search bounds for _bt_check_unique()-caller insertions (this is from the first patch, "Refactor nbtree insertion scankeys"). I also improved matters within _bt_check_unique() itself: the early "break" there (based on reaching the known strict upper bound from the cached binary search) works in terms of the existing _bt_check_unique() loop invariant. This even allowed me to add a new assertion that makes sure that breaking out of the loop early is correct -- on assert-enabled builds, we call _bt_isequal() for the next item when we break having reached the strict upper bound established by the initial binary search. In other words, provided assertions are enabled, _bt_check_unique() ends up doing the same number of _bt_isequal() calls as it did on the master branch. (A standalone sketch of the bounds-caching idea appears below.)
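To illustrate the general idea outside of nbtree, here is a minimal standalone sketch -- not the patch's code; SearchBounds and search_with_bounds are names invented for this example, and the patch itself carries the bounds in the insertion scankey rather than in a separate struct. The first binary search over a sorted array records the bounds it converged on, and a repeat search for the same key can restart from those bounds instead of the full range:

```c
#include <assert.h>
#include <stdio.h>

/* Invented holder for cached bounds (the real patch keeps its bounds
 * in the insertion scankey, not in a struct like this). */
typedef struct SearchBounds
{
    int     low;        /* lowest candidate offset */
    int     high;       /* strict upper bound */
} SearchBounds;

/*
 * Binary search for the first element >= key, starting from the caller's
 * bounds.  On return, *bounds holds the final bounds, which a repeat
 * search for the same key can reuse without rescanning the full range.
 */
static int
search_with_bounds(const int *items, int key, SearchBounds *bounds)
{
    int     low = bounds->low;
    int     high = bounds->high;

    while (low < high)
    {
        int     mid = low + (high - low) / 2;

        if (items[mid] < key)
            low = mid + 1;
        else
            high = mid;
    }
    bounds->low = low;          /* cache for a later restart */
    bounds->high = high;
    return low;
}

int
main(void)
{
    int             items[] = {10, 20, 20, 30, 40, 50};
    SearchBounds    bounds = {0, 6};    /* full range on first call */

    int     first = search_with_bounds(items, 20, &bounds);

    /* Second search restarts from the cached (now trivial) bounds. */
    int     second = search_with_bounds(items, 20, &bounds);

    assert(first == second);
    printf("first offset of key 20: %d\n", first);
    return 0;
}
```

In _bt_check_unique()'s case, the cached strict upper bound is what lets the loop break early without changing its invariant.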
* I've restored regression test coverage that the patch previously took away inadvertently. Suffix truncation made the deliberately-tall B-Tree indexes from the regression tests much shorter, so the tests no longer exercised the code paths they originally targeted. I needed to find ways to "defeat" suffix truncation so that I still ended up with a fairly tall tree that hit various code paths. I think that we went from having 8 levels in btree_tall_idx (i.e. ridiculously many) to having only a single root page when I first caught the problem! Now btree_tall_idx has only 3 levels, which is all we really need. Even multi-level page deletion had no coverage in previous versions. I used gcov to specifically verify that we have good multi-level page deletion coverage. I also used gcov to make sure that we have coverage of the v11 "cache rightmost block" optimization, since I noticed that that was missing (though present on the master branch) -- that's actually all that btree_tall_idx tests in the patch, since multi-level page deletion is covered by a covering-indexes-era test. Finally, I made sure that we have coverage of fast root splits. In general, I preserved the original intent behind the existing tests, all of which I was fairly familiar with from previous projects.

* I've added a new "relocate" bt_index_parent_check()/amcheck option, broken out into a separate commit. This new option makes verification relocate each and every leaf page tuple, starting from the root each time. This means that there will be at least one piece of code that specifically relies on "every tuple should have a unique key" from the start, which seems like a good idea. This enhancement to amcheck allows me to detect various forms of corruption that no other existing verification option would catch. In particular, I can catch various very subtle "cross-cousin inconsistencies" that require that we verify a page using its grandparent rather than its parent [1] (existing checks catch some, but not all, "cousin problem" corruption). Simply put, this amcheck enhancement allows me to detect corruption of the least significant byte of a key value in the root page -- that kind of corruption will cause index scans to miss only a small number of tuples at the leaf level. Maybe this scenario isn't realistic, but I'd rather not take any chances.

* I rethought the "single value mode" fillfactor, which I've been suspicious of for a while now. It's now 96, down from 99. Microbenchmarks involving concurrent sessions inserting into a low cardinality index led me to the conclusion that 99 was aggressively high. It was not that hard to get excessive page splits with these microbenchmarks, since insertions with monotonically increasing heap TIDs arrived a bit out of order under a lot of concurrency. 99 worked a bit better than 96 with only one session, but significantly worse with concurrent sessions. I still think that it's a good idea to be more aggressive than the default leaf fillfactor, but reducing the "single value mode" fillfactor to 90 (or whatever the user set the general leaf fillfactor to) wouldn't be so bad. (See the arithmetic sketch below for the slack each setting leaves.)
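To get a feel for the margin each setting leaves, here is a simple standalone arithmetic sketch -- not taken from the patch; it assumes a stock 8 kB page, a 20-byte footprint per index tuple including its line pointer, and ignores page header overhead:

```c
#include <stdio.h>

/*
 * Rough slack calculation (assumed figures: 8 kB page, 20 bytes per
 * tuple including its line pointer; header overhead ignored).  Shows
 * how much room each "single value mode" fillfactor leaves for
 * duplicates that arrive slightly out of heap TID order.
 */
int
main(void)
{
    const int   page_size = 8192;
    const int   tuple_size = 20;
    const int   fillfactors[] = {99, 96, 90};

    for (int i = 0; i < 3; i++)
    {
        int     ff = fillfactors[i];
        int     slack = page_size * (100 - ff) / 100;

        printf("fillfactor %d: %d bytes free, room for ~%d more tuples\n",
               ff, slack, slack / tuple_size);
    }
    return 0;
}
```

Under these assumptions, 99 leaves only around 80 bytes (a handful of out-of-order duplicates) before the page must split again, while 96 leaves roughly four times that, which lines up with the excessive splits described above.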
[1] http://subs.emis.de/LNI/Proceedings/Proceedings144/32.pdf

--
Peter Geoghegan

Attachments
- v12-0007-DEBUG-Add-pageinspect-instrumentation.patch
- v12-0006-Allow-tuples-to-be-relocated-from-root-by-amchec.patch
- v12-0004-Add-split-after-new-tuple-optimization.patch
- v12-0005-Add-high-key-continuescan-optimization.patch
- v12-0003-Pick-nbtree-split-points-discerningly.patch
- v12-0002-Treat-heap-TID-as-part-of-the-nbtree-key-space.patch
- v12-0001-Refactor-nbtree-insertion-scankeys.patch