Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
From | Peter Geoghegan |
---|---|
Subject | Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. |
Date | |
Msg-id | CAH2-Wzmn97x3JRbmF=2uQrc5ruusuGrpB_eOUSuJfhYOdikS7Q@mail.gmail.com |
In reply to | Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. |
List | pgsql-hackers |
On Wed, Sep 18, 2019 at 10:43 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I'm currently working on merging my refactored version of
> _bt_dedup_one_page() with your v15 WAL-logging. This is a bit tricky.
> (I have finished merging the other WAL-logging stuff, though -- that
> was easy.)

I attach version 16. This revision merges your recent work on WAL logging with my recent work on simplifying _bt_dedup_one_page(). See my e-mail from earlier today for details.

Hopefully this will be a bit easier to work with when you go to make _bt_dedup_one_page() do raw PageIndexMultiDelete() + PageAddItem() calls directly against the page contained in a buffer (rather than using a temporary copy of the page in local memory, in the style of _bt_split()). I find the loop within _bt_dedup_one_page() much easier to follow now. While I'm looking forward to seeing the PageIndexMultiDelete()/PageAddItem() approach that you come up with, the basic design of _bt_dedup_one_page() seems to be in much better shape today than it was a few weeks ago.

I am going to spend the next few days teaching _bt_dedup_one_page() about space utilization. I'll probably make it respect a fillfactor-style target. I've noticed that it is often too aggressive about filling a page, though less often it shows the opposite problem: it fails to use more than about 2/3 of the page for the same value, again and again (this must have something to do with the exact width of the tuples). In general, _bt_dedup_one_page() should know a few things about what nbtsplitloc.c will do when the page is very likely to be split soon.

I'll also spend some more time working on the opclass infrastructure that we need to disable deduplication for datatypes where it is unsafe [1].

Other changes:

* qsort() is no longer used by BTreeFormPostingTuple() in v16 -- we can easily make sorting the array of heap TIDs the caller's responsibility.
  Since the heap TID column is sorted in ascending order among duplicates on a page, and since TIDs within individual posting lists are also sorted in ascending order, there is no need to re-sort. I added a new assertion to BTreeFormPostingTuple() that verifies that its caller actually gets it right.

* The new nbtpage.c/VACUUM code has been tweaked to minimize the changes required against master. Nothing significant, though.

It was easier to refactor the _bt_dedup_one_page() stuff by temporarily making nbtsort.c not use it. I didn't want to delay getting v16 to you, so I didn't take the time to fix up nbtsort.c to use the new stuff. In v16 it's actually using its own old copy of code that it should get from nbtinsert.c -- it calls _bt_dedup_item_tid_sort(), not the new _bt_dedup_save_htid() function. I'll update it soon, though.

[1] https://www.postgresql.org/message-id/flat/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com

--
Peter Geoghegan
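The qsort() change rests on an ordering invariant: items on a leaf page are kept sorted by key and then by heap TID, so grouping adjacent duplicates yields TID arrays that are already in ascending order, and the posting-tuple builder only needs to check that order rather than sort. The C sketch below illustrates that invariant under simplified assumptions. `HeapTid`, `LeafItem`, `PostingTuple`, `MAX_POSTING`, `form_posting()` and `dedup_items()` are all hypothetical stand-ins, not PostgreSQL's `ItemPointerData`, `IndexTuple`, `BTreeFormPostingTuple()` or `_bt_dedup_one_page()`; the fixed per-tuple TID cap merely stands in for whatever real size limit applies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical heap TID: block number plus line-pointer offset.
 * Not PostgreSQL's ItemPointerData; it only models the ordering. */
typedef struct {
    uint32_t block;
    uint16_t offset;
} HeapTid;

/* Hypothetical leaf item: one key plus one heap TID. */
typedef struct {
    int key;
    HeapTid tid;
} LeafItem;

/* Hypothetical cap on TIDs per posting tuple (stands in for a size limit). */
#define MAX_POSTING 8

typedef struct {
    int key;
    int ntids;
    HeapTid tids[MAX_POSTING];
} PostingTuple;

/* Order TIDs by block number, then by offset. */
static int tid_cmp(HeapTid a, HeapTid b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/* Build a posting tuple from TIDs the caller already supplies in
 * ascending order. No sorting happens here: the order is only
 * asserted, so a caller that breaks the invariant is caught in
 * assert-enabled builds. */
static PostingTuple form_posting(int key, const HeapTid *tids, int n)
{
    PostingTuple p = {0};

    assert(n > 0 && n <= MAX_POSTING);
    p.key = key;
    p.ntids = n;
    for (int i = 0; i < n; i++) {
        if (i > 0)
            assert(tid_cmp(tids[i - 1], tids[i]) < 0);
        p.tids[i] = tids[i];
    }
    return p;
}

/* Merge runs of equal keys from items sorted by (key, TID) into posting
 * tuples; a run longer than MAX_POSTING spills into another tuple with
 * the same key. Returns the number of tuples written to out. */
static int dedup_items(const LeafItem *items, int nitems, PostingTuple *out)
{
    int nout = 0;
    int i = 0;

    while (i < nitems) {
        HeapTid pending[MAX_POSTING];
        int npending = 0;
        int key = items[i].key;

        while (i < nitems && items[i].key == key && npending < MAX_POSTING)
            pending[npending++] = items[i++].tid;
        out[nout++] = form_posting(key, pending, npending);
    }
    return nout;
}
```

For example, six items with keys 1, 1, 1, 2, 3, 3 (TIDs ascending within each key) collapse into three posting tuples holding 3, 1 and 2 TIDs. Asserting the order instead of sorting keeps tuple formation a single linear pass while still catching a caller that violates the invariant.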