Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
From | Peter Geoghegan |
---|---|
Subject | Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. |
Date | |
Msg-id | CAH2-Wz=_XUu4j=vqGLmU=Re=caPy49yG4kwvyaeQiWKug4djKw@mail.gmail.com |
In response to | Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. |
List | pgsql-hackers |
On Fri, Nov 8, 2019 at 10:35 AM Peter Geoghegan <pg@bowt.ie> wrote:
> There is more bitrot, so I attach v22.

The patch has stopped applying once again, so I attach v23.

One reason for the bitrot is that I pushed preparatory commits, including today's "Make _bt_keep_natts_fast() use datum_image_eq()" commit. Good to get that out of the way.

Other changes:

* Decided to go back to turning deduplication on by default with non-unique indexes, and off by default using unique indexes. The unique index stuff was regressed enough with INSERT-heavy workloads that I was put off, despite my initial enthusiasm for enabling deduplication everywhere.

* Disabled deduplication in system catalog indexes by deeming it generally unsafe. I realized that it would be impossible to provide a way to disable deduplication in system catalog indexes if it was enabled at all. The reason for this is simple: in general, it's not possible to set storage parameters for system catalog indexes. While I think that deduplication should work with system catalog indexes on general principle, this is about an existing limitation. Deduplication in catalog indexes can be revisited if and when somebody figures out a way to make storage parameters work with system catalog indexes.

* Basic user documentation -- this still needs work, but the basic shape is now in place. I think that we should outline how the feature works by describing the internals, including details of the data structures. This provides guidance to users on when they should disable or enable the feature. This is discussed in the existing chapter on B-Tree internals. This felt natural because it's similar to how GIN explains its compression related features -- the discussion of the storage parameters in the CREATE INDEX page of the docs links to a description of GIN internals from "66.4. Implementation [of GIN]".

* nbtdedup.c "single value" strategy stuff now considers the contribution of the page high key when considering how to deduplicate such that nbtsplitloc.c's "single value" strategy has a usable split point that helps it to hit its target free space. Not a very important detail. It's nice to be consistent with the corresponding code within nbtsplitloc.c.

* Worked through all remaining XXX/TODO/FIXME comments, except one: The one that talks about the need for opclass infrastructure to deal with cases like btree/numeric_ops, or text with a nondeterministic collation. The user docs now reference the BITWISE opclass stuff that we're discussing over on the other thread. That's the only really notable open item now IMV.

--
Peter Geoghegan
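
For readers skimming the thread, the default policy described in the first two bullets above (on by default for non-unique indexes, off by default for unique indexes, never used for system catalog indexes) can be sketched roughly as follows. This is an illustration only and is not code from the v23 patch: the `DedupOption` type and the `dedup_is_enabled()` function are hypothetical names, and the sketch assumes that an explicit per-index storage parameter setting overrides the default for any non-catalog index.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical tri-state for a per-index storage parameter.
 * DEDUP_UNSET means the user did not specify anything, so the
 * default for the index type applies.
 */
typedef enum
{
    DEDUP_UNSET,
    DEDUP_ON,
    DEDUP_OFF
} DedupOption;

/*
 * Illustrative decision function (not the patch's actual code).
 * Returns true when deduplication would be applied to the index.
 */
bool
dedup_is_enabled(bool is_unique, bool is_system_catalog, DedupOption opt)
{
    assert(opt == DEDUP_UNSET || opt == DEDUP_ON || opt == DEDUP_OFF);

    /*
     * Storage parameters cannot be set on system catalog indexes, so
     * there would be no way to turn the feature off; treat it as
     * always disabled there.
     */
    if (is_system_catalog)
        return false;

    /* An explicit setting wins over the default (assumption). */
    if (opt == DEDUP_ON)
        return true;
    if (opt == DEDUP_OFF)
        return false;

    /* Defaults: on for non-unique indexes, off for unique indexes. */
    return !is_unique;
}
```

Under these assumptions, `dedup_is_enabled(true, false, DEDUP_UNSET)` is false for a plain unique index, while passing `DEDUP_ON` for the same index yields true; a catalog index yields false regardless of the option.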