Re: "Write amplification" is made worse by "getting tired" whileinserting into nbtree secondary indexes (Was: Why B-Tree suffix truncation matters)
От | Peter Geoghegan |
---|---|
Тема | Re: "Write amplification" is made worse by "getting tired" whileinserting into nbtree secondary indexes (Was: Why B-Tree suffix truncation matters) |
Дата | |
Msg-id | CAH2-WzmRyxLRuXF_HqRNOuqeVSsphRZfba-uv23eYg6f-Ycn=Q@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: "Write amplification" is made worse by "getting tired" whileinserting into nbtree secondary indexes (Was: Why B-Tree suffix truncation matters) (Simon Riggs <simon@2ndquadrant.com>) |
Список | pgsql-hackers |
On Wed, Aug 29, 2018 at 11:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > If you include heap TID as a column the suffix will be unique and cannot > benefit from suffix truncation. Right. During a page split, we must generate a new high key that's less than or equal to all items on the left side (where the new high key goes), and strictly greater than all items on the right side (where the new high key values will be used as a downlink to). We'll only include heap TID explicitly in the new high key when there is no other way to get a value that is strictly less than all tuples on the right side, and only at the leaf level (we don't truncate during internal page splits, and the rules automatically work at non-leaf levels because values come from the leaf level). The original L&Y invariant that I propose to restore requires that downlinks be strictly less than all items on the child page the point to. Truncated attributes logically retain the value minus infinity, which is often what makes the true L&Y invariant hold. > It would be better to avoid including the heap TID as a column, since it is > already there. Or the other way around, don't include it as payload if it is > has to be a column. The high level principle to bear in mind is that pivot tuples (high keys at the leaf level, and all internal page tuples) only need to correctly guide searches. There is no obligation for pivot tuples to be able to retrieve the original values, for something like an index-only scan. We could even make their representation into a flat binary string someday -- this is called "key normalization" [1]. Likewise, pivot tuples are good candidates for automatic prefix compression because there is scarcely any need to decompress them (not working on that for v12, though). Nothing changes about the direct representation of 99%+ of tuples (those at the leaf level that point to the heap -- non-pivot tuples). We need an alternative representation for heap TID in pivot tuples for historical reasons (actually, I think what we have here would even work best in a green field situation). Not including a special heap TID representation in pivots, but including everything else is what I've called "logical truncation" on this thread. If you compare a serial primary key case with my patch and without, you typically see no difference in the size of the index. It is possible to end up with a larger index overall, but that should be extremely rare, and only affect very small indexes. > Alternatively, include only the heap block number. That would make it > non-unique, but never more than 240 duplicates. So it would allow suffix > truncation, and yet still avoid the multi-page split effect. I'm not sure what you mean by the multi-page split effect -- all internal pages are split recursively, during a leaf page split. In general, free space management needs to remain as simple as possible, and splitting the same page twice in one go is a big no-no. I conservatively enforce that the 1/3 of a page restriction has a bit of extra slop for an extra heap TID. There is one choke point that everything takes place -- during leaf page splits. Nothing else changes, apart from index scan logic, which may have to search with an extra heap TID column, plus one or two other small things along those lines. [1] https://wiki.postgresql.org/wiki/Key_normalization -- Peter Geoghegan
В списке pgsql-hackers по дате отправления: