Re: Deleting older versions in unique indexes to avoid page splits
От | Peter Geoghegan |
---|---|
Тема | Re: Deleting older versions in unique indexes to avoid page splits |
Дата | |
Msg-id | CAH2-WznBo-PTuFBNirwMZnt7nUPGyOe4yxd7QYXQn1AV6Bbp9w@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Deleting older versions in unique indexes to avoid page splits (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: Deleting older versions in unique indexes to avoid page splits
Re: Deleting older versions in unique indexes to avoid page splits Re: Deleting older versions in unique indexes to avoid page splits |
Список | pgsql-hackers |
On Tue, Dec 1, 2020 at 2:18 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Tue, Dec 1, 2020 at 1:50 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > This is a wholly new concept with a lot of heuristics. It's a lot of > > swallow. Attached is v11, which cleans everything up around the tableam interface. There are only two patches in v11, since the tableam refactoring made it impossible to split the second patch into a heapam patch and nbtree patch (there is no reduction in functionality compared to v10). Most of the real changes in v11 (compared to v10) are in heapam.c. I've completely replaced the table_compute_xid_horizon_for_tuples() interface with a new interface that supports all existing requirements (from index deletions that support LP_DEAD deletion), while also supporting these new requirements (e.g. bottom-up index deletion). So now heap_compute_xid_horizon_for_tuples() becomes heap_compute_delete_for_tuples(), which has different arguments but the same overall structure. All of the new requirements can now be thought of as additive things that we happen to use for nbtree callers, that could easily also be used in other index AMs later on. This means that there is a lot less new code in heapam.c. Prefetching of heap blocks for the new bottom-up index deletion caller now works in the same way as it has worked in heap_compute_xid_horizon_for_tuples() since Postgres 12 (more or less). This is a significant improvement compared to my original approach. Chaning heap_compute_xid_horizon_for_tuples() rather than adding a sibling function started to make sense when v10 of the patch taught regular nbtree LP_DEAD deletion (the thing that has been around since PostgreSQL 8.2) to add "extra" TIDs to check in passing, just in case we find that they're also deletable -- why not just have one function? It also means that hash indexes and GiST indexes now use the heap_compute_delete_for_tuples() function, though they get the same behavior as before. I imagine that it would be pretty straightforward to bring that same benefit to those other index AMs, since their implementations are already derived from the nbtree implementation of LP_DEAD deletion (and because adding extra TIDs to check in passing in these other index AMs should be fairly easy). > > > +} TM_IndexDelete; > > > > +} TM_IndexStatus; > > > Is it really significantly faster to have two arrays? If you had just > > one array, each element would be only 12 bytes long. That's not much > > smaller than TM_IndexDeletes, which is 8 bytes. I kept this facet of the design in v11, following some deliberation. I found that the TID sort operation appeared quite prominently in profiles, and so the optimizations mostly seemed to still make sense. I also kept one shellsort specialization. However, I removed the second specialized sort implementation, so at least there is only one specialization now (which is small anyway). I found that the second sort specialization (added to heapam.c in v10) really wasn't pulling its weight, even in more extreme cases of the kind that justified the optimizations in the first place. -- Peter Geoghegan
Вложения
В списке pgsql-hackers по дате отправления: