Re: Combine Prune and Freeze records emitted by vacuum
От | Heikki Linnakangas |
---|---|
Тема | Re: Combine Prune and Freeze records emitted by vacuum |
Дата | |
Msg-id | e7b3116c-79b0-4864-a8d2-5b59b498f724@iki.fi обсуждение исходный текст |
Ответ на | Re: Combine Prune and Freeze records emitted by vacuum (Melanie Plageman <melanieplageman@gmail.com>) |
Ответы |
Re: Combine Prune and Freeze records emitted by vacuum
Re: Combine Prune and Freeze records emitted by vacuum |
Список | pgsql-hackers |
On 30/03/2024 07:57, Melanie Plageman wrote: > On Fri, Mar 29, 2024 at 12:32:21PM -0400, Melanie Plageman wrote: >> On Fri, Mar 29, 2024 at 12:00 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >>> Here's another idea: In the first loop through the offsets, where we >>> gather the HTSV status of each item, also collect the offsets of all HOT >>> and non-HOT items to two separate arrays. Call heap_prune_chain() for >>> all the non-HOT items first, and then process any remaining HOT tuples >>> that haven't been marked yet. >> >> That's an interesting idea. I'll try it out and see how it works. > > Attached v10 implements this method of dividing tuples into HOT and > non-HOT and processing the potential HOT chains first then processing > tuples not marked by calling heap_prune_chain(). > > I have applied the refactoring of heap_prune_chain() to master and then > built the other patches on top of that. Committed some of the changes. Continuing to reviewing the rest. > I discovered while writing this that LP_DEAD item offsets must be in > order in the deadoffsets array (the one that is used to populate > LVRelState->dead_items). > > When I changed heap_page_prune_and_freeze() to partition the offsets > into HOT and non-HOT during the first loop through the item pointers > array (where we get tuple visibility information), we add dead item > offsets as they are encountered. So, they are no longer in order. I've > added a quicksort of the deadoffsets array to satisfy vacuum. Good catch. > I think that we are actually successfully removing more RECENTLY_DEAD > HOT tuples than in master with heap_page_prune()'s new approach, and I > think it is correct; but let me know if I am missing something. Yep. In the case of a two-item chain, RECENTLY_DEAD -> DEAD, the new code can always remove both items. On 'master', it depends on which item it happens to process first. If it processes the RECENTLY_DEAD item first, then it follows the chain and removes both. But if it processes the DEAD item first, the RECENTLY_DEAD item is left behind. It will be removed by the next VACUUM, so it's not a correctness issue, and probably doesn't make any practical performance difference either as it's a rare corner case, but I feel better that it's more deterministic now. > The early patches in the set include some additional comment cleanup as > well. 0001 is fairly polished. 0004 could use some variable renaming > (this patch partitions the tuples into HOT and not HOT and then > processes them). I was struggling with some of the names here > (chainmembers and chaincandidates is confusing). I didn't understand why you wanted to juggle both partitions in the same array. So I separated them into two arrays, and called them 'root_items' and 'heaponly_items'. In some micro-benchmarks, the order that the items were processed made a measurable difference. So I'm processing the items in the reverse order. That roughly matches the order the items are processed in master, as it iterates the offsets from high-to-low in the first loop, and low-to-high in the second loop. > The bulk of the combining of pruning and freezing is lumped into 0010. > > I had planned to separate 0010 into 4 separate patches: 1 to execute > freezing in heap_prune_chain(), 1 for the freeze heuristic approximating > what is on master, and 1 for emitting a single record containing both > the pruning and freezing page modifications. > > I ended up not doing this because I felt like the grouping of changes in > 0007-0009 is off. As long as I still execute freezing in > lazy_scan_prune(), I have to share lots of state between > lazy_scan_prune() and heap_page_prune(). This meant I added a lot of > parameters to heap_page_prune() that later commits removed -- making the > later patches noisy and not so easy to understand. > > I'm actually not sure what should go in what commit (either for review > clarity or for the actual final version). > > But, I think we should probably focus on review of the code and not as > much how it is split up yet. Yeah, that's fine, 0010 is manageable-sized now. > The final state of the code could definitely use more cleanup. I've been > staring at it for awhile, so I could use some thoughts/ideas about what > part to focus on improving. Committed some of the changes. I plan to commit at least the first of these remaining patches later today. I'm happy with it now, but I'll give it a final glance over after dinner. I'll continue to review the rest after that, but attached is what I have now. -- Heikki Linnakangas Neon (https://neon.tech)
Вложения
- v11-0001-Handle-non-chain-tuples-outside-of-heap_prune_ch.patch
- v11-0002-Invoke-heap_prune_record_prunable-during-record-.patch
- v11-0003-Introduce-PRUNE_DO_-actions.patch
- v11-0004-Prepare-freeze-tuples-in-heap_page_prune.patch
- v11-0005-Set-hastup-in-heap_page_prune.patch
- v11-0006-Save-dead-tuple-offsets-during-heap_page_prune.patch
- v11-0007-Combine-freezing-and-pruning.patch
В списке pgsql-hackers по дате отправления: