Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
From | Peter Geoghegan |
---|---|
Subject | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
Date | |
Msg-id | CAH2-WznS1rN=R-o4rdsDxUxpW4ciy5S9OGnJXa85sfDKKWA=5A@mail.gmail.com |
In reply to | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Peter Geoghegan <pg@bowt.ie>) |
Replies | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
List | pgsql-hackers |
On Fri, Feb 25, 2022 at 5:52 PM Peter Geoghegan <pg@bowt.ie> wrote:
> There is an important practical way in which it makes sense to treat
> 0001 as separate to 0002. It is true that 0001 is independently quite
> useful. In practical terms, I'd be quite happy to just get 0001 into
> Postgres 15, without 0002. I think that that's what you meant here, in
> concrete terms, and we can agree on that now.

Attached is v10. While this does still include the freezing patch, it's not in scope for Postgres 15. As I've said, I still think that it makes sense to maintain the patch series with the freezing stuff, since it's structurally related. So, to be clear, the first two patches from the patch series are in scope for Postgres 15. But not the third.

Highlights:

* Changes to terminology and commit messages along the lines suggested by Andres.

* Bug fixes to heap_tuple_needs_freeze()'s MultiXact handling. My testing strategy here still needs work.

* Expanded refactoring by v10-0002 patch.

The v10-0002 patch (which appeared for the first time in v9) was originally all about fixing a case where non-aggressive VACUUMs were at a gratuitous disadvantage (relative to aggressive VACUUMs) around advancing relfrozenxid -- very much like the lazy_scan_noprune work from commit 44fa8488. And that is still its main purpose. But the refactoring now seems related to Andres' idea of making non-aggressive VACUUMs decide to scan a few extra all-visible pages in order to be able to advance relfrozenxid.

The code that sets up skipping based on the visibility map is made a lot clearer by v10-0002. That patch moves a significant amount of code from lazy_scan_heap() into a new helper routine (so it continues the trend started by the Postgres 14 work that added lazy_scan_prune()).
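To make the relfrozenxid accounting concrete, here is a simplified, single-threaded model of page-at-a-time bookkeeping, written in C. The VmState enum, the PageCounts struct and both functions are hypothetical; only the counter names (rel_pages, scanned_pages, frozenskipped_pages) come from this thread. The model assumes no concurrent visibility map changes and ignores the other special cases in the real lazy_scan_heap(), so treat it as a sketch of the rule rather than the actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical visibility map states for one heap page (illustration only). */
typedef enum
{
    PAGE_NOT_ALL_VISIBLE,
    PAGE_ALL_VISIBLE,           /* all-visible, but not all-frozen */
    PAGE_ALL_FROZEN             /* all-visible and all-frozen */
} VmState;

/* Counter names follow the thread; this is not the real LVRelState. */
typedef struct
{
    unsigned    rel_pages;
    unsigned    scanned_pages;
    unsigned    frozenskipped_pages;
} PageCounts;

/*
 * Page-at-a-time model: a non-aggressive VACUUM skips every all-visible
 * page, while an aggressive VACUUM may only skip all-frozen pages.
 */
static PageCounts
count_pages(const VmState *vm, unsigned rel_pages, bool aggressive)
{
    PageCounts  c = {rel_pages, 0, 0};

    for (unsigned blkno = 0; blkno < rel_pages; blkno++)
    {
        if (vm[blkno] == PAGE_ALL_FROZEN)
            c.frozenskipped_pages++;    /* skipped, and known to be frozen */
        else if (vm[blkno] == PAGE_ALL_VISIBLE && !aggressive)
            continue;                   /* skipped, may hold unfrozen XIDs */
        else
            c.scanned_pages++;
    }
    assert(c.scanned_pages + c.frozenskipped_pages <= c.rel_pages);
    return c;
}

/* A new relfrozenxid is only safe if no possibly-unfrozen page was skipped. */
static bool
can_advance_relfrozenxid(PageCounts c)
{
    return c.scanned_pages + c.frozenskipped_pages == c.rel_pages;
}
```

With a four-page table containing one all-visible but unfrozen page, the non-aggressive pass skips that page and so cannot advance relfrozenxid, while the aggressive pass scans it and can.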
Now, skipping a range of visibility map pages is fundamentally based on setting up the range up front, and then using the same saved details about the range thereafter -- we no longer have ad-hoc VM_ALL_VISIBLE()/VM_ALL_FROZEN() calls for pages from a range that we already decided to skip (so there are no calls to those routines from lazy_scan_heap(), at least not until after we finish processing in lazy_scan_prune()).

This is more or less what we were doing all along for one special case: aggressive VACUUMs. We had to make sure to either increment frozenskipped_pages or increment scanned_pages for every page from rel_pages -- this issue is described by lazy_scan_heap() comments on HEAD that begin with "Tricky, tricky." (these date back to the freeze map work from 2016). Anyway, there is no reason not to go further with that: we should make whole ranges the basic unit that we deal with when skipping. It's a lot simpler to think in terms of entire ranges (not individual pages) that are determined to be all-visible or all-frozen up front, without needing to recheck anything (regardless of whether it's an aggressive VACUUM). We don't need to track frozenskipped_pages this way. And it's much more obvious that it's safe for more complicated cases, in particular for aggressive VACUUMs.

This kind of approach seems necessary to make non-aggressive VACUUMs do a little more work opportunistically, when they realize that they can advance relfrozenxid relatively easily that way (which I believe Andres favors as part of overhauling freezing). That becomes a lot more natural when you have a clear and unambiguous separation between deciding what range of blocks to skip, and then actually skipping. I can imagine the new helper function added by v10-0002 (which I've called lazy_scan_skip_range()) eventually being taught to do these kinds of tricks. In general I think that all of the details of what to skip need to be decided up front.
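The separation between deciding what to skip and actually skipping can be sketched as follows, again as a simplified C model under stated assumptions rather than the patch's code. skip_range() is a hypothetical stand-in for lazy_scan_skip_range(): it decides once where the current skippable range ends and whether every block in it is all-frozen, and scan_heap() (also hypothetical) just executes that decision with no per-page visibility map rechecks. Whether relfrozenxid can advance then follows from the ranges themselves, with no frozenskipped_pages counter. The extra_scan_limit parameter is a purely hypothetical policy for the "scan a few extra all-visible pages" idea: a not-all-frozen range no longer than that is scanned rather than skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical visibility map states for one heap page (illustration only). */
typedef enum
{
    PAGE_NOT_ALL_VISIBLE,
    PAGE_ALL_VISIBLE,           /* all-visible, but not all-frozen */
    PAGE_ALL_FROZEN             /* all-visible and all-frozen */
} VmState;

/* Instructions the helper hands to the scan loop (hypothetical). */
typedef struct
{
    unsigned    next_unskippable;   /* first block that must be scanned */
    bool        all_frozen;         /* true if every skipped block is all-frozen */
} SkipRange;

/*
 * Hypothetical stand-in for lazy_scan_skip_range(): starting at blkno, decide
 * up front where the skippable range ends. Aggressive VACUUMs may only skip
 * all-frozen blocks; non-aggressive VACUUMs may skip any all-visible block.
 */
static SkipRange
skip_range(const VmState *vm, unsigned rel_pages, unsigned blkno,
           bool aggressive, unsigned extra_scan_limit)
{
    SkipRange   r = {blkno, true};

    assert(blkno < rel_pages);
    while (r.next_unskippable < rel_pages)
    {
        VmState     s = vm[r.next_unskippable];

        if (s == PAGE_NOT_ALL_VISIBLE || (aggressive && s != PAGE_ALL_FROZEN))
            break;
        if (s != PAGE_ALL_FROZEN)
            r.all_frozen = false;
        r.next_unskippable++;
    }

    /*
     * Hypothetical opportunistic policy: a short range that is not all-frozen
     * is scanned instead of skipped, to keep relfrozenxid advancement possible.
     */
    if (!r.all_frozen && r.next_unskippable - blkno <= extra_scan_limit)
    {
        r.next_unskippable = blkno;
        r.all_frozen = true;    /* nothing is skipped now */
    }
    return r;
}

/*
 * Scan loop: executes the helper's decisions without rechecking the visibility
 * map for skipped blocks. Returns true if relfrozenxid could be advanced.
 */
static bool
scan_heap(const VmState *vm, unsigned rel_pages, bool aggressive,
          unsigned extra_scan_limit, unsigned *scanned_pages)
{
    bool        skipped_unfrozen = false;
    unsigned    blkno = 0;

    *scanned_pages = 0;
    while (blkno < rel_pages)
    {
        SkipRange   r = skip_range(vm, rel_pages, blkno, aggressive,
                                   extra_scan_limit);

        if (r.next_unskippable > blkno)
        {
            /* Skip the whole range at once, using the saved details. */
            if (!r.all_frozen)
                skipped_unfrozen = true;
            blkno = r.next_unskippable;
            continue;
        }

        /* Must scan blkno; pruning and freezing would happen here. */
        (*scanned_pages)++;
        blkno++;
    }
    return !skipped_unfrozen;
}
```

For example, with a five-block table whose middle three blocks are all-frozen, all-visible and all-frozen, a non-aggressive pass with extra_scan_limit of 0 skips the middle range and cannot advance relfrozenxid; raising the limit to 3 makes it scan the short unfrozen range instead, so relfrozenxid can advance. The scan loop itself never changes; only the helper's policy does.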
The loop in lazy_scan_heap() should execute skipping based on the instructions it receives from the new helper function, in the simplest way possible. The helper function can become more intelligent about the costs and benefits of skipping in the future, without that impacting lazy_scan_heap().

--
Peter Geoghegan