Re: Call lazy_check_wraparound_failsafe earlier for parallel vacuum
От | Peter Geoghegan |
---|---|
Тема | Re: Call lazy_check_wraparound_failsafe earlier for parallel vacuum |
Дата | |
Msg-id | CAH2-WznOEQCO-nyTHRn9de2J16+n1XCvbBnnt+om2ZKUpTSvDg@mail.gmail.com обсуждение исходный текст |
Ответ на | Call lazy_check_wraparound_failsafe earlier for parallel vacuum ("Imseih (AWS), Sami" <simseih@amazon.com>) |
Ответы |
Re: Call lazy_check_wraparound_failsafe earlier for parallel vacuum
("Imseih (AWS), Sami" <simseih@amazon.com>)
|
Список | pgsql-hackers |
On Wed, Nov 9, 2022 at 6:29 AM Imseih (AWS), Sami <simseih@amazon.com> wrote: > When a user is running a parallel vacuum and the vacuum is long running > > due to many large indexes, it would make sense to check for failsafe earlier. It makes sense to prefer consistency here, I suppose. The reason why we're not consistent is because it was easier not to be, which isn't exactly the best reason (nor the worst). I don't think that it's obvious that we need to call the failsafe at any particular frequency. There is probably an argument to be made for the idea that we're not checking frequently enough (even in the common serial VACUUM case), just as there is an argument to be made for the opposite idea. It's not like there is some simple linear relationship (or any kind of relationship) between the amount of physical work performed by one VACUUM operation, and the rate of XID consumption by the system as a whole. And so the details of how we do it have plenty to do with what we can afford to do. My gut instinct is that the most important thing is to at least call lazy_check_wraparound_failsafe() once per index scan. Multiple index scans are disproportionately involved in VACUUMs that take far longer than expected, which are presumably the kind of VACUUMs that tend to be running when the failsafe actually triggers. We don't really expect the failsafe to trigger, so presumably when it actually does things haven't been going well for some time. (Index corruption that prevents forward progress on one particular index is another example.) That said, one thing that does bother me in this area occurs to me: we call lazy_check_wraparound_failsafe() from lazy_scan_heap() (before we get to the index scans that you're talking about) at an interval that is based on how many heap pages we've either processed (and recorded as a scanned_pages page) *or* have skipped over using the visibility map. In other words we use blkno here, when we should really be using scanned_pages instead: if (blkno - next_failsafe_block >= FAILSAFE_EVERY_PAGES) { lazy_check_wraparound_failsafe(vacrel); next_failsafe_block = blkno; } This code effectively treats pages skipped using the visibility map as equivalent to pages physically scanned (scanned_pages), even though skipping requires essentially no work at all. That just isn't logical, and feels like something worth fixing. The fundamental unit of work in lazy_scan_heap() is a *scanned* heap page. -- Peter Geoghegan
В списке pgsql-hackers по дате отправления: