Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
From | John Naylor
Subject | Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Date |
Msg-id | CANWCAZaRzYu4uNa-Xo8mWe1xp9zC9zjiY_UVCqMJMfTPmLQDOQ@mail.gmail.com
In reply to | Vacuum ERRORs out considering freezing dead tuples from before OldestXmin (Melanie Plageman <melanieplageman@gmail.com>)
List | pgsql-hackers
On Fri, Jun 20, 2025 at 9:45 PM Melanie Plageman <melanieplageman@gmail.com> wrote:

> So, I think I figured out why I was seeing the test hang waiting for
> pg_stat_progress_vacuum to report an index count.
>
> Using auto_explain, I determined that the cursor was using an index-only
> scan with lower row counts. That meant it pinned an index leaf page
> instead of a heap page, and the first round of index vacuuming couldn't
> complete because btree index vacuuming requires that we acquire a
> cleanup lock on every leaf page.
>
> I solved this by disabling all index scans in the cursor's session.

Interesting find! I wondered how it would look if the cursor referred to an extra non-indexed column, but the above seems fine, and the whole thing is probably easier to reason about with only a single column.

> I attached the updated patch which passes for me on 32- and 64-bit
> builds. We've managed to reduce the row count so low (1000-2000 rows)
> that I'm not sure it matters if we have a 64-bit and a 32-bit case.
> However, since we have the large block comment about the required number
> of rows, I figured we might as well have the two different nrows.

It seems backwards to have the comment influence the code -- the comment should document decisions around the code. 2000 is already 100x smaller than pg16, so it's not really buying us much to be platform-aware.

The first two sentences in the comment seem fine. After that, I'd just say:

# We choose the number of rows to make sure we exceed
# maintenance_work_mem on all platforms we support, but we also want it
# to be small so that the test runtime is short.

Last I checked, our regression tests fail on block sizes other than 8kB. If we ever need to cater to that, this comment covers that eventuality as well.

> I'll have to do some more research on 14-16 to see if this could be a
> problem there.
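(For readers following along: a minimal sketch of session settings that would force the cursor onto a sequential scan, so it pins a heap page rather than a btree leaf page. The exact GUC list in the patch is an assumption on my part; these are standard planner settings.)

```sql
-- Hedged sketch, not the actual patch: disable index-based plans in the
-- cursor's session so the planner falls back to a sequential scan.
SET enable_indexscan = off;
SET enable_indexonlyscan = off;
SET enable_bitmapscan = off;
```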
> I also disabled prefetching, concurrent IO, and read combining for
> vacuum -- it didn't cause a problem in my local tests, but I could see
> it interfering with the test and potentially causing flakes/failures on
> some machines/configurations. That means I'll have to do a slightly
> different patch for 17 than 18 (17 doesn't have io_combine_limit).
>
> Finally, I disabled parallelism as future-proofing for having heap
> vacuum parallelism -- wouldn't want a mysterious failure in this test in
> the future.

+	(PARALLEL 0 is a future-proofing measure in case we adopt
+	# parallel heap vacuuming)

Maybe it's possible to phrase this so it's true regardless of whether we adopt that or not? "PARALLEL 0 shouldn't be necessary, but guards against the possibility of parallel heap vacuuming"

--
John Naylor
Amazon Web Services