Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM
От | Melanie Plageman |
---|---|
Тема | Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM |
Дата | |
Msg-id | CAAKRu_a7oMNTWh3T-=vM=VNuV6pV_fMHnb5M69OQ6oPhx2201A@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM
|
Список | pgsql-hackers |
On Tue, Nov 14, 2023 at 7:15 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2023-11-14 07:46:10 -0500, Melanie Plageman wrote: > > > FreeSpaceMapVacuumRange()'s comment says: > > > * As above, but assume that only heap pages between start and end-1 inclusive > > > * have new free-space information, so update only the upper-level slots > > > * covering that block range. end == InvalidBlockNumber is equivalent to > > > * "all the rest of the relation". > > > > > > So FreeSpaceMapVacuumRange(..., blkno) will not actually process the "effects" > > > of the RecordPageWithFreeSpace() above it - which seems confusing. > > > > Ah, so shall I pass blkno + 1 as end? > > I think there's no actual overflow danger, because MaxBlockNumber + 1 is > InvalidBlockNumber, which scans the rest of the relation (i.e. exactly the > intended block). Perhaps worth noting? Attached > > > =# DELETE FROM copytest_0; > > > =# VACUUM (VERBOSE) copytest_0; > > > ... > > > INFO: 00000: table "copytest_0": truncated 146264 to 110934 pages > > > ... > > > tuples missed: 5848 dead from 89 pages not removed due to cleanup lock contention > > > ... > > > > > > A bit of debugging later I figured out that this is due to the background > > > writer. If I SIGSTOP bgwriter, the whole relation is truncated... > > > > That's a bit sad. But isn't that what you would expect bgwriter to do? > > Write out dirty buffers? It doesn't know that those pages consist of > > only dead tuples and that you will soon truncate them. > > I think the issue more that it's feels wrong that a pin by bgwriter blocks > vacuum cleaning up. I think the same happens with on-access pruning - where > it's more likely for bgwriter to focus on those pages. Checkpointer likely > causes the same. Even normal backends can cause this while writing out the > page. > > It's not like bgwriter/checkpointer would actually have a problem if the page > were pruned - the page is locked while being written out and neither keep > pointers into the page after unlocking it again. > > > At this point I started to wonder if we should invent a separate type of pin > for a buffer undergoing IO. We basically have the necessary state already: > BM_IO_IN_PROGRESS. We'd need to look at that in some places, e.g. in > InvalidateBuffer(), instead of BUF_STATE_GET_REFCOUNT(), we'd need to also > look at BM_IO_IN_PROGRESS. > > > Except of course that that'd not change anything - > ConditionalLockBufferForCleanup() locks the buffer conditionally, *before* > even looking at the refcount and returns false if not. And writing out a > buffer takes a content lock. Which made me realize that > "tuples missed: 5848 dead from 89 pages not removed due to cleanup lock contention" > > is often kinda wrong - the contention is *not* cleanup lock specific. It's > often just plain contention on the lwlock. Do you think we should change the error message? > Perhaps we should just treat IO_IN_PROGRESS buffers differently in > lazy_scan_heap() and heap_page_prune_opt() and wait? Hmm. This is an interesting idea. lazy_scan_heap() waiting for checkpointer to write out a buffer could have an interesting property of shifting who writes out the FPI. I wonder if it would matter. - Melanie
Вложения
В списке pgsql-hackers по дате отправления: