Re: lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples
От | Noah Misch |
---|---|
Тема | Re: lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples |
Дата | |
Msg-id | 20130110014908.GA11600@tornado.leadboat.com обсуждение исходный текст |
Ответ на | Re: lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples (Pavan Deolasee <pavan.deolasee@gmail.com>) |
Ответы |
Re: lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples
|
Список | pgsql-hackers |
Hi Pavan, Thanks for reviewing. On Tue, Jan 08, 2013 at 02:41:54PM +0530, Pavan Deolasee wrote: > On Tue, Jan 8, 2013 at 8:19 AM, Noah Misch <noah@leadboat.com> wrote: > > At that point in the investigation, I realized that the cost of being able > > to > > remove entire tuples in lazy_vacuum_heap() greatly exceeds the benefit. > > Again, the benefit is being able to remove tuples whose inserting > > transaction > > aborted between the HeapTupleSatisfiesVacuum() call in heap_page_prune() > > and > > the one in lazy_scan_heap(). To make that possible, lazy_vacuum_heap() > > grabs > > a cleanup lock, calls PageRepairFragmentation(), and emits a WAL record for > > every page containing LP_DEAD line pointers or HEAPTUPLE_DEAD tuples. If > > we > > take it out of the business of removing tuples, lazy_vacuum_heap() can skip > > WAL and update lp_flags under a mere shared lock. The second attached > > patch, > > for HEAD, implements that. Besides optimizing things somewhat, it > > simplifies > > the code and removes rarely-tested branches. (This patch supersedes the > > backpatch-oriented patch rather than stacking with it.) > > > > > +1. I'd also advocated removing the line pointer array scan in > lazy_scan_heap() because the heap_page_prune() does most of the work. The > reason why we still have that scan in lazy_scan_heap() is to check for new > dead tuples (which should be rare), check for all-visibility of the page > and freeze tuples if possible. I think we can move all of that to > heap_page_prune(). Yes; that was an interesting thread. > But while you are at it, I wonder if you have time and energy to look at > the single pass vacuum work that I, Robert and others tried earlier but > failed to get to the final stage [1][2]. If we do something like that, we > might not need the second scan of the heap at all, which as you rightly > pointed out, does not do much today anyway, other than marking LP_DEAD line > pointers to LP_UNUSED. We can push that work to the next vacuum or even a > foreground task. I don't wish to morph this patch into a removal of lazy_vacuum_heap(), but that will remain promising for the future. > BTW, with respect to your idea of not WAL logging the operation of setting > LP_DEAD to LP_UNUSED in lazy_vacuum_page(), I wonder if this would create > problems with crash recovery. During normal vacuum operation, you may have > converted LP_DEAD to LP_UNUSED. Then you actually used one of those line > pointers for subsequent PageAddItem() on the page. If you crash after that > but before the page gets written to the disk, during crash recovery, AFAICS > PageAddItem() will complain with "will not overwrite a used ItemId" warning > and subsequent PANIC of the recovery. Good catch; I can reproduce that problem. I've now taught PageAddItem() to tolerate replacing an LP_DEAD item until recovery reaches consistency. Also, since lazy_vacuum_page() no longer calls PageRepairFragmentation(), it should call PageSetHasFreeLinePointers() itself. The attached update fixes both problems. (I have also attached the unchanged backpatch-oriented fix to keep things together.) nm
Вложения
В списке pgsql-hackers по дате отправления: