Re: Emit fewer vacuum records by reaping removable tuples during pruning

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Emit fewer vacuum records by reaping removable tuples during pruning
Дата
Msg-id CA+TgmoYV5LFUgXS_g4S9P1AhQ63thPg6UJj8EsmsmEahFxLY_A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Emit fewer vacuum records by reaping removable tuples during pruning  (Melanie Plageman <melanieplageman@gmail.com>)
Ответы Re: Emit fewer vacuum records by reaping removable tuples during pruning  (Peter Geoghegan <pg@bowt.ie>)
Re: Emit fewer vacuum records by reaping removable tuples during pruning  (Melanie Plageman <melanieplageman@gmail.com>)
Список pgsql-hackers
On Wed, Jan 17, 2024 at 3:12 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> > Reviewing 0001, consider the case where a table has no indexes.
> > Pre-patch, PageTruncateLinePointerArray() will happen when
> > lazy_vacuum_heap_page() is called; post-patch, it will not occur.
> > That's a loss. Should we care? On the plus side, visibility map
> > repair, if required, can now take place. That's a gain.
>
> I thought that this wasn't an issue because heap_page_prune_execute()
> calls PageRepairFragmentation() which similarly modifies pd_lower and
> sets the hint bit about free line pointers.

Ah, OK, I didn't understand that PageRepairFragmentation() does what
is also done by PageTruncateLinePointerArray().

> Yes, I also spent some time thinking about this. In master, we do
> always call lazy_scan_new_or_empty() before calling
> lazy_scan_noprune(). The code is aiming to ensure we call
> lazy_scan_new_or_empty() once before calling either of
> lazy_scan_noprune() or lazy_scan_prune(). I think it didn't call
> lazy_scan_new_or_empty() unconditionally first because of the
> different lock types expected. But, your structure has solved that.
> I've used a version of your example code above in attached v9. It is
> much nicer.

Oh, OK, I see it now. I missed that lazy_scan_new_or_empty() was
called either way. Glad that my proposed restructuring managed to be
helpful despite that confusion, though. :-)

At a quick glance, I also like the way this looks. I'll review it more
thoroughly later. Does this patch require 0002 and 0003 or could it
equally well go first? I confess that I don't entirely understand why
we want 0002 and 0003.

> Ah, I realize I was not clear. I am now talking about inconsistencies
> in vacuuming the FSM itself. FreeSpaceMapVacuumRange(). Not updating
> the freespace map during the course of vacuuming the heap relation.

Fair enough, but I'm still not quite sure exactly what the question
is. It looks to me like the current code, when there are indexes,
vacuums the FSM after each round of index vacuuming. When there are no
indexes, doing it after each round of index vacuuming would mean never
doing it, so instead we vacuum the FSM every ~8GB. I assume what
happened here is that somebody decided doing it after each round of
index vacuuming was the "right thing," and then realized that was not
going to work if no index vacuuming was happening, and so inserted the
8GB threshold to cover that case. I don't really know what to make of
all of this. On a happy PostgreSQL system, doing anything after each
round of index vacuuming means doing it once, because multiple rounds
of indexing vacuum are extremely expensive and we hope that it won't
ever occur. From that point of view, the 8GB threshold is better,
because it means that when we vacuum a large relation, space should
become visible to the rest of the system incrementally without needing
to wait for the entire vacuum to finish. On the other hand, we also
have this idea that we want to record free space in the FSM once,
after the last time we touch the page. Given that behavior, vacuuming
the FSM every 8GB when we haven't yet done index vacuuming wouldn't
accomplish much of anything, because we haven't updated it for the
pages we just touched. On the third hand, the current behavior seems
slightly ridiculous, because pruning the page is where we're mostly
going to free up space, so we might be better off just updating the
FSM then instead of waiting. That free space could be mighty useful
during the long wait between pruning and marking line pointers unused.
On the fourth hand, that seems like a significant behavior change that
we might not want to undertake without a bunch of research that we
might not want to do right now -- and if we did do it, should we then
update the FSM a second time after marking line pointers unused?

I'm not sure if any of this is answering your actual question, though.

--
Robert Haas
EDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: initdb's -c option behaves wrong way?
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)