Re: Disabling Heap-Only Tuples

Поиск
Список
Период
Сортировка
От Thom Brown
Тема Re: Disabling Heap-Only Tuples
Дата
Msg-id CAA-aLv4omW4L+R6C7m+93b8JHakSu5Hgs6w4-DfrLT4Xot15fg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Disabling Heap-Only Tuples  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Ответы Re: Disabling Heap-Only Tuples  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Список pgsql-hackers
On Wed, 5 Jul 2023 at 13:12, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Wed, 5 Jul 2023 at 13:03, Thom Brown <thom@linux.com> wrote:
> >
> > On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent
> > <boekewurm+postgres@gmail.com> wrote:
> > >
> > > On Wed, 5 Jul 2023 at 12:45, Thom Brown <thom@linux.com> wrote:
> > > > Heap-Only Tuple (HOT) updates are a significant performance
> > > > enhancement, as they prevent unnecessary page writes. However, HOT
> > > > comes with a caveat: it means that if we have lots of available space
> > > > earlier on in the relation, it can only be used for new tuples or in
> > > > cases where there's insufficient space on a page for an UPDATE to use
> > > > HOT.
> > > >
> > > > This mechanism limits our options for condensing tables, forcing us to
> > > > resort to methods like running VACUUM FULL/CLUSTER or using external
> > > > tools like pg_repack. These either require exclusive locks (which will
> > > > be a deal-breaker on large tables on a production system), or there's
> > > > risks involved. Of course we can always flood pages with new versions
> > > > of a row until it's forced onto an early page, but that shouldn't be
> > > > necessary.
> > > >
> > > > Considering these trade-offs, I'd like to propose an option to allow
> > > > superusers to disable HOT on tables. The intent is to trade some
> > > > performance benefits for the ability to reduce the size of a table
> > > > without the typical locking associated with it.
> > >
> > > Interesting use case, but I think that disabling HOT would be missing
> > > the forest for the trees. I think that a feature that disables
> > > block-local updates for pages > some offset would be a better solution
> > > to your issue: Normal updates also prefer the new tuple to be stored
> > > in the same pages as the old tuple if at all possible, so disabling
> > > HOT wouldn't solve the issue of tuples residing in the tail of your
> > > table - at least not while there is still empty space in those pages.
> >
> > Hmm... I see your point.  It's when an UPDATE isn't going to land on
> > the same page that it relocates to the earlier available page.  So I
> > guess I'm after whatever mechanism would allow that to happen reliably
> > and predictably.
> >
> > So $subject should really be "Allow forcing UPDATEs off the same page".
>
> You'd probably want to do that only for a certain range of the table -
> for a table with 1GB of data and 3GB of bloat there is no good reason
> to force page-crossing updates in the first 1GB of the table - all
> tuples of the table will eventually reside there, so why would you
> take a performance penalty and move the tuples from inside that range
> to inside that same range?

I'm thinking more of a case of:

<magic to stop UPDATES from landing on same page>

UPDATE bigtable
SET primary key = primary key
WHERE ctid IN (
    SELECT ctid
    FROM bigtable
    ORDER BY ctid DESC
    LIMIT 100000);

> Something else to note: Indexes would suffer some (large?) amount of
> bloat in this process, as you would be updating a lot of tuples
> without the HOT optimization, thus increasing the work to be done by
> VACUUM.
> This may result in more bloat in indexes than what you get back from
> shrinking the table.

This could be the case, but I guess indexes are expendable to an
extent, unlike tables.

Thom



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Следующее
От: Thom Brown
Дата:
Сообщение: Re: Allow specifying a dbname in pg_basebackup connection string