Re: Eager page freeze criteria clarification

From: Peter Geoghegan
Subject: Re: Eager page freeze criteria clarification
Date:
Msg-id: CAH2-WzmwKS-h4TzXmGz2-8FKu8YSyRMPvqA5pdNZoL4MRNoKEg@mail.gmail.com
In reply to: Re: Eager page freeze criteria clarification  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Eager page freeze criteria clarification  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Fri, Sep 29, 2023 at 11:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I think that's true. For me, the issue is what a user is practically
> likely to notice and care about. I submit that on a
> not-particularly-busy system, it would probably be fine to freeze
> aggressively in almost every situation, because you're only incurring
> costs you can afford to pay. On a busy system, it's more important to
> be right, or at least not too badly wrong. But even on a busy system,
> I think that when the time between data being written and being frozen
> is more than a few tens of minutes, it's very doubtful that anyone is
> going to notice the contribution that freezing makes to the overall
> workload. They're much more likely to notice an annoying autovacuum
> than they are to notice a bit of excess freezing that ends up getting
> reversed. But when you start cranking the time between writing data
> and freezing it down into the single-digit numbers of minutes, and
> even more if you push down to tens of seconds or less, now I think
> people are going to care more about useless freezing work than about
> long-term autovacuum risks. Because now their database is really busy
> so they care a lot about performance, and seemingly most of the data
> involved is ephemeral anyway.

I think that that's all true.

As I believe I pointed out at least once in the past, my thinking
about the practical requirements from users was influenced by this
paper/talk:

https://www.microsoft.com/en-us/research/video/cost-performance-in-modern-data-stores-how-data-cashing-systems-succeed/

For the types of applications that Postgres is a natural fit for, most
of the data is cold -- very cold ("freezing" cold, even). If you have
a 50GB pgbench_accounts table, Postgres will always perform
suboptimally. But so will SQL Server, Oracle, and MySQL.

While pgbench makes a fine stress-test, for the most part its workload
is highly unrealistic. And yet we seem to think that it's just about
the most important benchmark of all. If we're not willing to get over
even small regressions in pgbench, I fear we'll never make significant
progress in this area. It's at least partly a cultural problem IMV.

The optimal freezing strategy for pgbench_accounts is to never freeze,
even once. It doesn't matter if you vary the distributions of the
accounts table updates -- it's still inevitable that each and every
row will get updated before too long. In fact, it's not just freezing
that should always be avoided here -- the same applies to pruning by
VACUUM.
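
To make "never freeze, not even once" concrete: with stock settings, the
closest you can get is to push the freeze-related GUCs out to their
maximums, roughly as below. This is purely an illustration of the absurd
"optimal" strategy (the values are just the documented maximums), not a
recommendation, and has nothing to do with the patch under discussion:

    -- don't freeze tuples until their XIDs are absurdly old
    ALTER SYSTEM SET vacuum_freeze_min_age = 1000000000;
    -- put off aggressive vacuums of the whole table for as long as possible
    ALTER SYSTEM SET vacuum_freeze_table_age = 2000000000;
    -- anti-wraparound autovacuum still backstops everything (needs a restart)
    ALTER SYSTEM SET autovacuum_freeze_max_age = 2000000000;
    SELECT pg_reload_conf();

Even then, wraparound eventually forces the issue -- which is exactly why
"never freeze" only works as a strategy inside a benchmark.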

As I suggested in my last email, the lesson offered by the
pgbench_accounts table seems to be "never VACUUM at all, except
perhaps to advance relfrozenxid" (which shouldn't actually require
freezing even one page). If you haven't tuned heap fill factor, then
you might want to VACUUM a bit, at first. But, overall, vacuuming is
bad. That is the logical though absurd conclusion. It completely flies
in the face of practical experience.
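
For anybody skimming the thread, "tuned heap fill factor" just means
leaving some free space on each heap page so that the constant
accounts-table updates can go HOT, along these lines (the 90 is only an
example value; pgbench's -F init option does the same thing at table
creation time):

    -- leave ~10% free space per heap page, for HOT updates
    -- (only affects rows inserted or updated from now on)
    ALTER TABLE pgbench_accounts SET (fillfactor = 90);

    -- and this is the number that an "advance relfrozenxid only"
    -- approach would need to keep an eye on
    SELECT relname, age(relfrozenxid)
    FROM pg_class
    WHERE relname = 'pgbench_accounts';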

--
Peter Geoghegan


