Re: posix_fadvise missing in the walsender

Поиск
Список
Период
Сортировка
От Joachim Wieland
Тема Re: posix_fadvise missing in the walsender
Дата
Msg-id CACw0+13C6mbCtGmYAyx1_+RsjCiKJhJ7WpG3JhfosF2CXxGndA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: posix_fadvise missing in the walsender  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: posix_fadvise missing in the walsender  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Wed, Feb 20, 2013 at 4:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Feb 19, 2013 at 5:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I agree with Merlin and Joachim - if we have the call in one place, we
>> should have it in both.
>
> We might want to assess whether we even want to have it one place.
> I've seen cases where the existing call hurts performance, because of
> WAL file recycling.

That's interesting, I hadn't thought about WAL recycling.

I now agree that this whole thing is even more complicated, you might
have an archive_command set as well, like "cp" for instance, that
reads in the WAL file again, possibly even right after we called
posix_fadvise on it.

It appears to me that the right strategy depends on a few factors:

a) what ratio of your active dataset fits into RAM?
b) how many WAL files do you have?
c) how long does it take for them to get recycled?
d) archive_command set / wal_senders active?

And recommendations for the two extremes would be:

If your dataset fits mostly into RAM and if you have only few WAL
files that get recycled quickly then you don't want to evict the WAL
file from the buffer cache.
On the other hand if your dataset doesn't fit into RAM and you have
many WAL files that take a while until they get recycled, then you
should consider hinting to the OS.

If you're in that second category (I am) and you're also using the
archive_command you could just piggyback the posix_fadvise call onto
your archive_command, assuming that the walsender is already done with
the file at that moment. And I'm also pretty certain that Robert's
setup that he used for the write scalability tests fell into the first
category.

So given the above, I think it's possible to come up with benchmarks
that prove whatever you want to prove :-)


Joachim



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Greg Stark
Дата:
Сообщение: Re: [RFC] indirect toast tuple support
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Support for REINDEX CONCURRENTLY