Re: Bypassing shared_buffers

From: Vladimir Churyukin
Subject: Re: Bypassing shared_buffers
Date:
Msg-id: CAFSGpE2Lt+sdh4mXM48RwJdP0mzgQvpiRGRb6N3sin77KYuu8w@mail.gmail.com
In response to: Re: Bypassing shared_buffers  (Konstantin Knizhnik <knizhnik@garret.ru>)
Responses: Re: Bypassing shared_buffers  (Greg Sabino Mullane <htamfids@gmail.com>)
List: pgsql-hackers

On Thu, Jun 15, 2023 at 12:32 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:


On 15.06.2023 4:37 AM, Vladimir Churyukin wrote:
> Ok, got it, thanks.
> Is there any alternative approach to measuring the performance as if
> the cache were empty?
> The goal is basically to calculate the max possible I/O time for a
> query, to get a range between the min and max timings.
> It's ok if it's done during the EXPLAIN ANALYZE call only, not for
> regular executions.
> One thing I can think of: even if the data in storage might be stale,
> issue read calls for it anyway, for measurement purposes.
> For EXPLAIN ANALYZE it should be fine, as it doesn't return real data
> anyway.
> Is it possible that some pages do not exist in storage at all? Is
> there a different way to simulate something like that?
>

I do not completely understand what you want to measure: how fast the
cache can be prewarmed, or what the performance is when the working set
doesn't fit in memory?


No, it's not about the working set or prewarming speed.
We're trying to see what the worst-case I/O performance is, i.e. when the database has just started up or the data/indexes being queried are not cached at all.
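
Roughly, what we'd compare is the cold vs. warm output of something like the following (just a sketch; `big_table` is a placeholder, and `track_io_timing` has to be enabled to get read timings in the plan):

    -- make EXPLAIN report time spent on buffer reads (superuser setting)
    SET track_io_timing = on;
    -- BUFFERS shows shared hit vs. read counts per plan node
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM big_table;

On a cold cache the Buffers lines are dominated by "shared read" with a large "I/O Timings" value, on a warm cache by "shared hit"; the gap between those two runs is the min/max range we're after.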

Why not change `shared_buffers` to some very small value (e.g. 1MB)?
As was already noted, there are two levels of caching: shared buffers
and the OS file cache.
By reducing the size of shared buffers you rely mostly on the OS file cache.
And actually there is no big gap in performance here - in most workloads
I didn't see more than a 15% difference.

I thought about the option of setting a minimal shared_buffers, but that requires a server restart anyway, which is something I'd like to avoid.
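
(For concreteness, the kind of change that needs the restart is roughly this - a sketch, with the data directory path as a placeholder:

    psql -c "ALTER SYSTEM SET shared_buffers = '1MB';"
    pg_ctl -D /path/to/data restart

and it's that restart step that we'd rather avoid on the instances we care about.)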

You can certainly flush the OS cache with `echo 3 > /proc/sys/vm/drop_caches` and
so simulate a cold start.
But the OS cache will be prewarmed quite fast (unlike shared buffers, because
of the strange Postgres ring-buffer strategies which cause eviction of pages
from shared buffers even if there is a lot of free space).

So please specify the goal of your experiment more precisely.
The "max possible I/O time for a query" depends on so many factors...
Do you consider just one client working in isolation, or will there be
many concurrent queries and background tasks like autovacuum and the
checkpointer competing for the resources?

My point is that if you need a deterministic result, then you will
have to exclude a lot of different factors which may affect performance,
and then ... you end up calculating the speed of a horse in a vacuum, which
has almost no relation to real performance.


Exactly, we need more or less deterministic results for how bad the I/O timings can be.
Even though these are not necessarily the numbers we will get in real life, it gives us an idea of the distribution,
and that's useful because we care about the long tail (p99+) of our queries.
For simplicity, let's say it will be a single client only (it would be hard to apply the proposed solutions reliably with other stuff running in parallel anyway).
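
For reference, the cold-start measurement we'd effectively be scripting per query is roughly this (a sketch only - the data directory, table name and query are placeholders, and dropping the page cache needs root):

    pg_ctl -D /path/to/data stop                  # shut down Postgres, dropping shared buffers
    sync                                          # flush dirty pages so drop_caches can evict everything
    echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop the OS page cache
    pg_ctl -D /path/to/data start
    psql -c "SET track_io_timing = on;" \
         -c "EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM big_table;"

which is also why doing this with concurrent traffic in the picture wouldn't be practical.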

-Vladimir Churyukin
