Re: PGC_SIGHUP shared_buffers?

From: Konstantin Knizhnik
Subject: Re: PGC_SIGHUP shared_buffers?
Msg-id: 99a4f21e-e117-4169-8626-67a7678654f0@garret.ru
In reply to: Re: PGC_SIGHUP shared_buffers?  (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers


On 16/02/2024 10:37 pm, Thomas Munro wrote:
> On Fri, Feb 16, 2024 at 5:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> 3. Reserve lots of address space and then only use some of it. I hear
>> rumors that some forks of PG have implemented something like this. The
>> idea is that you convince the OS to give you a whole bunch of address
>> space, but you try to avoid having all of it be backed by physical
>> memory. If you later want to increase shared_buffers, you then get the
>> OS to back more of it by physical memory, and if you later want to
>> decrease shared_buffers, you hopefully have some way of giving the OS
>> the memory back. As compared with the previous two approaches, this
>> seems less likely to be noticeable to most PG code. Problems include
>> (1) you have to somehow figure out how much address space to reserve,
>> and that forms an upper bound on how big shared_buffers can grow at
>> runtime and (2) you have to figure out ways to reserve address space
>> and back more or less of it with physical memory that will work on all
>> of the platforms that we currently support or might want to support in
>> the future.
> FTR I'm aware of a working experimental prototype along these lines,
> that will be presented in Vancouver:
>
> https://www.pgevents.ca/events/pgconfdev2024/sessions/session/31-enhancing-postgresql-plasticity-new-frontiers-in-memory-management/

If you are interested, this is my attempt to implement resizable shared buffers based on ballooning:

https://github.com/knizhnik/postgres/pull/2

Unused memory is returned to the OS using `madvise` (so it is not a very portable solution).
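
For reference, here is a minimal sketch of the underlying mechanism (Linux-specific; the function names and exact flags are illustrative, not taken from the PR): address space for the maximal pool is mapped once, and shrinking just asks the kernel to reclaim the physical pages backing the unused tail.

```c
#include <stddef.h>
#include <sys/mman.h>

/*
 * Illustrative sketch only (Linux-specific); names and flags are mine,
 * not the PR's.  The full address range for the maximal pool is mapped
 * once at startup.
 */
static char *
reserve_buffer_pool(size_t max_size)
{
    void *base = mmap(NULL, max_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    return (base == MAP_FAILED) ? NULL : (char *) base;
}

static void
shrink_buffer_pool(char *base, size_t used_size, size_t max_size)
{
    /*
     * Ask the kernel to reclaim the physical pages backing the unused
     * tail; the address range itself stays mapped, so the pool can grow
     * again later.  (For shared mappings MADV_REMOVE may be needed
     * instead of MADV_DONTNEED, which is part of why this is not a
     * very portable solution.)
     */
    madvise(base + used_size, max_size - used_size, MADV_DONTNEED);
}
```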

Unfortunately, there are many data structures in Postgres whose size depends on the number of buffers.
In my PR I use a `GetAvailableBuffers()` function instead of `NBuffers`, but that doesn't always help, because many of these data structures cannot be reallocated.
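
A sketch of that substitution (my paraphrase of the idea; the `AvailableBuffers` variable and its -1 convention are assumptions, only `GetAvailableBuffers()` and `NBuffers` come from the PR):

```c
/*
 * Paraphrase of the NBuffers -> GetAvailableBuffers() substitution;
 * AvailableBuffers and its -1 ("no limit") convention are my assumptions.
 */
extern int NBuffers;              /* maximal pool size, fixed at startup   */
static int AvailableBuffers = -1; /* current balloon limit, -1 = unlimited */

static inline int
GetAvailableBuffers(void)
{
    return (AvailableBuffers < 0) ? NBuffers : AvailableBuffers;
}

/* Call sites that used to loop "for (i = 0; i < NBuffers; i++)" become: */
void
example_sweep(void)
{
    for (int i = 0; i < GetAvailableBuffers(); i++)
    {
        /* ... inspect buffer descriptor i ... */
    }
}
```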

Other important limitations of this approach are:

1. It is necessary to specify the maximal number of shared buffers in advance.
2. Only the `BufferBlocks` space is shrunk, not the buffer descriptors and the buffer hash table. The estimated memory footprint per page is 132 bytes, so if we want to be able to scale shared buffers from 100MB up to 100GB, about 1.6GB of memory stays in use, which is quite large (see the worked calculation after this list).
3. The CLOCK replacement algorithm becomes very inefficient with a large number of shared buffers.
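
To make the 1.6GB estimate in item 2 concrete (assuming the default 8KB page size):

```c
#include <stdio.h>

int
main(void)
{
    const long long max_pool = 100LL * 1024 * 1024 * 1024; /* 100GB maximal pool */
    const long long page_sz  = 8192;                       /* default 8KB pages  */
    const long long per_page = 132;  /* estimated descriptor + hash entry bytes  */

    long long nbuffers = max_pool / page_sz;   /* 13,107,200 buffers */
    long long overhead = nbuffers * per_page;  /* ~1.7e9 bytes       */

    /* Prints ~1.61 GiB: paid even when only 100MB of buffers is in use. */
    printf("fixed overhead: %.2f GiB\n", overhead / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```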

Below are the first results I got (pgbench database with scale factor 100; pgbench -c 32 -j 4 -T 100 -P 1 -M prepared -S):

| shared_buffers | available_buffers | TPS  |
| -------------- | ----------------- | ---- |
| 128MB          | -1                | 280k |
| 1GB            | -1                | 324k |
| 2GB            | -1                | 358k |
| 32GB           | -1                | 350k |
| 2GB            | 128MB             | 130k |
| 2GB            | 1GB               | 311k |
| 32GB           | 128MB             | 13k  |
| 32GB           | 1GB               | 140k |
| 32GB           | 2GB               | 348k |

`shared_buffers` specifies the maximal shared buffers size and `available_buffers` the current limit (-1 means no ballooning).

So when shared_buffers >> available_buffers and the dataset doesn't fit in them, we get an awful performance degradation (> 20 times), thanks to the CLOCK algorithm.
My first thought was to replace CLOCK with an LRU based on a doubly linked list. Since there is no lock-free doubly-linked-list implementation, it needs some global lock, and this lock can become a bottleneck. The standard solution is partitioning: use N LRU lists instead of one, just like the partitioned hash table the buffer manager uses to look up buffers. Actually, we could use the same partition locks to protect the LRU lists. But it is not clear what to do with ring buffers (strategies). So I decided not to perform such a revolution in bufmgr, but instead to optimize CLOCK to skip reserved buffers more efficiently: just add a skip_count field to the buffer descriptor. And it helps! Now the worst case, shared_buffers/available_buffers = 32GB/128MB,
shows the same 280k TPS as shared_buffers = 128MB without ballooning.
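
For illustration, this is how I would reconstruct the skip_count trick (a sketch under my own assumptions about field and function names; the actual PR code may differ): when the clock hand lands on a ballooned buffer, skip_count tells it how many consecutive reserved buffers follow, so the whole region is hopped over in one step instead of testing each descriptor.

```c
#include <stdbool.h>

/*
 * Hypothetical reconstruction of the skip_count optimization; the struct
 * layout and names are illustrative, not copied from the PR.
 */
typedef struct BufferDescSketch
{
    bool reserved;   /* true if this buffer is currently ballooned away     */
    int  skip_count; /* length of the run of reserved buffers starting here */
    /* ... tag, usage_count, locks, etc. in the real descriptor ...         */
} BufferDescSketch;

/*
 * Advance the clock hand to the next non-reserved buffer.  skip_count is
 * assumed to be maintained whenever the balloon grows or shrinks; at least
 * one buffer must remain unreserved or this loop would not terminate.
 */
static int
clock_sweep_next(BufferDescSketch *descs, int nbuffers, int hand)
{
    for (;;)
    {
        BufferDescSketch *buf = &descs[hand];

        if (!buf->reserved)
            return hand; /* a live buffer the caller can examine */

        /* Hop over the whole reserved run in one step instead of
         * testing each descriptor individually. */
        hand = (hand + (buf->skip_count > 0 ? buf->skip_count : 1)) % nbuffers;
    }
}
```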


