Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet

Поиск

Список

Период

Сортировка

От	Priya V
Тема	Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet
Дата	5 августа 20:01:19
Msg-id	CAFsZ43xFxjSiONwRccXBQXZrPRd+Lh7XAkSVEG1ai165xPcoDA@mail.gmail.com обсуждение исходный текст
Ответы	Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet
Список	pgsql-performance

Дерево обсуждения

Hello Postgres community,

We operate a large PostgreSQL fleet (~15,000 databases) on dedicated Linux hosts.
Each host runs multiple PostgreSQL instances (multi-instance setup, not just multiple DBs inside one instance).

Environment:

PostgreSQL Versions: Mix of 13.13 and 15.12 (upgrades in progress to be at 15.12 currently both are actively in use)
OS / Kernel: RHEL 7 & RHEL 8 variants, kernels in the 4.14–4.18 range
RAM: 256 GiB (varies slightly)
Swap: Currently none
Workload: Highly mixed — OLTP-style internal apps with unpredictable query patterns and connection counts
Goal: Uniform, safe memory settings across the fleet to avoid kernel or database instability

We’re reviewing vm.overcommit_* settings because we’ve seen conflicting guidance:

vm.overcommit_memory = 2 gives predictability but can reject allocations early
vm.overcommit_memory = 1 is more flexible but risks OOM kills if many backends hit peak memory usage at once

We’re considering:

vm.overcommit_memory = 2 for strict accounting
Increasing vm.overcommit_ratio from 50 → 80 or 90 to better reflect actual PostgreSQL usage (e.g., work_mem reservations that aren’t fully used)

Our questions for those running large PostgreSQL fleets:

What overcommit_ratio do you find safe for PostgreSQL without causing kernel memory crunches?
Do you prefer overcommit_memory = 1 or = 2 for production stability?
How much swap (if any) do you keep in large-memory servers where PostgreSQL is the primary workload? Is having swap configured a good idea or not ?
Any real-world cases where kernel accounting was too strict or too loose for PostgreSQL?
What settings to go with if we are not planning on using swap ?

We’d like to avoid both extremes:

Too low a ratio → PostgreSQL backends failing allocations even with free RAM
Too high a ratio → OOM killer terminating PostgreSQL under load spikes

Any operational experiences, tuning recommendations, or kernel/PG interaction pitfalls would be very helpful.

TIA

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet