Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet
From | Priya V |
---|---|
Subject | Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet |
Date | |
Msg-id | CAFsZ43xFxjSiONwRccXBQXZrPRd+Lh7XAkSVEG1ai165xPcoDA@mail.gmail.com |
Responses | Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet |
List | pgsql-performance |
We operate a large PostgreSQL fleet (~15,000 databases) on dedicated Linux hosts.
Each host runs multiple PostgreSQL instances (multi-instance setup, not just multiple DBs inside one instance).
Environment:
PostgreSQL Versions: Mix of 13.13 and 15.12 (upgrade to 15.12 is in progress; both versions are currently in active use)
OS / Kernel: RHEL 7 & RHEL 8 variants, kernels in the 4.14–4.18 range
RAM: 256 GiB (varies slightly)
Swap: Currently none
Workload: Highly mixed — OLTP-style internal apps with unpredictable query patterns and connection counts
Goal: Uniform, safe memory settings across the fleet to avoid kernel or database instability
We’re reviewing vm.overcommit_* settings because we’ve seen conflicting guidance:
- vm.overcommit_memory = 2 gives predictability but can reject allocations early
- vm.overcommit_memory = 1 is more flexible but risks OOM kills if many backends hit peak memory usage at once
We’re considering:
- vm.overcommit_memory = 2 for strict accounting
- Increasing vm.overcommit_ratio from 50 → 80 or 90 to better reflect actual PostgreSQL usage (e.g., work_mem reservations that aren’t fully used)
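For reference, here is a rough sketch of what those ratios would mean on a 256 GiB host with no swap. It assumes the CommitLimit formula given in the kernel's overcommit accounting documentation for vm.overcommit_memory = 2 (RAM minus huge pages, times overcommit_ratio / 100, plus swap). The function name and the hugetlb_bytes parameter are illustrative, and vm.overcommit_kbytes is assumed to be unset.

```python
GIB = 1024 ** 3

def commit_limit_bytes(ram_bytes, swap_bytes, overcommit_ratio, hugetlb_bytes=0):
    """Approximate CommitLimit under vm.overcommit_memory = 2.

    Assumes vm.overcommit_kbytes is 0, so overcommit_ratio applies.
    Memory reserved for huge pages is excluded before the ratio is applied.
    """
    return (ram_bytes - hugetlb_bytes) * overcommit_ratio // 100 + swap_bytes

# 256 GiB RAM, no swap, no huge pages (the environment described above)
for ratio in (50, 80, 90):
    limit = commit_limit_bytes(256 * GIB, 0, ratio)
    print(f"overcommit_ratio={ratio}: CommitLimit is about {limit / GIB:.1f} GiB")
```

With no swap, the ratio alone sets the ceiling, so at the default of 50 only about half of RAM can be committed across all instances on the host.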
Our questions for those running large PostgreSQL fleets:
- What overcommit_ratio do you find safe for PostgreSQL without causing kernel memory crunches?
- Do you prefer overcommit_memory = 1 or = 2 for production stability?
- How much swap (if any) do you keep on large-memory servers where PostgreSQL is the primary workload? Is having swap configured a good idea or not?
- Any real-world cases where kernel accounting was too strict or too loose for PostgreSQL?
- What settings would you go with if we are not planning on using swap?
We’d like to avoid both extremes:
- Too low a ratio → PostgreSQL backends failing allocations even with free RAM
- Too high a ratio → OOM killer terminating PostgreSQL under load spikes
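To make the first extreme concrete, one way to watch for it is to compare the Committed_AS and CommitLimit fields of /proc/meminfo: under strict accounting, allocations start failing as Committed_AS approaches CommitLimit, regardless of how much RAM is actually free. The following is only a minimal sketch; the helper names are hypothetical and the sample values are made up for illustration, not taken from our hosts.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are reported in kB; the unit suffix is ignored.
    """
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields

def commit_headroom_kb(meminfo):
    """Remaining commit space in kB before allocations are refused."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]

# Made-up sample; on a live host use open("/proc/meminfo").read() instead.
SAMPLE = """\
MemTotal:       268435456 kB
CommitLimit:    134217728 kB
Committed_AS:   100000000 kB
"""

info = parse_meminfo(SAMPLE)
print(f"commit headroom: {commit_headroom_kb(info)} kB")
```

Tracking this headroom per host over time would show how close the fleet gets to the limit under each candidate ratio.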
Any operational experiences, tuning recommendations, or kernel/PG interaction pitfalls would be very helpful.
TIA