Discussion: Disabling memory overcommit deemed dangerous

Disabling memory overcommit deemed dangerous

From
David Geier
Date:
Hi hackers,

In our documentation we recommend disabling memory overcommit to prevent
the OOM killer from kicking in, see [1]. Accordingly, we expect
PostgreSQL to handle OOM situations gracefully. In my experience there
are unfortunately several severe problems with that approach:

1. PostgreSQL contains code paths that aren't safe against failing
memory allocations. Examples are broken cleanup code, see [2], or
various calls to strdup() where we don't check the return value.

2. On Linux, running OOM during stack expansion triggers SIGSEGV. This
is not a theoretical concern. I hit this case in my tests. We could set
up a custom stack via MAP_STACK | MAP_GROWSDOWN, but in practice that's
very tricky because of ASLR. The only real alternative is committing (=
writing to) the entire stack on backend startup. The problem with that
approach is that all of that memory would then already count towards the
commit limit. We might get away with that if we lowered the maximum stack
size significantly.

3. Other processes running on the same system are mostly not safe
against failing memory allocations. In my tests I ended up multiple
times with a server that I couldn't log in anymore because some related
process had crashed due to running OOM.
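
To make point 1 concrete, here is a minimal sketch of a strdup() call
whose return value is checked. checked_strdup() is a hypothetical helper
name used only for illustration; it is not a PostgreSQL function. In the
PostgreSQL tree, wrappers such as pstrdup() (backend) and pg_strdup()
(frontend) already exist for this purpose.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * checked_strdup is a hypothetical helper, not a PostgreSQL function.
 * It copies src into *dst and reports failure explicitly, instead of
 * handing NULL to a caller that might dereference it.
 * Returns 0 on success, -1 if the allocation failed (*dst is then NULL).
 */
static int
checked_strdup(char **dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    *dst = strdup(src);
    if (*dst == NULL)
    {
        /* strdup() returns NULL when no memory is available */
        fprintf(stderr, "out of memory copying %zu bytes\n",
                strlen(src) + 1);
        return -1;
    }
    return 0;
}
```

A caller would test the return value and run its own cleanup on -1
rather than continuing with a NULL pointer.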

I cannot see how someone would today reliably run a PostgreSQL server
with memory overcommit disabled, if it truly runs occasionally OOM. Even
if we fixed (1) and (2) we would still be left with (3). cgroups might
help with (3) but the last time I checked they didn't properly implement
memory overcommit.

My proposal is to remove the part about disabling memory overcommit from
the documentation, or alternatively, describe the pros and cons of both
approaches. Thoughts?

[1]
https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
[2]
https://www.postgresql.org/message-id/flat/b12f9e22-2618-42b8-8644-88bae192c7fd%40gmail.com

--
David Geier



Re: Disabling memory overcommit deemed dangerous

From
Tom Lane
Date:
David Geier <geidav.pg@gmail.com> writes:
> In our documentation we recommend disabling memory overcommit to prevent
> the OOM killer from kicking in, see [1]. Accordingly, we expect
> PostgreSQL to handle OOM situations gracefully. In my experience there
> are unfortunately several severe problems with that approach:

> 1. PostgreSQL contains code paths that aren't safe against failing
> memory allocations. Examples are broken cleanup code, see [2], or
> various calls to strdup() where we don't check the return value.

If you are aware of such places, please submit patches to fix them,
because they are bugs with or without overcommit.  Overcommit does
*not* prevent the kernel from returning ENOMEM, so this seems like
an extremely specious argument for not telling people to disable
overcommit.

> 2. On Linux, running OOM during stack expansion triggers SIGSEGV.

Again, allowing overcommit is hardly a cure.

> 3. Other processes running on the same system are mostly not safe
> against failing memory allocations.

The overcommit recommendation is only meant for machines that are
more or less dedicated to Postgres, so I'm not sure how much this
matters.  Also, we've seen comparable problems on some platforms
after running the kernel out of file descriptors.  The bottom line
is that you need a reasonable amount of headroom in your system
provisioning.

> I cannot see how someone would today reliably run a PostgreSQL server
> with memory overcommit disabled, if it truly runs occasionally OOM.

We have very substantial field experience showing that leaving memory
overcommit enabled also makes the system unreliable, if it approaches
OOM conditions.  I don't think removing that advice is an improvement.

            regards, tom lane



Re: Disabling memory overcommit deemed dangerous

From
David Geier
Date:
Hi Tom!

On 02.09.2025 20:10, Tom Lane wrote:
> David Geier <geidav.pg@gmail.com> writes:
> 
> If you are aware of such places, please submit patches to fix them,
> because they are bugs with or without overcommit.  Overcommit does
> *not* prevent the kernel from returning ENOMEM, so this seems like
> an extremely specious argument for not telling people to disable
> overcommit.

Yes, but to the best of my knowledge only for really wild allocation
requests. I haven't come across any ENOMEM in my testing when overcommit
was enabled.
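
As a rough illustration of such a "really wild" request, assuming a
POSIX malloc(): asking for nearly SIZE_MAX bytes cannot be satisfied
under any overcommit setting, because the request exceeds the process
address space. allocation_refused() is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 1 if malloc() refused the request and set errno to ENOMEM,
 * 0 otherwise. Memory from a successful allocation is freed again.
 */
static int
allocation_refused(size_t size)
{
    void *volatile p;   /* volatile keeps the compiler from eliding malloc */

    assert(size > 0);

    errno = 0;
    p = malloc(size);
    if (p == NULL)
        return errno == ENOMEM;

    free(p);
    return 0;
}
```

Calling allocation_refused(SIZE_MAX - 4096) reports a refusal, while an
ordinary small request such as allocation_refused(64) normally does not.
Requests between those extremes are where the overcommit setting makes
the difference.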

I agree that we want these places fixed regardless. I'll submit a patch
for the strdup() calls but there's a bigger problem here: we don't
really have means to test the changes we make. For example the bug in
[2] requires, according to the discussion, some more involved
refactoring of the cleanup code. How do we make sure these changes are
actually correct?

We could build some infrastructure for OOM testing but it feels like
wasted effort because even if we fixed all the problems of category (1),
we still wouldn't be in good shape because of (2) and (3).
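
One possible shape for such OOM testing infrastructure is an allocator
that fails the N-th allocation on demand, so that every cleanup path can
be exercised in turn. The following is only a sketch under that
assumption; test_malloc(), fault_inject_reset() and make_pair() are
hypothetical names and nothing like this is claimed to exist in
PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical test-only allocator: fails exactly one chosen allocation.
 * fail_at counts allocations starting at 1; 0 disables injection.
 */
static long alloc_count = 0;
static long fail_at = 0;

static void
fault_inject_reset(long nth)
{
    alloc_count = 0;
    fail_at = nth;
}

static void *
test_malloc(size_t size)
{
    alloc_count++;
    if (fail_at != 0 && alloc_count == fail_at)
        return NULL;            /* simulate an out-of-memory failure */
    return malloc(size);
}

/*
 * Example code under test: needs two buffers and must not leak the
 * first one if the second allocation fails.
 * Returns 0 on success, -1 on failure (both outputs are then NULL).
 */
static int
make_pair(char **a, char **b)
{
    assert(a != NULL && b != NULL);

    *a = NULL;
    *b = NULL;

    *a = test_malloc(16);
    if (*a == NULL)
        return -1;

    *b = test_malloc(16);
    if (*b == NULL)
    {
        free(*a);               /* cleanup path reached via injection */
        *a = NULL;
        return -1;
    }
    return 0;
}
```

A test driver would call fault_inject_reset(n) for n = 1, 2, ... and
check after each run that make_pair() fails cleanly. This covers only
problems of category (1); it does nothing for (2) and (3).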

> 
>> 2. On Linux, running OOM during stack expansion triggers SIGSEGV.
> 
> Again, allowing overcommit is hardly a cure.

It's not but neither is disallowing overcommit.

> 
>> 3. Other processes running on the same system are mostly not safe
>> against failing memory allocations.
> 
> The overcommit recommendation is only meant for machines that are
> more or less dedicated to Postgres, so I'm not sure how much this
> matters.  Also, we've seen comparable problems on some platforms
> after running the kernel out of file descriptors.  The bottom line
> is that you need a reasonable amount of headroom in your system
> provisioning.

That's rarely the case in a production environment. Typically there are
backups, monitoring, virus scanners, etc. running on the same host, which
are usually not resilient against failure (e.g. don't automatically
restart / retry). The same goes for the login problem I mentioned.

Say a DBA runs into an OOM, checks the documentation and applies the
overcommit change. Now he has a false sense of safety and will be
surprised that his service suddenly has new, unexpected points of failure.

> 
> We have very substantial field experience showing that leaving memory
> overcommit enabled also makes the system unreliable, if it approaches
> OOM conditions.  I don't think removing that advice is an improvement.

Completely agreed. Leaving overcommit enabled is also bad. There's no
safe way of running PostgreSQL in the presence of OOMs. Therefore, it
depends on what's more important: having some chance that PostgreSQL
stays up at the risk of other programs dying, or always having
PostgreSQL die while the other programs stay up.

I think it would be good to make the tradeoffs of both settings more
explicit in the documentation and to stress that the most important
thing is to configure PostgreSQL such that OOMs are very unlikely to
happen. If you agree, I can draft a patch.

--
David Geier