Discussion: One PG process eating more than 40GB of RAM and getting killed by OOM
One PG process eating more than 40GB of RAM and getting killed by OOM
From
Jean-Christophe Boggio
Date:
Hello,

On my dev laptop, I have ~40GB free RAM. When launching a heavy calculation in PostgreSQL (within a stored procedure), it consumes as much memory as is available and then gets killed by OOM. There is only one connected session.

I have the following settings, which look reasonable (to me):

shared_buffers = 512MB # min 128kB
#huge_pages = try # on, off, or try
temp_buffers = 512MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature
work_mem = 1GB # min 64kB
#hash_mem_multiplier = 1.0 # 1-1000.0 multiplier on hash table work_mem
maintenance_work_mem = 1GB # min 1MB
#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem
#logical_decoding_work_mem = 64MB # min 64kB
#max_stack_depth = 2MB # min 100kB
#shared_memory_type = mmap # the default is the first option
dynamic_shared_memory_type = posix # the default is the first option
#temp_file_limit = -1 # limits per-process temp file space

This is PostgreSQL 14.7 running on Ubuntu 23.04.

What can I do to prevent the crash?

Thanks for your help,
Turn off the OOM killer so you would get a nicer message in the PG log file instead of crashing the PG service:

vm.overcommit_memory=2

Jean-Christophe Boggio wrote on 10/13/2023 9:06 AM:
Hello,
On my dev laptop, I have ~40GB free RAM. When launching a heavy calculation in PostgreSQL (within a stored procedure), it consumes as much memory as is available and then gets killed by OOM. There is only one connected session.
I have the following settings, which look reasonable (to me):
shared_buffers = 512MB # min 128kB
#huge_pages = try # on, off, or try
temp_buffers = 512MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature
work_mem = 1GB # min 64kB
#hash_mem_multiplier = 1.0 # 1-1000.0 multiplier on hash table work_mem
maintenance_work_mem = 1GB # min 1MB
#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem
#logical_decoding_work_mem = 64MB # min 64kB
#max_stack_depth = 2MB # min 100kB
#shared_memory_type = mmap # the default is the first option
dynamic_shared_memory_type = posix # the default is the first option
#temp_file_limit = -1 # limits per-process temp file space
This is PostgreSQL 14.7 running on Ubuntu 23.04
What can I do to prevent the crash?
Thanks for your help,
Regards,
Michael Vitale
703-600-9343
Attachments
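For reference, the overcommit change Michael suggests is normally applied with sysctl; a minimal sketch, assuming a Linux system like the Ubuntu install mentioned in the thread (the drop-in file name is only an example):

# Hypothetical drop-in file, e.g. /etc/sysctl.d/30-overcommit.conf
# Refuse allocations that exceed the commit limit instead of OOM-killing later.
vm.overcommit_memory = 2
# With mode 2, the commit limit is swap + overcommit_ratio% of RAM (kernel default is 50).
vm.overcommit_ratio = 80

# Load it without rebooting:
# sudo sysctl -p /etc/sysctl.d/30-overcommit.conf

As the rest of the thread shows, mode 2 can break memory-hungry desktop applications on a laptop; setting vm.overcommit_memory back to 0 restores the kernel default.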
Re: One PG process eating more than 40GB of RAM and getting killed by OOM
From
Johannes Truschnigg
Date:
You will want to try decreasing work_mem to a sane number first, without looking at anything else really. Check out the official docs:

https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-WORK-MEM

The gist is that work_mem is not a limit that is enforced per session/connection/query, but per sort or hash node, of which there can be many in a complex query, which is why 1GB of work_mem can end up consuming several multiples of that if you are (un)lucky enough.

--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +436502133337
xmpp:  johannes@truschnigg.info
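A minimal sketch of how to act on this advice from a psql session, using session-level settings only; the table names in the query are placeholders, not taken from this thread:

-- Lower work_mem for the current session only; the server-wide setting is untouched.
SET work_mem = '64MB';

-- Count the Sort / Hash / HashAggregate nodes in the plan of the heavy query:
-- each of them may use up to work_mem (hash nodes up to work_mem * hash_mem_multiplier),
-- and parallel workers multiply the total again.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.*
FROM big_table t                 -- placeholder table
JOIN other_table o USING (id)    -- placeholder table
ORDER BY t.created_at;

-- Return to the configured default when done.
RESET work_mem;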
Attachments
Hi,

On Fri, Oct 13, 2023 at 03:06:57PM +0200, Jean-Christophe Boggio wrote:
> On my dev laptop, I have ~40GB free RAM. When launching a heavy calculation
> in PostgreSQL (within a stored procedure), it consumes as much memory as is
> available and then gets killed by OOM. There is only one connected session.
>
> I have the following settings, which look reasonable (to me):
>
> shared_buffers = 512MB # min 128kB

That's not a lot.

> work_mem = 1GB # min 64kB

On the other hand, that's a lot. If that query runs in parallel and does a lot of different things, it might use work_mem several times over, resulting in a lot of used memory. So the first thing to try would be to lower work_mem to something much lower, like 64MB or 128MB.

> This is PostgreSQL 14.7 running on Ubuntu 23.04

The other thing you can try is to see whether turning jit off (or just setting jit_inline_above_cost = -1) helps.

Michael
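A sketch of how the JIT suggestion can be tried per session, without editing postgresql.conf (the procedure name is a placeholder):

-- Option 1: disable JIT compilation entirely for this session.
SET jit = off;

-- Option 2: keep JIT but disable inlining only, as suggested above.
SET jit = on;
SET jit_inline_above_cost = -1;

-- Re-run the workload and compare memory usage, e.g.:
-- CALL heavy_calculation();   -- placeholder for the stored procedure

-- Restore the configured defaults.
RESET jit;
RESET jit_inline_above_cost;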
Re: One PG process eating more than 40GB of RAM and getting killed by OOM
From
Jean-Christophe Boggio
Date:
On 13/10/2023 at 15:12, MichaelDBA wrote:
> Turn off the OOM killer so you would get a nicer message in the PG log
> file instead of crashing the PG service.
> vm.overcommit_memory=2
Did this and now the process dies much quicker (without seemingly consuming all the available memory)... Also I cannot launch Thunderbird anymore with this setting...
Anyway, I also reduced work_mem to 128MB.
You can find the corresponding logs attached.
Thanks for your help,
JC
Attachments
Jean-Christophe Boggio <postgresql@thefreecat.org> writes:
> You can find the corresponding logs attached.

Your query seems to be creating TupleSort contexts and then leaking them, which is surely a bug. Can you make a self-contained test case that reproduces this?

regards, tom lane
On Fri, Oct 13, 2023 at 10:53 AM Jean-Christophe Boggio <postgresql@thefreecat.org> wrote:
> On 13/10/2023 at 15:12, MichaelDBA wrote:
>> Turn off the OOM killer so you would get a nicer message in the PG log
>> file instead of crashing the PG service.
>> vm.overcommit_memory=2
>
> Did this and now the process dies much quicker (without seemingly consuming
> all the available memory)... Also I cannot launch Thunderbird anymore with
> this setting...
Yes, turning off overcommit doesn't play well with graphical environments, in my experience. But a production database probably shouldn't be running on a system like that. On non-prod systems, you can either turn it off temporarily, or you could try to catch the problem before it becomes fatal and get the log with pg_log_backend_memory_contexts (see the sketch after this message).
> Anyway, I also reduced work_mem to 128MB.
> You can find the corresponding logs attached.
We can see what the problem is (over 137,000 concurrent tuple sorts), but we can't tell what is ultimately causing it. You will need to dig into, or disclose, the contents of the procedure.
Cheers,
Jeff
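A minimal sketch of the pg_log_backend_memory_contexts approach Jeff describes, run from a second psql session while the procedure is executing (the PID shown is a placeholder; on PostgreSQL 14 the function is restricted to superusers):

-- Find the PID of the backend running the heavy procedure.
SELECT pid, state, query
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
  AND state = 'active';

-- Ask that backend to dump its memory contexts into the server log.
SELECT pg_log_backend_memory_contexts(12345);  -- replace 12345 with the PID found above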
Re: One PG process eating more than 40GB of RAM and getting killed by OOM
From
Jean-Christophe Boggio
Date:
On 13/10/2023 at 18:48, Jeff Janes wrote:
> Yes, turning off overcommit doesn't play well with graphical environments,
> in my experience. But a production database probably shouldn't be running
> on a system like that. On non-prod systems, you can either turn it off
> temporarily, or you could try to catch the problem before it becomes fatal
> and get the log with pg_log_backend_memory_contexts.

As I said, this is my dev laptop and no, I would never waste precious RAM this way on a production server ;-)

> We can see what the problem is (over 137,000 concurrent tuple sorts),
> but we can't tell what is ultimately causing it. You will need to dig
> into, or disclose, the contents of the procedure.

I have no problem disclosing this code and data to the PG dev team (this is client data though, so please keep it to yourselves). Where can I send you a link to the dump?

Best,

JC
Jean-Christophe Boggio <postgresql@thefreecat.org> writes:
> On 13/10/2023 at 18:48, Jeff Janes wrote:
>> We can see what the problem is (over 137,000 concurrent tuple sorts),
>> but we can't tell what is ultimately causing it. You will need to dig
>> into, or disclose, the contents of the procedure.
> I have no problem disclosing this code and data to the PG dev team (this
> is client data though, so please keep it to yourselves). Where can I
> send you a link to the dump?

I'm interested in taking a look; you can send me the link privately.

regards, tom lane
Jean-Christophe Boggio <postgresql@thefreecat.org> writes:
> I have no problem disclosing this code and data to the PG dev team (this
> is client data though, so please keep it to yourselves). Where can I
> send you a link to the dump?

Thanks for sending the data. I'm not observing any leak on current Postgres, and after checking the commit log I realized that your symptoms look mighty like this previous report:

https://www.postgresql.org/message-id/b2bd02dff61af15e3526293e2771f874cf2a3be7.camel%40cybertec.at

which was fixed here:

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Branch: master [98640f960] 2023-07-02 20:03:30 +0200
Branch: REL_16_STABLE Release: REL_16_0 [9ae7b5d1f] 2023-07-02 20:04:16 +0200
Branch: REL_15_STABLE Release: REL_15_4 [0c5fe4ff6] 2023-07-02 20:04:40 +0200
Branch: REL_14_STABLE Release: REL_14_9 [c1affa38c] 2023-07-02 20:05:14 +0200
Branch: REL_13_STABLE Release: REL_13_12 [3ce761d5c] 2023-07-02 20:05:35 +0200

    Fix memory leak in Incremental Sort rescans

    The Incremental Sort had a couple issues, resulting in leaking memory
    during rescans, possibly triggering OOM. The code had a couple of
    related flaws:

    1. During rescans, the sort states were reset but then also set to NULL
    (despite the comment saying otherwise). ExecIncrementalSort then sees
    NULL and initializes a new sort state, leaking the memory used by the
    old one.

    2. Initializing the sort state also automatically rebuilt the info about
    presorted keys, leaking the already initialized info. presorted_keys was
    also unnecessarily reset to NULL.

    Patch by James Coleman, based on patches by Laurenz Albe and Tom Lane.
    Backpatch to 13, where Incremental Sort was introduced.

    Author: James Coleman, Laurenz Albe, Tom Lane
    Reported-by: Laurenz Albe, Zu-Ming Jiang
    Backpatch-through: 13
    Discussion: https://postgr.es/m/b2bd02dff61af15e3526293e2771f874cf2a3be7.camel%40cybertec.at
    Discussion: https://postgr.es/m/db03c582-086d-e7cd-d4a1-3bc722f81765%40inf.ethz.ch

So I think the answer for you is "update to Postgres 14.9".

regards, tom lane
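For anyone landing on this thread with the same symptom, a quick way to check whether the running server already carries the fix (14.9 or later on the 14 branch) is:

-- Full version string, including the minor release.
SELECT version();

-- Or just the version number.
SHOW server_version;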
Re: One PG process eating more than 40GB of RAM and getting killed by OOM
From
Jean-Christophe Boggio
Date:
Hello Tom,

On 14/10/2023 at 00:39, Tom Lane wrote:
> Thanks for sending the data. I'm not observing any leak on current
> Postgres, and after checking the commit log I realized that your
> symptoms look mighty like this previous report:
> [...]
> So I think the answer for you is "update to Postgres 14.9".

You are right. I thought I always used the latest version, but updates were deactivated... Sorry for wasting your time; after the upgrade, everything is working perfectly.

Have a nice day,

JC