Discussion: postgres crashes on insert in 40 different threads
Hey folks,
I have a postgres server running on Ubuntu 12 (Intel Xeon, 8 CPUs, 29 GB RAM), with the following settings:
max_connections = 550
shared_buffers = 12GB
temp_buffers = 8MB
max_prepared_transactions = 0
work_mem = 50MB
maintenance_work_mem = 1GB
fsync = on
wal_buffers = 16MB
commit_delay = 50
commit_siblings = 7
checkpoint_segments = 32
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
autovacuum = on
autovacuum_vacuum_threshold = 1800
autovacuum_analyze_threshold = 900
I am doing a lot of writes to the DB from 40 different threads – each thread checks whether a record exists; if not, it inserts the record, and if it exists, it updates it.
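(For illustration, each thread does roughly the equivalent of the following sketch - table and column names here are hypothetical, and the retry loop is the usual pre-9.5 way to make check-then-write safe against the race where two threads both see "not exists" and insert:)

CREATE FUNCTION upsert_job(k bigint, v text) RETURNS void AS
$$
BEGIN
    LOOP
        -- try to update an existing record first
        UPDATE jobs SET payload = v WHERE id = k;
        IF found THEN
            RETURN;
        END IF;
        -- no row yet: try to insert; a concurrent thread may get there first
        BEGIN
            INSERT INTO jobs (id, payload) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- lost the race to another insert; loop back and UPDATE instead
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;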
During this load my disk IO is almost always at 100%, and sometimes it crashes my DB with the following messages:
2013-08-19 03:18:00 UTC LOG: checkpointer process (PID 28354) was terminated by signal 9: Killed
2013-08-19 03:18:00 UTC LOG: terminating any other active server processes
2013-08-19 03:18:00 UTC WARNING: terminating connection because of crash of another server process
2013-08-19 03:18:00 UTC DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-08-19 03:18:00 UTC HINT: In a moment you should be able to reconnect to the database and repeat your command.
2013-08-19 03:18:00 UTC WARNING: terminating connection because of crash of another server process
2013-08-19 03:18:00 UTC DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-08-19 03:18:00 UTC HINT: In a moment you should be able to reconnect to the database and repeat your command.
2013-08-19 03:18:00 UTC WARNING: terminating connection because of crash of another server process
2013-08-19 03:18:00 UTC DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
My DB size is not very big – 169GB.
Does anyone know how I can get rid of these DB crashes?
Thanks,
Dzmitry
On 19/08/2013 10:07, Dzmitry wrote:
> I am doing a lot of writes to the DB from 40 different threads [...]
> During this load my disk IO is almost always at 100%, and sometimes it
> crashes my DB with the following messages:
> 2013-08-19 03:18:00 UTC LOG: checkpointer process (PID 28354) was terminated by signal 9: Killed
> [...]
> Does anyone know how I can get rid of these DB crashes?

The fact that the checkpointer was killed with -9 makes me think the OOM killer decided you were out of memory. Could that be the case?

12GB of shared_buffers on a 29GB box is too high. You should try lowering that value - to 6GB, for instance. On top of that, 550 * 50MB is 27GB of RAM that PostgreSQL could try to address for work_mem alone. I can imagine your system is swapping a lot and exhausting swap before the crash.

Regards,
--
Stéphane Schildknecht
Loxodata - Conseil, expertise et formations
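(To spell out the arithmetic behind that warning - these are worst-case figures taken from the settings above, reached only if every connection used its full allowance at once:

  shared_buffers                        12 GB
  work_mem: 550 connections * 50 MB    ~27 GB
  maintenance_work_mem, per worker       1 GB
  potential demand                     ~40 GB on a 29 GB box

A single query can even use several multiples of work_mem, one per sort or hash step.)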
I don't think that's the case. I am using New Relic to monitor my DB servers (I have one master and 2 slaves - all with the same configuration) - memory never goes above 12.5GB, so I have a good reserve, and I don't see any swapping there :(

Thanks,
Dzmitry

On 8/19/13 11:36 AM, "Stéphane Schildknecht" <stephane.schildknecht@postgresql.fr> wrote:
> The fact that the checkpointer was killed with -9 makes me think the OOM
> killer decided you were out of memory. Could that be the case?
> [...]
> I can imagine your system is swapping a lot and exhausting swap before the crash.
Dzmitry wrote:
> I don't think that's the case. I am using New Relic to monitor my DB
> servers (I have one master and 2 slaves - all with the same configuration) -
> memory never goes above 12.5GB, so I have a good reserve, and I don't
> see any swapping there :(

You can check whether the OOM killer is at fault - which is highly likely - by examining /var/log/messages. The OOM killer uses heuristics, so it occasionally does the wrong thing.

The documentation is helpful:
http://www.postgresql.org/docs/9.2/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT

Yours,
Laurenz Albe
Do you mean the postgres log file (configured in postgresql.conf):

log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_min_messages = warning

Or /var/log/messages? Because I don't have that file :(

Thanks,
Dzmitry

On 8/19/13 12:26 PM, "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
> You can check whether the OOM killer is at fault - which is highly likely -
> by examining /var/log/messages.
> [...]
Hi!

Since Maverick, the Ubuntu developers have disabled logging to /var/log/messages by default. You should check /var/log/syslog instead.

--
Mael

2013/8/19 Dzmitry <dzmitry.nikitsin@gmail.com>:
> Or /var/log/messages? Because I don't have that file :(
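(A quick way to search for the kernel's OOM activity there - the phrases below are the kernel's standard wording, as the log excerpts later in this thread show:)

grep -iE 'oom|killed process' /var/log/syslog*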
Dzmitry wrote:
> Do you mean the postgres log file (configured in postgresql.conf)
> [...]
> Or /var/log/messages? Because I don't have that file :(

I meant the latter. /var/log/messages is just where syslog output is directed on some Linux distributions. I don't know Ubuntu, so sorry if I got it wrong - maybe it is /var/log/syslog there. In case of doubt, check your syslog configuration.

Yours,
Laurenz Albe
Thank you guys! Found it in the logs (db-slave1 is my replica, which uses streaming replication):

Aug 18 15:49:38 db-slave1 kernel: [25094456.525703] postgres invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Aug 18 15:49:38 db-slave1 kernel: [25094456.525708] postgres cpuset=/ mems_allowed=0
Aug 18 15:49:38 db-slave1 kernel: [25094456.525712] Pid: 26418, comm: postgres Not tainted 3.2.0-40-virtual #64-Ubuntu
Aug 18 15:49:48 db-slave1 kernel: [25094456.621937] Out of memory: Kill process 2414 (postgres) score 417 or sacrifice child
Aug 18 15:49:48 db-slave1 kernel: [25094456.621949] Killed process 2414 (postgres) total-vm:13057464kB, anon-rss:28560kB, file-rss:12773376kB

Aug 19 03:18:00 db-slave1 kernel: [25135758.540539] postgres invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Aug 19 03:18:01 db-slave1 kernel: [25135758.540544] postgres cpuset=/ mems_allowed=0
Aug 19 03:18:07 db-slave1 kernel: [25135758.540548] Pid: 23994, comm: postgres Not tainted 3.2.0-40-virtual #64-Ubuntu
Aug 19 03:18:07 db-slave1 kernel: [25135758.626405] Out of memory: Kill process 28354 (postgres) score 348 or sacrifice child
Aug 19 03:18:07 db-slave1 kernel: [25135758.626418] Killed process 28354 (postgres) total-vm:13021248kB, anon-rss:8704kB, file-rss:10686512kB

Aug 19 03:18:07 db-slave1 kernel: [25135763.068736] postgres invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Aug 19 03:18:07 db-slave1 kernel: [25135763.068740] postgres cpuset=/ mems_allowed=0
Aug 19 03:18:07 db-slave1 kernel: [25135763.068743] Pid: 6780, comm: postgres Not tainted 3.2.0-40-virtual #64-Ubuntu
Aug 19 03:18:07 db-slave1 kernel: [25135763.150285] Out of memory: Kill process 20322 (postgres) score 301 or sacrifice child
Aug 19 03:18:07 db-slave1 kernel: [25135763.150297] Killed process 20322 (postgres) total-vm:13058892kB, anon-rss:47172kB, file-rss:9186604kB

So I will do as Stéphane advised - lower shared_buffers to 6GB. Do you know if I need to do anything else, such as changing the shared memory (SHMMAX/SHMALL) kernel parameters? Right now I have:

shmmax - 13223870464
shmall - 4194304

Thanks,
Dzmitry

On 8/19/13 1:05 PM, "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
> I meant the latter. /var/log/messages is just where syslog output is
> directed on some Linux distributions.
> [...]
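(For reference on the units here: shmmax is in bytes, while shmall is counted in pages. With 4 kB pages, shmall = 4194304 permits 16 GB of shared memory in total, and shmmax = 13223870464 bytes is about 12.3 GB - already more than enough for a 6 GB shared_buffers segment, so lowering shared_buffers should need no kernel change.)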
On 19/08/2013 15:06, Dzmitry wrote:
> Thank you guys! Found it in the logs (db-slave1 is my replica, which uses
> streaming replication):
> [...]
> So I will do as Stéphane advised - lower shared_buffers to 6GB. Do you know
> if I need to do anything else, such as changing the shared memory
> (SHMMAX/SHMALL) kernel parameters?

As Laurenz said, you should have a look at the documentation. It explains how you can lower the risk of the OOM killer killing your PostgreSQL processes:

1. You can set vm.overcommit_memory to 2 in sysctl.conf.
2. You can adjust the value of oom_score_adj in the startup script to prevent the OOM killer from killing the postmaster.
3. You can lower shared_buffers, work_mem and max_connections.

Regards,
--
Stéphane Schildknecht
Loxodata - Conseil, expertise et formations
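(For illustration, points 1 and 2 might look like this - the overcommit_ratio value is a common starting point rather than something from the thread, and -1000 exempts a process entirely from the OOM killer on kernels 2.6.36 and later:)

# /etc/sysctl.conf - apply with: sysctl -p
vm.overcommit_memory = 2
vm.overcommit_ratio = 80

# in the PostgreSQL start script, just before launching the postmaster:
echo -1000 > /proc/self/oom_score_adj

Note that child backends inherit the postmaster's score unless PostgreSQL was built to reset it, so this makes the whole server unkillable, not just the postmaster.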
On Aug 19, 2013, at 7:23 AM, Stéphane Schildknecht <stephane.schildknecht@postgresql.fr> wrote:
> As Laurenz said, you should have a look at the documentation. It explains
> how you can lower the risk of the OOM killer killing your PostgreSQL processes:
> 1. You can set vm.overcommit_memory to 2 in sysctl.conf.
> 2. You can adjust the value of oom_score_adj in the startup script to
> prevent the OOM killer from killing the postmaster.
> 3. You can lower shared_buffers, work_mem and max_connections.

4. Use pgbouncer, and radically lower the number of pg connections used.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
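(A minimal pgbouncer setup along those lines might look like the following - every value is illustrative and the database name is hypothetical. Clients connect to port 6432, and pgbouncer funnels up to 500 client connections into a pool of 20 server connections per database/user pair:)

; pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
max_client_conn = 500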
On 19.08.2013 10:07, Dzmitry wrote:
> I am doing a lot of writes to the DB from 40 different threads – each
> thread checks whether a record exists; if not, it inserts the record,
> and if it exists, it updates it.
> During this load my disk IO is almost always at 100%, and sometimes it
> crashes my DB with the following messages:

Do you really need so many threads, with all the chances for race conditions that brings?

Besides the RAM issue already mentioned, your box might have too little IO capacity for this kind of transaction load. What are the load average and, in particular, the iowait figures saying?

Cheers,
Frank
I am already using pgpool. Is it bad to have 550 connections - is that too many?

Thank you guys for all your help.

Thanks,
Dzmitry

On 8/19/13 4:43 PM, "Scott Ribe" <scott_ribe@elevated-dev.com> wrote:
> 4. Use pgbouncer, and radically lower the number of pg connections used.
Yes - I need that many threads: every night I have to load jobs from XML into the DB, and I need to do it as fast as I can. Currently it takes 4h to load all of them (around 1,000,000 jobs). CPU iowait is about 25%. Do you know how I can check other parameters related to IO wait?

Thanks,
Dzmitry

On 8/19/13 5:59 PM, "Frank Lanitz" <frank@frank.uvena.de> wrote:
> Do you really need so many threads, with all the chances for race
> conditions that brings?
> Besides the RAM issue already mentioned, your box might have too little IO
> capacity for this kind of transaction load. What are the load average and,
> in particular, the iowait figures saying?
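(The usual tools for that are iostat and vmstat - on Ubuntu, iostat comes with the sysstat package:)

iostat -x 5   # extended per-device stats every 5 seconds; watch %util and await
vmstat 5      # the "wa" column is CPU time spent waiting on I/O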
On Aug 19, 2013, at 9:07 AM, Dzmitry <dzmitry.nikitsin@gmail.com> wrote:
> Yes - I need that many threads...

This is not at all clear from your answer. At most, 1 thread per logical core can execute at any one time. And while many can be waiting on IO instead of executing, there is still a rather low limit to how many threads it takes to saturate IO capacity. If you actually tried to *run* 550 processes at the same time, you'd just be wasting time on contention.

So, back to the question I just asked a minute ago in the prior email: do you have any idea how many pg connections are actually being used?

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
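(One way to answer that on the server itself - this assumes PostgreSQL 9.2 or later, where pg_stat_activity has a state column; on older versions look at current_query instead:)

SELECT count(*) FROM pg_stat_activity;                    -- total backends
SELECT state, count(*) FROM pg_stat_activity GROUP BY 1;  -- active vs. idle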
Sorry for the confusion.

I have 2 machines; each machine runs 1 process with 20 threads - 40 threads in total. Every thread connects to postgres and inserts data. I am using technology that keeps connections open: each thread opens 1 connection when it starts and closes it only when it is killed. But the threads perform background jobs - they run continuously - so the connections stay open permanently.

Thanks,
Dzmitry

On 8/19/13 6:17 PM, "Scott Ribe" <scott_ribe@elevated-dev.com> wrote:
> So, back to the question I just asked a minute ago in the prior email: do
> you have any idea how many pg connections are actually being used?
On Aug 19, 2013, at 9:21 AM, Dzmitry <dzmitry.nikitsin@gmail.com> wrote: > I have 2 machines, each machine run 1 process, with 20 threads in each - > total 40 threads. Every thread connected to postgres and insert data. I am > using technology, that keep connection opens. It means every thread open 1 > connection when it starts, and will closed it - only when killed. But > threads performing background jobs - it means they running always, so I > keep connection always open. So, 40 threads, not 440? Was 440 connections a typo, or are there 400 connections you aren't telling us about? -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Aug 19, 2013, at 9:22 AM, Dzmitry <dzmitry.nikitsin@gmail.com> wrote: > I am using pgpool to balance load to slave servers. So, to be clear, you're not using pgbouncer after all? -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
No, I am not using pgbouncer; I am using pgpool.

In total I have 440 connections to postgres (I have a rails application running on several servers - each application opens 60 connections to the DB and keeps them forever, until it is killed - and I also have some machines doing background processing that hold connections too).

The part that does a lot of writes (updating jobs from an XML feed every night) has 40 threads and keeps 40 connections.

Thanks,
Dzmitry

On 8/19/13 6:26 PM, "Scott Ribe" <scott_ribe@elevated-dev.com> wrote:
> So, to be clear, you're not using pgbouncer after all?
On Aug 19, 2013, at 9:55 AM, Dzmitry <dzmitry.nikitsin@gmail.com> wrote:
> No, I am not using pgbouncer; I am using pgpool.
> In total I have 440 connections to postgres [...]
> The part that does a lot of writes (updating jobs from an XML feed every
> night) has 40 threads and keeps 40 connections.

That's extreme, and probably counter-productive.

How many cores do you have on those rails servers? Probably not 64, right? Not 32? 16? 12? 8, even? Assuming <64, what advantage do you expect from 60 connections? The same comment applies to the 40 connections doing the update jobs - more connections than cores is unlikely to be helping anything, and more connections than 2x cores is almost guaranteed to be worse than fewer.

Postgres connections are of the heavy-weight variety: process per connection, not thread per connection, not thread-per-core event-driven. In particular, I'd worry about work_mem in your configuration. You've either got to set it really low and live with queries going to disk too quickly for sorts and so on, or have it a decent size and run the risk that too many queries at once will trigger OOM.

Given your configuration, I wouldn't even start with pgbouncer for connection pooling. I'd first just slash the number of connections everywhere by 1/2, or even to 1/4, and see what effect that had. Then, as a second step, I'd look at where connection pooling might be used effectively.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
OK, thank you - that's a good point. I need to review and fix what I have.

Thanks,
Dzmitry

On 8/24/13 6:14 PM, "Scott Ribe" <scott_ribe@elevated-dev.com> wrote:
> That's extreme, and probably counter-productive.
> [...]
> Given your configuration, I wouldn't even start with pgbouncer for
> connection pooling. I'd first just slash the number of connections
> everywhere by 1/2, or even to 1/4, and see what effect that had.
On Aug 25, 2013, at 2:04 AM, Dzmitry <dzmitry.nikitsin@gmail.com> wrote:
> OK, thank you - that's a good point. I need to review and fix what I have.

Yes, as suggested I would try just reducing first.

Where connection pooling could help is if your load across all those rails apps is not spread out evenly, but varies over time such that there is *no* single small number of workers that is right for all conditions. Because the same comments about no benefit from tons of threads apply to the pg server as well: 110 connections simultaneously doing work is not likely to be better than some smaller number...

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice