Discussion: slave restarts with kill -9 coming from somewhere, or nowhere
Hello,
I'm running the latest postgres version (9.2.3), and today for the first time I encountered this:
12774 2013-04-02 18:13:10 CEST LOG: server process (PID 28463) was terminated by signal 9: Killed
12774 2013-04-02 18:13:10 CEST DETAIL: Failed process was running: BEGIN;declare "SQL_CUR0xff25e80" cursor for select distinct .... as "Reservation_date___time" , "C_4F_TRANSACTION"."FTRA_PRICE_VAL
12774 2013-04-02 18:13:10 CEST LOG: terminating any other active server processes
12774 2013-04-02 18:13:12 CEST LOG: all server processes terminated; reinitializing
29113 2013-04-02 18:13:15 CEST LOG: database system was interrupted while in recovery at log time 2013-04-02 18:02:21 CEST
29113 2013-04-02 18:13:15 CEST HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
29113 2013-04-02 18:13:15 CEST LOG: entering standby mode
29113 2013-04-02 18:13:15 CEST LOG: redo starts at 6B0/DD0928A0
29113 2013-04-02 18:13:22 CEST LOG: consistent recovery state reached at 6B0/DE3831E8
12774 2013-04-02 18:13:22 CEST LOG: database system is ready to accept read only connections
29113 2013-04-02 18:13:22 CEST LOG: invalid record length at 6B0/DE3859B8
29117 2013-04-02 18:13:22 CEST LOG: streaming replication successfully connected to primary
As far as I know it happened twice today. I have no idea where these kills are coming from. I only know these are not nice :)
Does anyone have an idea what happened exactly?
wkr,
Bert
--
Bert Desmet
0477/305361
Bert <biertie@gmail.com> writes:
> I'm running the latest postgres version (9.2.3), and today for the first
> time I encountered this:
> 12774 2013-04-02 18:13:10 CEST LOG: server process (PID 28463) was
> terminated by signal 9: Killed

AFAIK there are only two possible sources of signal 9: a manual kill,
or the Linux kernel's OOM killer. If it's the latter there should be
a concurrent entry in the kernel logfiles about this. If you find one,
suggest reading up on how to disable OOM kills, or at least reconfigure
your system to make them less probable.

regards, tom lane
Hi Tom,
thanks for the tip! it was indeed the oom killer.
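For the record, this is roughly what I grepped for to confirm it (the log file locations are a guess, they differ per distro):

    # look for OOM killer activity in the kernel ring buffer
    dmesg | grep -i -E 'oom-killer|out of memory|killed process'
    # or in the syslog files, depending on the distro
    grep -i 'killed process' /var/log/messages /var/log/syslog 2>/dev/null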
Is it wise to disable the oom killer? Or will the server really go down without postgres doing something about it?
Currently I already lowered the shared_memory value a bit.
Bert
On Tue, Apr 2, 2013 at 8:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> AFAIK there are only two possible sources of signal 9: a manual kill,
> or the Linux kernel's OOM killer. If it's the latter there should be
> a concurrent entry in the kernel logfiles about this. If you find one,
> suggest reading up on how to disable OOM kills, or at least reconfigure
> your system to make them less probable.
>
> regards, tom lane
--
Bert Desmet
0477/305361
Hi all,
I have set vm.overcommit_memory to 1.
It's a pretty much dedicated machine anyway, except for some postgres maintenance scripts I run in python / bash from the server.
We'll see what it gives.
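For reference, I applied it more or less like this (needs root; the sysctl file location is a guess and can differ per distro):

    # apply immediately
    sysctl -w vm.overcommit_memory=1
    # persist across reboots
    echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf
    # note: the PostgreSQL docs also discuss vm.overcommit_memory=2 (strict accounting)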
cheers,
Bert
On Wed, Apr 3, 2013 at 8:45 AM, Bert <biertie@gmail.com> wrote:
> Hi Tom,
> thanks for the tip! it was indeed the oom killer.
> Is it wise to disable the oom killer? Or will the server really go down
> without postgres doing something about it?
> Currently I already lowered the shared_memory value a bit.
> Bert
--
Bert Desmet
0477/305361
hi,
this is strange: one connection almost killed the server, so it was not a combination of a lot of connections. I saw one connection growing to over 100GB. Then I cancelled the connection before the oom killer became active again.
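For anyone curious, this is roughly how I spotted and cancelled it (the pid is just a placeholder for whatever backend shows up on top, and RSS also counts shared buffers, so it is only a rough indicator):

    # find the postgres backend using the most memory (RSS in kB)
    ps -o pid,rss,cmd -C postgres --sort=-rss | head
    # cancel the running query in that backend (or use pg_terminate_backend to drop the session)
    psql -c "SELECT pg_cancel_backend(<pid>);"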
These are my memory settings:
shared_buffers = 20GB
temp_buffers = 1GB
max_prepared_transactions = 10
work_mem = 4GB
maintenance_work_mem = 1GB
max_stack_depth = 8MB
wal_buffers = 32MB
effective_cache_size = 88GB
The server has 128GB of RAM.
How is it possible that one connection (query) uses all the ram? And how can I avoid it?
cheers,
On Wed, Apr 3, 2013 at 10:10 AM, Bert <biertie@gmail.com> wrote:
> Hi all,
> I have set vm.overcommit_memory to 1.
> It's a pretty much dedicated machine anyway, except for some postgres
> maintenance scripts I run in python / bash from the server.
> We'll see what it gives.
> cheers,
> Bert
--
Bert Desmet
0477/305361
Bert <biertie@gmail.com> writes:
> These are my memory settings:
> work_mem = 4GB

> How is it possible that one connection (query) uses all the ram? And how
> can I avoid it?

Uh ... don't do the above. work_mem is the allowed memory consumption
per query step, ie per hash or sort operation. A complex query can
easily use multiples of work_mem.

regards, tom lane
aha, ok. This was a setting pg_tune suggested. But I can understand how that is a bad idea.
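To spell it out for myself: with work_mem = 4GB, a plan with, say, 4 hash joins and 2 big sorts could legitimately grab around 6 x 4GB = 24GB in a single backend, and a few such queries in parallel (on top of the 20GB of shared_buffers) would easily exhaust the 128GB.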
wkr,
Bert
On Thu, Apr 4, 2013 at 8:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Uh ... don't do the above. work_mem is the allowed memory consumption
> per query step, ie per hash or sort operation. A complex query can
> easily use multiples of work_mem.
>
> regards, tom lane
--
Bert Desmet
0477/305361