Discussion: Diagnosing Postgres Segfault
Early this morning one of our databases suffered a segfault. The only note on this in /var/log/messages was:
Nov 29 03:34:07 servername kernel: postmaster[18265]: segfault at 0000000017b82000 rip 0000003c45e7c266 rsp 00007fff2ca1b7d8 error 4
(replaced actual server name with servername)
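As a reading aid for the kernel line above, here is a minimal Python sketch that pulls the fields out of that message and decodes the low three bits of the x86 page-fault error code. The function name `decode_segfault` and the regular expression are illustrative assumptions that only target the older "rip/rsp" message layout quoted here; they are not part of any Postgres or kernel tooling.

```python
import re

# Matches the older x86-64 kernel message layout quoted above:
# "prog[pid]: segfault at ADDR rip ADDR rsp ADDR error N" (values in hex).
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-fA-F]+) "
    r"rip (?P<rip>[0-9a-fA-F]+) rsp (?P<rsp>[0-9a-fA-F]+) "
    r"error (?P<err>[0-9a-fA-F]+)"
)


def decode_segfault(line):
    """Return a dict describing the fault, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    return {
        "program": m.group("prog"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "instruction_pointer": int(m.group("rip"), 16),
        # Bit 0: 0 = page not present, 1 = protection violation.
        "cause": "protection violation" if err & 1 else "page not present",
        # Bit 1: 0 = read access, 1 = write access.
        "access": "write" if err & 2 else "read",
        # Bit 2: 0 = kernel mode, 1 = user mode.
        "mode": "user" if err & 4 else "kernel",
    }


if __name__ == "__main__":
    sample = (
        "Nov 29 03:34:07 servername kernel: postmaster[18265]: segfault at "
        "0000000017b82000 rip 0000003c45e7c266 rsp 00007fff2ca1b7d8 error 4"
    )
    print(decode_segfault(sample))
```

Read this way, "error 4" describes a user-mode read of an address that was not mapped. That narrows down the kind of fault, but not which query triggered it.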
The postgres log has a bit more information:
2010-11-29 03:34:07 PST LOG: server process (PID 18265) was terminated by signal 11
2010-11-29 03:34:07 PST LOG: terminating any other active server processes
2010-11-29 03:34:07 PST 10.2.0.18 WARNING: terminating connection because of crash of another server process
2010-11-29 03:34:07 PST 10.2.0.18 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2010-11-29 03:34:07 PST 10.2.0.18 HINT: In a moment you should be able to reconnect to the database and repeat your command.
2010-11-29 03:34:07 PST 10.2.0.18 WARNING: terminating connection because of crash of another server process
2010-11-29 03:34:07 PST 10.2.0.18 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2010-11-29 03:34:07 PST 10.2.0.18 HINT: In a moment you should be able to reconnect to the database and repeat your command.
2010-11-29 03:34:07 PST 10.2.0.18 WARNING: terminating connection because of crash of another server process
2010-11-29 03:34:07 PST 10.2.0.18 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2010-11-29 03:34:07 PST 10.2.0.18 HINT: In a moment you should be able to reconnect to the database and repeat your command.
2010-11-29 03:34:07 PST 10.2.0.18 FATAL: the database system is in recovery mode
2010-11-29 03:34:07 PST LOG: all server processes terminated; reinitializing
2010-11-29 03:34:10 PST 10.2.0.17 FATAL: the database system is starting up
2010-11-29 03:34:10 PST 10.2.0.17 FATAL: the database system is starting up
2010-11-29 03:34:10 PST 127.0.0.1 FATAL: the database system is starting up
2010-11-29 03:34:10 PST 10.2.0.17 FATAL: the database system is starting up
2010-11-29 03:34:10 PST 10.2.0.18 FATAL: the database system is starting up
2010-11-29 03:34:10 PST 10.2.0.18 FATAL: the database system is starting up
2010-11-29 03:34:10 PST LOG: database system was interrupted at 2010-11-29 03:30:26 PST
2010-11-29 03:34:10 PST LOG: checkpoint record is at 41F/723A1648
2010-11-29 03:34:10 PST LOG: redo record is at 41F/723A1648; undo record is at 0/0; shutdown FALSE
2010-11-29 03:34:10 PST LOG: next transaction ID: 0/1144655884; next OID: 630827
2010-11-29 03:34:10 PST LOG: next MultiXactId: 1; next MultiXactOffset: 0
2010-11-29 03:34:10 PST LOG: database system was not properly shut down; automatic recovery in progress
2010-11-29 03:34:10 PST LOG: record with zero length at 41F/723A1698
2010-11-29 03:34:10 PST LOG: redo is not required
2010-11-29 03:34:10 PST 10.2.0.17 FATAL: the database system is starting up
2010-11-29 03:34:10 PST LOG: database system is ready
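The log excerpt above can be summarized mechanically. The following is a rough sketch, assuming the log line layout shown here (a timestamp at the start of each line); `summarize_crash` is a hypothetical helper name. It extracts the crashed backend's PID and signal, which can be matched against the PID in the kernel message, and measures how long it took until the server reported ready again.

```python
import re
from datetime import datetime

# Assumes each log line starts with "YYYY-MM-DD HH:MM:SS", as in the excerpt.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
CRASH_RE = re.compile(
    r"server process \(PID (\d+)\) was terminated by signal (\d+)"
)


def summarize_crash(log_lines):
    """Return PID, signal, crash time and seconds until 'ready', or None."""
    crash = None
    ready_at = None
    for line in log_lines:
        ts_match = TS_RE.match(line)
        if ts_match is None:
            continue  # skip lines without a leading timestamp
        ts = datetime.strptime(ts_match.group(1), "%Y-%m-%d %H:%M:%S")
        m = CRASH_RE.search(line)
        if m and crash is None:
            crash = {
                "time": ts,
                "pid": int(m.group(1)),
                "signal": int(m.group(2)),
            }
        elif crash is not None and ready_at is None:
            if "database system is ready" in line:
                ready_at = ts
    if crash is None:
        return None
    crash["downtime_seconds"] = (
        (ready_at - crash["time"]).total_seconds() if ready_at else None
    )
    return crash


if __name__ == "__main__":
    sample = [
        "2010-11-29 03:34:07 PST LOG: server process (PID 18265) was "
        "terminated by signal 11",
        "2010-11-29 03:34:10 PST LOG: database system is ready",
    ]
    print(summarize_crash(sample))
```

This only confirms which backend died and when. Nothing in the excerpt itself records the statement that backend was running.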
I'm trying to figure out what might have caused this, partly because I want to replicate the segfault to see if it is the cause of another system randomly freezing on us. Is there any way to find out exactly which query was running at the time of the crash and what it was doing? Could the xlog files help me in any way?
Michael Holt <michael@aers.ca> writes:
> I'm trying to figure out what might have caused this, partly because I want
> to replicate the segfault to see if it is the cause of another system
> randomly freezing on us. Is there any way to figure out exactly what query
> being run at the time of the crash was doing?

Is there a core dump file? If so, a stack trace from that might be informative.

regards, tom lane
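Whether a core dump file can exist at all depends on the core file size limit the server process runs under. As a hedged sketch, assuming a Linux kernel that exposes `/proc/<pid>/limits` (older kernels may not have it), the Python below checks the soft "Max core file size" limit. The function names `core_dumps_enabled` and `postmaster_core_limit` are hypothetical, and the PID would be that of the running postmaster.

```python
def core_dumps_enabled(limits_text):
    """Return True if the soft 'Max core file size' limit is non-zero.

    Returns None when the limit line is not present in the text.
    """
    label = "Max core file size"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Columns after the label: soft limit, hard limit, units.
            soft = line[len(label):].split()[0]
            return soft != "0"
    return None


def postmaster_core_limit(pid):
    """Read /proc/<pid>/limits for a running process (Linux only)."""
    with open("/proc/%d/limits" % pid) as f:
        return core_dumps_enabled(f.read())


if __name__ == "__main__":
    sample = (
        "Limit                     Soft Limit           Hard Limit           Units\n"
        "Max core file size        0                    unlimited            bytes\n"
    )
    print(core_dumps_enabled(sample))
```

If the soft limit is 0, no core file is written when a backend crashes. Raising it (for example with `ulimit -c unlimited` in the shell that starts the server) has to happen before the next crash for a stack trace to be available.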
Doesn't look like it at the moment. I'll see if we can get that set up and go from there.

Thanks.

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, November 29, 2010 4:10 PM
To: Michael Holt
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Diagnosing Postgres Segfault

Michael Holt <michael@aers.ca> writes:
> I'm trying to figure out what might have caused this, partly because I want
> to replicate the segfault to see if it is the cause of another system
> randomly freezing on us. Is there any way to figure out exactly what query
> being run at the time of the crash was doing?

Is there a core dump file? If so, a stack trace from that might be informative.

regards, tom lane