Thread: backend for database 'A' crashes/is killed -> corrupt index in database 'B'
Please bear with me, as it's taken me a while to be able to reproduce an
issue that is worrisome.
Before I describe the problem, some environment details: I am using
PostgreSQL 8.4.5 on CentOS 5.5 (x86_64).
The short version is that if a PostgreSQL backend is killed (by the Linux
OOM killer, kill -9, etc.) while operations are taking place in a
*different* backend, corruption is introduced in the database served by
that other backend. I won't say it happens 100% of the time, but it
happens every time I test.
The differences from a stock pg config are primarily to do with logging.
The settings (besides logging) that differ from the stock values include:
wal_sync_method -- I've tried leaving it as the default, and I've also
tried datasync -- same results.
wal_buffers = 128MB
effective_cache_size = 4GB
work_mem = 1MB
maintenance_work_mem = 1GB
Here is how I am reproducing the problem:
1. Open a psql connection to database A. It may remain idle.
2. Wait for an automated process to connect to database B and start
operations.
3. kill -9 the backend for the psql connection to database A.
Then I observe the backends all shutting down and postgresql entering
recovery mode, which succeeds.
Subsequent operations on other databases appear fine, but not for
database B: an index on one of the tables in database B is corrupted.
It is always the same index.
2011-03-30 14:51:32 UTC LOG: server process (PID 3871) was terminated by
signal 9: Killed
2011-03-30 14:51:32 UTC LOG: terminating any other active server
processes
2011-03-30 14:51:32 UTC WARNING: terminating connection because of crash
of another server process
2011-03-30 14:51:32 UTC DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2011-03-30 14:51:32 UTC HINT: In a moment you should be able to reconnect
to the database and repeat your command.
2011-03-30 14:51:32 UTC databaseB databaseB WARNING: terminating connection
because of crash of another server process
2011-03-30 14:51:32 UTC databaseB databaseB DETAIL: The postmaster has
commanded this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted
shared memory.
2011-03-30 14:51:32 UTC databaseB databaseB HINT: In a moment you should be
able to reconnect to the database and repeat your command.
2011-03-30 14:51:32 UTC LOG: all server processes terminated;
reinitializing
2011-03-30 14:51:32 UTC LOG: database system was interrupted; last known
up at 2011-03-30 14:46:50 UTC
2011-03-30 14:51:32 UTC databaseB databaseB FATAL: the database system is
in recovery mode
2011-03-30 14:51:32 UTC LOG: database system was not properly shut down;
automatic recovery in progress
2011-03-30 14:51:32 UTC LOG: redo starts at 301/1D328E40
2011-03-30 14:51:33 UTC databaseB databaseB FATAL: the database system is
in recovery mode
2011-03-30 14:51:34 UTC LOG: record with zero length at 301/1EA08608
2011-03-30 14:51:34 UTC LOG: redo done at 301/1EA08558
2011-03-30 14:51:34 UTC LOG: last completed transaction was at log time
2011-03-30 14:51:31.257997+00
2011-03-30 14:51:37 UTC LOG: autovacuum launcher started
2011-03-30 14:51:37 UTC LOG: database system is ready to accept
connections
2011-03-30 14:52:05 UTC databaseB databaseB ERROR: index "<elided>"
contains unexpected zero page at block 0
2011-03-30 14:52:05 UTC databaseB databaseB HINT: Please REINDEX it.
What's more, I can execute a 'DELETE FROM tableB' (where tableB is the
table with the troublesome index) without error, but it is when I try to
*insert* that I hit the problem. The index is a standard btree index.
The DELETE statement has no WHERE clause.
Other information: The following variables remain commented out and thus
retain their defaults (confirmed with 'show all'):
fsync is on. synchronous_commit is on. full_page_writes is on.
zero_damaged_pages is *off*.
The underlying storage is a software raid1.
'test_fsync' from postgresql 9.0.1 reports:
Loops = 10000
Simple write:
8k write 157205.515/second
Compare file sync methods using one write:
(open_datasync unavailable)
open_sync 8k write 1357.928/second
8k write, fdatasync 3872.508/second
8k write, fsync 3496.659/second
Compare file sync methods using two writes:
(open_datasync unavailable)
2 open_sync 8k writes 1928.987/second
8k write, 8k write, fdatasync 2020.884/second
8k write, 8k write, fsync 2411.653/second
Compare open_sync with different sizes:
open_sync 16k write 3298.716/second
2 open_sync 8k writes 1228.599/second
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
8k write, fsync, close 2535.765/second
8k write, close, fsync 2162.060/second
My interpretation of these values is that the drives themselves have
their write caches disabled.
Lastly, the problem has been reproduced on at least three totally
different machines.
--
Jon
Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B'
From:
Heikki Linnakangas
Date:
On 30.03.2011 21:06, Jon Nelson wrote:
> [full bug report quoted; snipped -- see above]

Can you provide a self-contained test script to reproduce this?

Is the corruption always the same, ie. "unexpected zero page at block 0"?

> My interpretation of these values is that the drives themselves have
> their write caches disabled.

Ok. It doesn't look like a hardware issue, as there's no OS crash involved.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Thu, Mar 31, 2011 at 2:58 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 30.03.2011 21:06, Jon Nelson wrote:
>> [original report quoted; snipped -- see above]
>
> Can you provide a self-contained test script to reproduce this?

I will try.

> Is the corruption always the same, ie. "unexpected zero page at block 0"?

As far as I can tell, yes!

--
Jon
On Thu, Mar 31, 2011 at 2:58 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 30.03.2011 21:06, Jon Nelson wrote:
>> [original report quoted; snipped -- see above]
>
> Can you provide a self-contained test script to reproduce this?
>
> Is the corruption always the same, ie. "unexpected zero page at block 0"?
>
>> My interpretation of these values is that the drives themselves have
>> their write caches disabled.
>
> Ok. It doesn't look like a hardware issue, as there's no OS crash involved.

I wanted to write this quick follow-up, so that anyone searching in the
future might have a chance of learning the resolution to this issue.

The short version is this: the database that experienced corrupted
indices was using HASH-based indices. As we all know, as of this writing,
HASH-based indices are _not_ crash-safe. When revised to use btree
indices, the issue does not re-occur.

--
Jon
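[Editorial illustration of the resolution above: an index type that is WAL-logged can be rebuilt during crash recovery by replaying the log, while one that is not (like hash indexes in 8.4) cannot. A toy Python model of that difference -- names are invented, and this is in no way PostgreSQL code:]

```python
class ToyStore:
    """Toy model of a crash: dirty in-memory pages are lost, the
    write-ahead log survives, and recovery replays the log."""

    def __init__(self, wal_logged: bool):
        self.wal_logged = wal_logged
        self.wal = []    # durable write-ahead log (survives a crash)
        self.pages = {}  # in-memory pages (lost on crash)

    def insert(self, key, value):
        if self.wal_logged:
            self.wal.append((key, value))  # log the change first ...
        self.pages[key] = value            # ... then modify the page

    def crash_and_recover(self):
        self.pages = {}                    # dirty pages are lost
        for key, value in self.wal:        # redo from the log
            self.pages[key] = value

logged = ToyStore(wal_logged=True)
unlogged = ToyStore(wal_logged=False)
for store in (logged, unlogged):
    store.insert("a", 1)
    store.crash_and_recover()

assert logged.pages == {"a": 1}  # recovered from the WAL
assert unlogged.pages == {}      # contents gone: hence "Please REINDEX it."
```

This is of course a drastic simplification (real recovery also deals with pages that were partially flushed), but it captures why an unlogged index can come back as zero pages after a crash while WAL-logged structures do not.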