Lots of FSM-related fragility in transaction commit
От | Tom Lane |
---|---|
Тема | Lots of FSM-related fragility in transaction commit |
Дата | |
Msg-id | 14768.1323325217@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
Re: Lots of FSM-related fragility in transaction commit
|
Список | pgsql-hackers |
Joseph Shraibman reported some rather unpleasant behavior that seems to be due to trying to drop a table whose FSM file has got no permissions: http://archives.postgresql.org/pgsql-general/2011-12/msg00246.php One question is how the file got that way, and whether Postgres did anything wrong to cause it to not have permissions. But what I'm on the warpath about now is the abysmally bad error response. What ought to happen if we fail to delete a file during a DROP TABLE is a WARNING, no more. Not a PANIC that recurs during crash recovery. I tried to reproduce this, and found a different but equally bad behavior: regression=# create table foo as select * from tenk1; SELECT 10000 regression=# delete from foo where unique1%10 = 0; DELETE 1000 regression=# vacuum foo; VACUUM regression=# select 'foo'::regclass::oid; oid -------52860 (1 row) [ go and "chmod 000 52860_fsm" ] regression=# drop table foo; WARNING: AbortTransaction while in COMMIT state ERROR: could not open file "base/27666/52860_fsm": Permission denied PANIC: cannot abort transaction 7413, it was already committed ERROR: could not open file "base/27666/52860_fsm": Permission denied PANIC: cannot abort transaction 7413, it was already committed The connection to the server was lost. Attempting reset: Failed. I thought that removing the unreadable file would let me restart, but after "rm 52860_fsm" and trying again to start the server, there's a different problem: LOG: database system was interrupted while in recovery at 2011-12-08 00:56:11 EST HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery. LOG: database system was not properly shut down; automatic recovery in progress LOG: consistent recovery state reached at 0/5D71BA8 LOG: redo starts at 0/5100050 WARNING: page 18 of relation base/27666/52860 is uninitialized CONTEXT: xlog redo visible: rel 1663/27666/52860; blk 18 PANIC: WAL contains references to invalid pages CONTEXT: xlog redo visible: rel 1663/27666/52860; blk 18 LOG: startup process (PID 14507) was terminated by signal 6 LOG: aborting startup due to startup process failure Note that this isn't even the same symptom Shraibman hit, since apparently he was failing on replaying the commit record. How is it that the main table file managed to have uninitialized pages? I suspect that "redo visible" WAL processing is breaking one or more of the fundamental WAL rules, perhaps by not generating a full-page image when it needs to. So this is really a whole lot worse than our behavior was in pre-FSM days, and it needs to get fixed. regards, tom lane
В списке pgsql-hackers по дате отправления: