Обсуждение: Bug #671: server corrupt
Dmitry Riachtchentsev (diamondrain@mail.ru) reports a bug with a severity of 1 The lower the number the more severe it is. Short Description server corrupt Long Description The follow short sequence of SQL requests leads to server corrupt: 1. Run psql and type: begin; CREATE SEQUENCE B; commit; \q 2. After exiting run psql again and type: begin; SELECT nextval('B'); DROP SEQUENCE B; CREATE SEQUENCE B1; commit; > Here I get: "NOTICE: LockRelease: no such lock" > Then type: begin; CREATE SEQUENCE B2; > Here I get messages: --------------------------------------------------------------- server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. Failed. --------------------------------------------------------------- Then until restarting the server, connection to the server can not be established. The database log is: --------------------------------------------------------------- DEBUG: database system was shut down at 2002-05-21 11:40:09 EDT DEBUG: checkpoint record is at 0/276B98 DEBUG: redo record is at 0/276B98; undo record is at 0/0; shutdown TRUE DEBUG: next transaction id: 1421; next oid: 24991 DEBUG: database system is ready DEBUG: pq_recvbuf: unexpected EOF on client connection NOTICE: LockRelease: no such lock DEBUG: server process (pid 10918) was terminated by signal 10 DEBUG: terminating any other active server processes NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. DEBUG: all server processes terminated; reinitializing shared memory and semaphores DEBUG: database system was interrupted at 2002-05-21 11:40:46 EDT DEBUG: checkpoint record is at 0/276B98 DEBUG: redo record is at 0/276B98; undo record is at 0/0; shutdown TRUE DEBUG: next transaction id: 1421; next oid: 24991 DEBUG: database system was not properly shut down; automatic recovery in progress DEBUG: redo starts at 0/276BD8 DEBUG: ReadRecord: record with zero length at 0/28F640 DEBUG: redo done at 0/28F618 DEBUG: database system is ready --------------------------------------------------------------- PostgreSQL version is 7.2.1 Operation system is SunOS 5.8 Is my sequence of requests legal? Is there a patch that fixs this problem? If there is no patch, what is the root of the problem? Is there a kit of rules to avoid this situation? I am ready to provide you any more required information to reproduce and trace this situation. Please e-mail me. Thanks Dmitry Sample Code No file was uploaded with this report
pgsql-bugs@postgresql.org writes: > The follow short sequence of SQL requests leads to server corrupt: Looks like a bug to me too. I can still replicate the "NOTICE: LockRelease: no such lock" in current sources, but it seems that someone's already fixed the core dump; CVS tip does not crash. Will look into it more... regards, tom lane
pgsql-bugs@postgresql.org writes: > Is there a patch that fixs this problem? > If there is no patch, what is the root of the problem? Is there a kit of rules to avoid this situation? After more detailed investigation, I find both the notice and the subsequent crash have the same cause: after nextval(), sequence.c is holding a reference to an open relation cache entry for the sequence, which it intends to close at end of transaction. When you drop the sequence later in the same transaction, the cache entry goes away, leaving sequence.c with a dangling pointer. Its attempt to close the cache entry at commit not only causes the NOTICE complaint from the lock manager, but also modifies memory that may by now have been assigned to something else. The crash in your example was due to another cache entry having been corrupted in this way. A proper fix will require revising the way sequence.c holds its state; that strikes me as too large a change to risk back-patching into 7.2.* --- especially for a bug that has gone unnoticed for many years. AFAIK sequence.c has always been implemented like this, and thus has always had this problem. I will fix it for 7.3, but in the meantime I suggest not dropping a sequence that you've nextval'd, currval'd, or setval'd earlier in the same transaction. As a stopgap defense I will probably back-patch something to make it error out if you try. regards, tom lane