Re: 9.3.9 and pg_multixact corruption
From | Christoph Berg |
---|---|
Subject | Re: 9.3.9 and pg_multixact corruption |
Date | |
Msg-id | 20150911122538.GA2672@msg.df7cb.de |
In reply to | 9.3.9 and pg_multixact corruption (Bernd Helmle <bernd@oopsware.de>) |
Responses | Re: 9.3.9 and pg_multixact corruption |
List | pgsql-hackers |
Re: Bernd Helmle 2015-09-10 <7E3C7F8D210AC9A423E96F3A@eje.local>
> 2015-09-08 11:40:59 CEST [27047] DETAIL: Could not seek in file
> "pg_multixact/members/FFFF5FC4" to offset 4294950912: Invalid argument.
> 2015-09-08 11:40:59 CEST [27047] CONTEXT: xlog redo create mxid 1068235595
> offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh)
> 2015-09-08 11:40:59 CEST [27045] LOG: startup process (PID 27047) exited
> with exit code 1
> 2015-09-08 11:40:59 CEST [27045] LOG: aborting startup due to startup
> process failure
>
> Some side notes:
>
> An additional recovery from a base backup and archive recovery yield to the
> same error, as soon as the affected tuple was touched with a DELETE. The
> affected table was fully dumpable via pg_dump, though.

A few more words here: the archive recovery was a PITR to 00:45, so well
before the problem, and the cluster was initially working well, but crashed
shortly after with the same mxid 1068235595 message. The crash was triggered
by a delete on a different table (which was related schema-wise, but iirc
neither of these tables has any FKs).

We then rewound the system to a ZFS snapshot taken when the archive recovery
had finished (db shut down cleanly), and brought it up again, whereupon it
again crashed with mxid 1068235595, this time on a third table.

The original crash and the first post-recovery crash happened a few minutes
after pg_start_backup(), though the next crash was without that.

(While the archive recovery was running, I ran pg_resetxlog on the original
cluster. It was possible to isolate the ctid of an affected tuple, but it
wasn't possible to DELETE it; that yielded an error message similar to the
above, but the database would continue. I then zeroed the bad block using dd
(zero_damaged_pages didn't help), only to find that at least one more tuple
in that table was affected (with a different mxid).)

Christoph
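
A side note on the numbers in the quoted log, offered as a rough sketch and not as a diagnosis: the segment name FFFF5FC4 and the seek offset 4294950912 are what one gets if the member offset 2147483648 is treated as a signed 32-bit integer before the page and segment are computed. The constants below (8192-byte blocks, 1636 members per page, 32 pages per segment) are assumptions based on 9.3 defaults and are not stated in this thread; the helper names are made up for illustration.

```python
# Assumed constants (PostgreSQL 9.3 defaults; not stated in this thread).
BLCKSZ = 8192              # bytes per page
MEMBERS_PER_PAGE = 1636    # multixact members stored per page
PAGES_PER_SEGMENT = 32     # pages per SLRU segment file


def to_int32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def c_div(a, b):
    """Integer division truncating toward zero, as in C."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def c_mod(a, b):
    """Remainder matching C's truncating division."""
    return a - b * c_div(a, b)


def locate_member(offset):
    """Hypothetical helper: map a member offset to (segment name, seek offset),
    assuming the offset is handled as a signed 32-bit int along the way."""
    signed = to_int32(offset)
    pageno = c_div(signed, MEMBERS_PER_PAGE)
    segno = c_div(pageno, PAGES_PER_SEGMENT)
    rpageno = c_mod(pageno, PAGES_PER_SEGMENT)
    seek = rpageno * BLCKSZ
    # Show both results the way an unsigned 32-bit format would print them.
    return "%04X" % (segno & 0xFFFFFFFF), seek & 0xFFFFFFFF


print(locate_member(2147483648))
```

With the offset from the CONTEXT line this reproduces both the file name and the seek offset from the DETAIL line, which at least suggests the offset crossed 2^31 and went negative somewhere.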