Discussion: Invalid Page Headers
I've got a database that reported this error yesterday morning after an UPDATE statement:

ERROR:  invalid page header in block 34 of relation "pg_toast_167245"

Later in the day, it reported this one, this time after a SELECT:

ERROR:  invalid page header in block 199 of relation "<table1>_pkey"

Before these errors were brought to my attention, I had been looking late that night at a long-running query on an unrelated table: a query that should've finished in seconds was taking hours. I later heard from a developer that there were some deadlock-related issues with the query in question, and this morning the development team wound up issuing

pg_ctl kill QUIT [process_id]

to get rid of the problem query, which was a relatively unimportant SELECT. Shortly thereafter, they began seeing yet another invalid page header:

invalid page header in block 4369 of relation "<table2>"

So there are currently three separate relations exhibiting invalid page errors.

This is a Debian 3.1 box running a custom Linux 2.6.10 #6 SMP kernel. Postgres 8.1.3 was compiled from source. pgpool 3.0.1, also built from source, is used by some parts of the application layer. The system is running on an ext3 filesystem; WAL is on a 4-disk RAID 10 running JFS, and data is on a 12-disk RAID 10 running JFS. I'm not seeing any signs of kernel or hardware errors in the system and kernel logs.

Also see the nearby thread about a troubling error from the night before the first sighting of invalid page headers:

http://archives.postgresql.org/pgsql-general/2006-04/msg00746.php

Is there any way in which that could be related to the invalid page headers?

Based on this thread:

http://archives.postgresql.org/pgsql-general/2005-11/msg01140.php

I'm a little nervous about the prospects for analysis and recovery here. Any thoughts? Is there a risk that if we took postgres offline in this state it would not come back up?

--
Thomas F. O'Connell
Database Architecture and Programming
Sitening, LLC

http://www.sitening.com/
3004 B Poston Avenue
Nashville, TN 37203-1314
615-260-0005 (cell)
615-469-5150 (office)
615-469-5151 (fax)
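As a point of reference, here is a minimal sketch of how a stuck backend can be located and stopped on 8.1, assuming superuser access; the database name and PID below are placeholders. A query cancel (SIGINT) or session termination (SIGTERM) is gentler than QUIT, since SIGQUIT makes the backend exit without cleanup and the postmaster then treats it as a crash and reinitializes all sessions.

    # find the PID of the stuck backend (8.1 columns: procpid, current_query)
    psql -d mydb -c "SELECT procpid, query_start, current_query
                     FROM pg_stat_activity ORDER BY query_start;"

    # ask that backend to cancel its current query (equivalent to SIGINT)
    psql -d mydb -c "SELECT pg_cancel_backend(12345);"

    # if cancellation doesn't take, TERM asks just that session to exit;
    # QUIT forces the postmaster into crash recovery for every session
    pg_ctl kill TERM 12345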
On Apr 18, 2006, at 12:30 PM, Thomas F. O'Connell wrote:

> So there are currently three separate relations exhibiting invalid page errors.
>
> This box is a Debian 3.1 box running a custom Linux 2.6.10 #6 SMP kernel. Postgres 8.1.3 was compiled from source. pgpool 3.0.1, also built from source, is used by some parts of the application layer. The system is running on an ext3 filesystem, WAL is on a 4-disk RAID 10 running JFS, and data is on a 12-disk RAID 10 running JFS. I'm not seeing any signs of apparent kernel or hardware errors in the system and kernel logs.

I take back the lack of errors: megamgr is now reporting 5 (!) failed drives on a single channel in the RAID 10 for data. The RAID card is a MegaRAID SCSI 320-2X. I would've expected the RAID to protect postgres from the possibility of data corruption, but apparently not. In any event, we're working on replacing the failed drives.

After the RAID is rebuilt, though, the focus will be on the data. Is my best bet to restore the corrupted relations, or can I repair them somehow?

I'm also still concerned about whether postgres will recover if I stop it at this point, so I'm working on contingency plans for leaving postgres online, turning off the application, restoring the tables while nothing is accessing postgres, and then restarting the application. Is there a safer/better course of action available?

--
Thomas F. O'Connell
Sitening, LLC
http://www.sitening.com/
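To make that contingency concrete, a minimal sketch of the restore step, assuming the application is stopped and a recent custom-format pg_dump archive exists; the archive, database, and table names are placeholders. For the corrupted index (<table1>_pkey), a rebuild from the presumably intact heap may be an alternative to a restore.

    # replace a known-corrupt table from the last good custom-format dump
    pg_restore --clean --table=table2 -d mydb last_good_backup.dump

    # a corrupt index can be rebuilt from its table instead of restored
    # (index name is a placeholder for the real primary key name)
    psql -d mydb -c "REINDEX INDEX table1_pkey;"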
"Thomas F. O'Connell" <tfo@sitening.com> writes: > I would've expected the RAID to protect postgres from the possibility > of data corruption, but I guess not. Ooops :-(. It might be interesting to get pg_filedump dumps of the corrupted pages, just to see what the failure pattern looks like. I doubt there's much we can do about it, but you don't know till you look. ("If we knew what it was we were doing, it wouldn't be research.") regards, tom lane
On Apr 18, 2006, at 2:15 PM, Tom Lane wrote:

> "Thomas F. O'Connell" <tfo@sitening.com> writes:
>> I would've expected the RAID to protect postgres from the possibility of data corruption, but I guess not.
>
> Ooops :-(. It might be interesting to get pg_filedump dumps of the corrupted pages, just to see what the failure pattern looks like. I doubt there's much we can do about it, but you don't know till you look. ("If we knew what it was we were doing, it wouldn't be research.")

I'm still looking into what actually caused the hardware failures and how they could have reached a level where they affected postgres. I'll grab pg_filedump and report back.

In the meantime, should I leave the database online while I attempt to recover? It's still unclear to me whether postgres will restart with invalid page headers. I certainly can't run pg_dumpall, for instance.

I'm glad that there have so far been only 3 relations affected... :(

--
Thomas F. O'Connell
Sitening, LLC
http://www.sitening.com/
"Thomas F. O'Connell" <tfo@sitening.com> writes: > In the meantime, should I leave the database online while I attempt > to recover? It's still unclear to me whether postgres will restart > with invalid page headers. I certainly can't run pg_dumpall, for > instance. Has the database successfully checkpointed since the last error occurrence? If so, I wouldn't foresee any problem with a shutdown. If checkpoints are failing then you probably won't be able to shut down without getting the same error. (Use pg_controldata if you're not sure about when the last checkpoint happened.) > I'm glad that there have so far been only 3 relations affected... :( You mean, three that you know about :-( regards, tom lane
On Apr 18, 2006, at 2:57 PM, Tom Lane wrote:

> "Thomas F. O'Connell" <tfo@sitening.com> writes:
>> In the meantime, should I leave the database online while I attempt to recover? It's still unclear to me whether postgres will restart with invalid page headers. I certainly can't run pg_dumpall, for instance.
>
> Has the database successfully checkpointed since the last error occurrence? If so, I wouldn't foresee any problem with a shutdown. If checkpoints are failing, then you probably won't be able to shut down without getting the same error. (Use pg_controldata if you're not sure when the last checkpoint happened.)

The last checkpoint was at Tue Apr 18 15:30:24 2006, so it looks like we're fine on that front.

>> I'm glad that there have so far been only 3 relations affected... :(
>
> You mean, three that you know about :-(

True enough.

The RAID is rebuilding with new drives right now. Then we'll recover the relations known to be corrupt. (If a pg_toast relation had an invalid page header, would restoring the table to which it's connected solve the problem?) Then I'll do a pg_dumpall. If that works, that should be a pretty good indicator that the database is in a stable state, right?

--
Thomas F. O'Connell
Sitening, LLC
http://www.sitening.com/
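On the pg_toast question, a hedged sketch of confirming which table owns a toast relation; the database name is a placeholder. The numeric suffix of a toast table's name is the OID of its owning table, and dropping and re-creating the owning table gives it a fresh toast relation, so restoring that table also replaces the damaged toast storage.

    # map pg_toast_167245 back to its owning table by OID
    psql -d mydb -c "SELECT relname FROM pg_class WHERE oid = 167245;"

    # or resolve it the other way, through reltoastrelid
    psql -d mydb -c "SELECT c.relname AS main_table
                     FROM pg_class c JOIN pg_class t ON c.reltoastrelid = t.oid
                     WHERE t.relname = 'pg_toast_167245';"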
On Apr 18, 2006, at 2:15 PM, Tom Lane wrote:

> "Thomas F. O'Connell" <tfo@sitening.com> writes:
>> I would've expected the RAID to protect postgres from the possibility of data corruption, but I guess not.
>
> Ooops :-(. It might be interesting to get pg_filedump dumps of the corrupted pages, just to see what the failure pattern looks like. I doubt there's much we can do about it, but you don't know till you look. ("If we knew what it was we were doing, it wouldn't be research.")

Any tips on turning "ERROR: invalid page header in block 34 of relation" into a pg_filedump command that would yield something useful or interesting? If so, I'll post the results for all three relations known to be corrupt so far.

Also, are there any questions I could be asking vendors (Dell, LSI) that would help sort out how the RAID contributed to corruption on disk?

--
Thomas F. O'Connell
Sitening, LLC
http://www.sitening.com/
"Thomas F. O'Connell" <tfo@sitening.com> writes: > Any tips on turning "ERROR: invalid page header in block 34 of > relation" into a pg_filedump command that would yield something useful > or interesting? If so, I'll post the results of all three relations > known to be corrupt so far. For me, the nicest pg_filedump output format is "-i -f", but if the page is sufficiently hosed that pg_filedump can't make any sense of it, you may have to fall back to a raw dump, "-d". In any case, if you don't want to dump the entire file, use "-R blkno" or "-R startblk endblk" to constrain the dump. BTW, last I heard the posting size limit on these lists was only about 20K. If you don't want your results to languish awhile in the moderator's approval queue, you might want to post a link to the dump instead of the whole thing. regards, tom lane
On Apr 19, 2006, at 6:42 PM, Tom Lane wrote:

> For me, the nicest pg_filedump output format is "-i -f", but if the page is sufficiently hosed that pg_filedump can't make any sense of it, you may have to fall back to a raw dump, "-d". In any case, if you don't want to dump the entire file, use "-R blkno" or "-R startblk endblk" to constrain the dump.
>
> BTW, last I heard the posting size limit on these lists was only about 20K. If you don't want your results to languish awhile in the moderator's approval queue, you might want to post a link to the dump instead of the whole thing.

Unfortunately, I couldn't get the pg_filedump request through to the list until yesterday evening (an inexplicable mail delivery issue), by which point the known points of corruption had already been fixed.

The upside is that the corruption seems to have been restricted to three relations, all of which have been fixed, and I was able to perform a successful pg_dump. But I'm still disappointed in the hardware bleed-through affecting the filesystem. :(

--
Thomas F. O'Connell
Sitening, LLC
http://www.sitening.com/
On Thu, Apr 20, 2006 at 12:07:13PM -0500, Thomas F. O'Connell wrote:

> But I'm still disappointed in the hardware bleed-through affecting the filesystem. :(

I (well, http://stats.distributed.net/) have been bitten by this before. One issue with RAID is that it does you no good if you don't get notification that a drive has failed. Even after that, there's still the risk of another failure before a successful rebuild, and of course something can always go wrong during recovery. In a nutshell, RAID is still no replacement for backups.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com
work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf
cell: 512-569-9461