Thread: Possibly corrupted pg_control file after machine crash

Possibly corrupted pg_control file after machine crash

From
Michael Wood
Date:
Hi

A VM of mine running PostgreSQL 8.3 (from Ubuntu 8.10) died a couple
of nights ago.  It was not doing anything at the time, and the
database was on an external (NAS) volume using ext3.  When I set up a
new VM with Ubuntu 8.10 and attached the external volume to it, fsck
replayed the journal but was otherwise happy.  I ran fsck with the -f
option after that just to make sure and it reported no problems.

Now if I run pg_controldata it complains about the CRC not being
correct on pg_control.

The database had not been modified in quite a while.  There is only
one file in pg_clog with a timestamp from the middle of last month.
The newest file in pg_xlog has the same timestamp (actually 6 seconds
newer).  The timestamp on pg_control is the same as the latest pg_xlog
file.  The only file that's newer is the pgstat.stat file, which is
timestamped about the time of the crash (I got an alert at 01:24:39.)

-rw------- 1 postgres postgres 16384 Jun 17 12:59 pg_clog/0000
-rw------- 1 postgres postgres 16777216 Jun 17 12:59 pg_xlog/00000001000000000000001B
-rw------- 1 postgres postgres 8192 Jun 17 12:59 global/pg_control
-rw------- 1 postgres postgres 42664 Jul  2 01:19 global/pgstat.stat

I also have a snapshot of the external volume from the 25th of June,
but the pg_control file is identical to the one on the volume.

I don't understand why the pg_control file might be corrupted (or what
else could be the problem).  Is there some way I can recover from
this?

Here's the output from pg_controldata:
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

pg_control version number:            833
Catalog version number:               200711281
Database system identifier:           5325665671964054206
Database cluster state:               in production
pg_control last modified:             Thu 01 Jan 1970 02:00:00 AM SAST
Latest checkpoint location:           4A38CC9A/0
Prior checkpoint location:            0/1B0FC7E8
Latest checkpoint's REDO location:    0/1B0EE650
Latest checkpoint's TimeLineID:       0
Latest checkpoint's NextXID:          454019048/1
Latest checkpoint's NextOID:          0
Latest checkpoint's NextMultiXactId:  50322
Latest checkpoint's NextMultiOffset:  24576
Time of latest checkpoint:            Thu 01 Jan 1970 02:00:01 AM SAST
Minimum recovery ending location:     0/4A38CC94
Maximum data alignment:               0
Database block size:                  8
Blocks per segment of large relation: 0
WAL block size:                       0
Bytes per WAL segment:                1093850759
Maximum length of identifiers:        8192
Maximum columns in an index:          131072
Maximum size of a TOAST chunk:        8192
Date/time type storage:               64-bit integers
Maximum length of locale name:        64
LC_COLLATE:
LC_CTYPE:

Note that the LC_COLLATE and LC_CTYPE fields appear to be blank.
Also, the timestamps and some other fields seem to be zeroed.  (My
time zone is UTC+2).

If I look at the file, though, I see the locale information is in there:
# strings /var/lib/postgresql/8.3/main/global/pg_control
en_US.UTF-8
en_US.UTF-8

Any hints? :)

Thanks.

--
Michael <esiotrot@gmail.com>

Re: Possibly corrupted pg_control file after machine crash

From
Greg Stark
Date:
At least the following fields seem to be garbage:

On Fri, Jul 3, 2009 at 12:32 PM, Michael Wood<esiotrot@gmail.com> wrote:
> Maximum data alignment:               0
> Database block size:                  8
> Blocks per segment of large relation: 0
> WAL block size:                       0
> Bytes per WAL segment:                1093850759
> Maximum length of identifiers:        8192
> Maximum columns in an index:          131072
> Maximum size of a TOAST chunk:        8192
> Date/time type storage:               64-bit integers
> Maximum length of locale name:        64
> LC_COLLATE:
> LC_CTYPE:


One option might be to initdb a new database, grab the control file
from there, use pg_resetxlog to restore the important values from your
old control file and then see if you can pick up from there.

--
greg
http://mit.edu/~gsstark/resume.pdf
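
Editor's note: if the pg_resetxlog route suggested above is taken, one value that has to be supplied by hand is the next WAL starting position. The sketch below derives a candidate for the -l option from the newest file name in pg_xlog. It assumes the 8.3 form of the option (-l timelineid,fileid,seg) and the rule, as the 8.3 pg_resetxlog documentation is understood here, that the value must be larger than any segment file present. The helper name next_wal_position and the rollover at 0xFF (based on the default 16 MB segments) are assumptions for illustration, not part of PostgreSQL.

```python
def next_wal_position(largest_segment_name: str) -> str:
    """Return a candidate '-l' argument just past the given WAL segment.

    The file name is assumed to be 24 hex digits: 8 for the timeline ID,
    8 for the log file ID, 8 for the segment number.
    """
    name = largest_segment_name.strip()
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment file name")

    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)

    # Assumption: with 16 MB segments, roll over to the next log file
    # instead of emitting a segment number of 0xFF or above.
    if segment + 1 >= 0xFF:
        log_id += 1
        segment = 0
    else:
        segment += 1

    return f"0x{timeline:X},0x{log_id:X},0x{segment:X}"


# Newest pg_xlog file from the directory listing in the original report.
print(next_wal_position("00000001000000000000001B"))
```

Any pg_resetxlog run is best tried on a copy of the data directory first; the -n option prints the values it would use without changing anything.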

Re: Possibly corrupted pg_control file after machine crash

From
Tom Lane
Date:
Michael Wood <esiotrot@gmail.com> writes:
> A VM of mine running PostgreSQL 8.3 (from Ubuntu 8.10) died a couple
> of nights ago.  It was not doing anything at the time, and the
> database was on an external (NAS) volume using ext3.  When I set up a
> new VM with Ubuntu 8.10 and attached the external volume to it, fsck
> replayed the journal but was otherwise happy.  I ran fsck with the -f
> option after that just to make sure and it reported no problems.

> Now if I run pg_controldata it complains about the CRC not being
> correct on pg_control.

Are you using the exact same PG executables that you used before?
These symptoms look quite a lot like a previous report that turned
out to be due to rebuilding PG with a different size for time_t.

            regards, tom lane
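
Editor's note: the time_t hypothesis above can be checked against the numbers in the pg_controldata output. The sketch below packs the first few control-file fields the way a build with an 8-byte time_t might lay them out, then reads them back with a 4-byte time_t. The field order is a recollection of 8.3's ControlFileData, the 4 padding bytes assume a typical 64-bit ABI, and the state code, modification time and checkpoint location are illustrative values, so treat all of these as assumptions. It also decodes two suspicious values from the report: 4A38CC9A as a Unix timestamp, and 1093850759 as the high word of the double 1234567.0, which pg_control is believed to store as a float-format check.

```python
import struct
from datetime import datetime, timedelta, timezone

SAST = timezone(timedelta(hours=2))  # the reporter's time zone, UTC+2

# Assumed layout as written: sysid, version, catversion, state,
# 4 bytes of alignment padding, 8-byte time_t, checkpoint (xlogid, xrecoff).
WRITTEN = struct.Struct("<QIIi4xqII")
# The same fields as read by a build with a 4-byte time_t and no padding.
READ = struct.Struct("<QIIiiII")

raw = WRITTEN.pack(
    5325665671964054206,  # system identifier, from the report
    833,                  # pg_control version, from the report
    200711281,            # catalog version, from the report
    0,                    # state code: placeholder value
    0x4A38CC9A,           # last-modified time: assumed
    0, 0x1B0FC7E8,        # checkpoint location: assumed
)

sysid, version, catversion, state, mtime, xlogid, xrecoff = READ.unpack_from(raw)

# The misaligned reader sees the padding as the time (epoch zero) and the
# low half of the real time_t as the checkpoint's log file ID.
misread_time = datetime.fromtimestamp(mtime, SAST).isoformat()
misread_checkpoint = "%X/%X" % (xlogid, xrecoff)
print(misread_time, misread_checkpoint)

# 4A38CC9A read as seconds since the epoch.
decoded_time = datetime.fromtimestamp(0x4A38CC9A, SAST).isoformat()
print(decoded_time)

# High 32 bits of the little-endian double 1234567.0.
low_word, high_word = struct.unpack("<II", struct.pack("<d", 1234567.0))
print(high_word)
```

Under these assumptions the misread record shows a 1970 modification time and a checkpoint location of 4A38CC9A/0, both of which appear in the reported output, and the decoded timestamp falls on Jun 17 at 12:59 SAST, the same minute as the pg_control file's mtime. That pattern would be consistent with a file written by binaries with an 8-byte time_t and read by binaries with a 4-byte one.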