Re: production server down
| From | Joe Conway |
|---|---|
| Subject | Re: production server down |
| Date | |
| Msg-id | 41D04FA4.7010402@joeconway.com |
| In reply to | Re: production server down (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-hackers |
Tom Lane wrote:
> Are you using one of the scripts that
> does an auto initdb if it doesn't see a valid PGDATA? 11 seconds might
> be about right for that.
>
> One problem with this theory is how come you didn't get screwed during
> *that* boot cycle. It seems to require assuming that the NFS mount came
> online just after the initdb finished (else initdb would have
> overwritten the on-NFS pg_control) but before the regular postmaster
> started (else this same scenario would have played out then). That's
> not a very wide window.

[followup]

We've now had a chance to bring Postgres down and check under the mount point. There *is* indeed a newly initdb'd cluster under there. FWIW the control file is corrupt:

    # pg_controldata /home/jconway/pgsql/fds/replica/pgdata
    WARNING: Calculated CRC checksum does not match value stored in file.
    Either the file is corrupt, or it has a different layout than this program
    is expecting. The results below are untrustworthy.

    pg_control version number:            72
    Catalog version number:               200310211
    Database cluster state:               in production
    pg_control last modified:             Sat Feb 6 22:28:16 2106
    Current log file ID:                  0
    Next log file segment:                10161036
    Latest checkpoint location:           0/9AA1B4
    Prior checkpoint location:            0/9B0B8C
    Latest checkpoint's REDO location:    0/0
    Latest checkpoint's UNDO location:    C/218
    Latest checkpoint's StartUpID:        17142
    Latest checkpoint's NextXID:          1099443932
    Latest checkpoint's NextOID:          8192
    Time of latest checkpoint:            Wed Apr 8 07:05:36 6325
    Database block size:                  1
    Blocks per segment of large relation: 128
    Maximum length of identifiers:        67
    Maximum number of function arguments: 0
    Date/time type storage:               floating-point numbers
    Maximum length of locale name:        0
    LC_COLLATE:
    LC_CTYPE:

I have a tarred copy of the under-the-mount PGDATA if anyone is interested in examining it.
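For anyone who wants to screen control-file output like the above mechanically, here is a minimal sketch. It assumes the plain `key: value` layout shown in the quoted output; the helper names `parse_controldata` and `suspect_reasons` are hypothetical and not part of PostgreSQL. The block-size check relies on PostgreSQL block sizes being powers of two between 1 kB and 32 kB.

```python
# Hypothetical helpers; the "key: value" layout is assumed from the
# pg_controldata output quoted in this post.
VALID_BLOCK_SIZES = {str(1024 * 2**n) for n in range(6)}  # 1 kB .. 32 kB


def parse_controldata(text):
    """Return (fields, crc_warning) parsed from pg_controldata-style text."""
    fields = {}
    crc_warning = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("WARNING:"):
            # Remember the CRC complaint, but do not store it as a field.
            crc_warning = crc_warning or "CRC" in line
            continue
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon are continuation text or blanks
            fields[key.strip()] = value.strip()
    return fields, crc_warning


def suspect_reasons(fields, crc_warning):
    """List reasons to distrust the control file; an empty list means none found."""
    reasons = []
    if crc_warning:
        reasons.append("CRC mismatch reported")
    block_size = fields.get("Database block size")
    if block_size not in VALID_BLOCK_SIZES:
        reasons.append(f"implausible block size: {block_size!r}")
    return reasons


SAMPLE = """\
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

pg_control version number:            72
Database block size:                  1
LC_COLLATE:
"""

fields, warned = parse_controldata(SAMPLE)
print(suspect_reasons(fields, warned))
```

The two checks are only examples; other fields in the output (the timestamps, for instance) could be screened the same way.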
BTW, there was another Postgres cluster on this same server which we had not used since the November 2 reboot -- it was corrupt in pretty much the same way and also had an initdb'd cluster under its mount.

So it looks like using an auto initdb startup script is a very bad idea when using an NFS-mounted PGDATA. We left the under-mount structure in place and did "chown root:root" and "chmod 000" on it. And, as mentioned in an earlier post, we now rely on the DBA to start postgres manually after a server restart.

Joe
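To illustrate the alternative to an auto-initdb startup script, the sketch below shows a pre-start guard that only decides whether starting is safe and never creates a cluster. It is an illustration under stated assumptions, not the script used on this server: the function name `safe_to_start` and its `ismount` parameter are hypothetical, and the check for a `PG_VERSION` file simply relies on that file being present at the top of an initialized data directory.

```python
import os


def safe_to_start(pgdata, mount_point, ismount=os.path.ismount):
    """Return (ok, reason). This guard never runs initdb.

    pgdata      -- data directory expected to live on the NFS mount
    mount_point -- directory where the NFS filesystem should be mounted
    ismount     -- injectable mount check (hypothetical hook, eases testing)
    """
    if not ismount(mount_point):
        # The directory under the mount point is visible instead of the
        # NFS volume; starting (or initdb'ing) here is what went wrong.
        return False, f"{mount_point} is not mounted; refusing to start"
    if not os.path.isfile(os.path.join(pgdata, "PG_VERSION")):
        # No cluster where one is expected: stop and let a human decide.
        return False, f"{pgdata} has no PG_VERSION; refusing to auto-initdb"
    return True, "ok"
```

A boot script could call `safe_to_start()` with its own paths and exit with an error when the first element of the result is false, leaving the decision to the DBA.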