Our application is running Postgres 7.4 (we are working on the conversion
to 8.3 right now). Our testing involves various forms of violence, including
shutting off power and running kill -9 on the postmaster.
Occasionally we observe a form of database corruption in which one of
the files storing a table or index disappears. The logs will contain
ERRORs that look like this:
could not open relation "some_table_name": No such file or directory
When this happens, and I cross-reference the pg_class.oid with the
expected file under PGDATA, the file is missing (and does not appear
to be in lost+found).
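
For reference, the check described above amounts to roughly the following Python sketch. The function name find_missing_files and its arguments are made up for illustration. It assumes the caller has already pulled (relname, filenode) pairs out of pg_class, has left out rows that have no file of their own (views, for example), and points it at the right per-database directory under PGDATA/base.

```python
import os

def find_missing_files(db_dir, relations):
    """Return the (relname, filenode) pairs whose base file is absent.

    db_dir    -- a per-database directory, e.g. PGDATA/base/<database oid>
    relations -- iterable of (relname, filenode) pairs taken from pg_class
                 (assumption: only relations that actually have storage)
    """
    # One directory listing is enough; membership tests are then cheap.
    present = set(os.listdir(db_dir))
    # Only the first segment (the file with no ".N" suffix) is checked here.
    return [(name, node) for name, node in relations
            if str(node) not in present]
```

Anything it returns is a relation that the catalog knows about but that has no file on disk.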
I have fsync set to true, and wal_sync_method set to fsync.
A few questions about this:
1) Why is this happening?
2) To help investigate this problem, I've written a script to
cross-reference pg_class and the files in PGDATA/base. (I know that I
should use pg_class.relfilenode instead of pg_class.oid -- I'll fix
that.) The question is how to check for consistency in the case of
large tables, which are split into multiple segments (e.g. 123456.1,
123456.2). That is, how can I find out how many segments there should be?
Any chance it's as simple as (pg_class.relpages + SUITABLE_CONSTANT -
1) / SUITABLE_CONSTANT?
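
To make the guess concrete, here is a Python sketch of the formula. It assumes SUITABLE_CONSTANT is 131072, i.e. 1 GB segments of 8 kB blocks as in a default build; a server compiled with other settings would need a different value. The names expected_segment_files and BLOCKS_PER_SEGMENT are made up for illustration. As far as I know, relpages is only refreshed by VACUUM and ANALYZE, so the result is a lower-bound estimate rather than an exact count.

```python
# Assumption: default build, 1 GB segments / 8 kB blocks = 131072 blocks.
BLOCKS_PER_SEGMENT = 131072

def expected_segment_files(filenode, relpages,
                           blocks_per_segment=BLOCKS_PER_SEGMENT):
    """List the file names expected for a relation of relpages blocks.

    The first segment has no suffix; later ones are named
    <filenode>.1, <filenode>.2, and so on.
    """
    # Ceiling division; an empty relation still has its base file.
    count = max(1, (relpages + blocks_per_segment - 1) // blocks_per_segment)
    return [str(filenode)] + ["%d.%d" % (filenode, i) for i in range(1, count)]
```

For example, expected_segment_files(123456, 300000) gives 123456, 123456.1 and 123456.2. Because relpages can be stale, the script should probably only complain about expected files that are missing, not about extra segments it did not predict.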
Jack