(Again) Datacorruption using 7.4.2 on XFS/raid1
From | Florian G. Pflug |
---|---|
Subject | (Again) Datacorruption using 7.4.2 on XFS/raid1 |
Date | |
Msg-id | 20040712183115.GA3913@foobar.solution-x.com |
Replies |
Re: (Again) Datacorruption using 7.4.2 on XFS/raid1
Re: (Again) Datacorruption using 7.4.2 on XFS/raid1 |
List | pgsql-general |
Hi,

We have again experienced data corruption using 7.4.2 on an XFS filesystem on top of a software RAID (md) RAID-1.

After a server crash last night we hard-reset the machine. (It was a rather strange crash: the machine was still pingable, but no login was possible, and postgres and apache no longer responded to requests.) It came up again nicely, but a few hours later the following errors occurred when trying to access certain tables. (Those tables are updated heavily: each day about 2 million tuples are inserted, and the old versions of those tuples are deleted.)

ERROR: could not access status of transaction 34048
DETAIL: could not open file "/var/lib/postgres/data/pg_clog/0000": No such file or directory

While reading linux-kernel today, I stumbled upon a description of a rather strange XFS behaviour. It seems to zero a block if the block was updated and the corresponding metadata update was flushed to disk, but the data itself was not. It does not happen if the file is fsync()ed after the update, but I was wondering what would happen if the machine crashed between the write() and the fsync().

The lkml thread about this can be found here: http://www.ussg.iu.edu/hypermail/linux/kernel/0407.1/0359.html

Could this XFS behaviour cause the postgres problems we are seeing?

greetings, Florian Pflug
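To make the window between write() and fsync() concrete, here is a minimal sketch of the pattern in Python. It is only an illustration of the system-call sequence, not PostgreSQL's actual I/O code; the function name `write_durably` and the demo file name are hypothetical.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data to path and ask the kernel to flush it before returning.

    Hypothetical helper for illustration; not taken from PostgreSQL.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # write() may accept fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # A crash at this point is the window asked about in the message:
        # write() has returned, but the data may still sit only in the
        # page cache, while filesystem metadata may or may not have
        # reached the disk.
        os.fsync(fd)  # returns once the kernel reports the data flushed
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "demo.dat")  # hypothetical file name
        write_durably(target, b"x" * 8192)
        print(os.path.getsize(target))
```

Only after fsync() returns successfully can the caller reasonably assume the data is on stable storage, and even then only if the drive's write cache honours the flush. Whether a crash inside the marked window actually leads to zeroed blocks on XFS is the open question of the message; the sketch does not answer it.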