Re: silent data loss with ext4 / all current versions
From | Teodor Sigaev |
---|---|
Subject | Re: silent data loss with ext4 / all current versions |
Date | |
Msg-id | 56584DFA.6090101@sigaev.ru |
In reply to | silent data loss with ext4 / all current versions (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
List | pgsql-hackers |
> What happens is that when we recycle WAL segments, we rename them and then sync
> them using fdatasync (which is the default on Linux). However fdatasync does not
> force fsync on the parent directory, so in case of power failure the rename may
> get lost. The recovery won't realize those segments actually contain changes

Agreed. Some time ago I ran into this myself, although it wasn't with Postgres.

> So, what's going on? The problem is that while the rename() is atomic, it's not
> guaranteed to be durable without an explicit fsync on the parent directory. And
> by default we only do fdatasync on the recycled segments, which may not force
> fsync on the directory (and ext4 does not do that, apparently).
>
> This impacts all current kernels (tested on 2.6.32.68, 4.0.5 and 4.4-rc1), and
> also all supported PostgreSQL versions (tested on 9.1.19, but I believe all
> versions since spread checkpoints were introduced are vulnerable).
>
> FWIW this has nothing to do with storage reliability - you may have good drives,
> RAID controller with BBU, reliable SSDs or whatever, and you're still not safe.
> This issue is at the filesystem level, not storage.

Agreed again.

> I plan to do more power failure testing soon, with more complex test scenarios.
> I suspect there might be other similar issues (e.g. when we rename a file before
> a checkpoint and don't fsync the directory - then the rename won't be replayed
> and will be lost).

It would be very useful, but I hope you will not find a new bug :)

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                      WWW: http://www.sigaev.ru/
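The thread's point is that rename() is atomic but only durable once the parent directory itself is fsync'ed. A minimal C sketch of that pattern, assuming a POSIX system such as Linux (where fsync() on a directory descriptor opened read-only is permitted) and assuming both names live in the same directory; `rename_and_sync_dir` is a hypothetical helper for illustration, not PostgreSQL's actual code:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>   /* open, O_RDONLY, O_RDWR */
#include <stdio.h>   /* rename */
#include <unistd.h>  /* fsync, close */

/*
 * Hypothetical helper: rename oldpath to newpath, both inside dirpath,
 * and make the rename survive a power failure.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
static int rename_and_sync_dir(const char *oldpath, const char *newpath,
                               const char *dirpath)
{
    /* 1. Flush the file's data and metadata before renaming it. */
    int fd = open(oldpath, O_RDWR);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* 2. Atomic swap of the directory entry, but not yet durable. */
    if (rename(oldpath, newpath) != 0)
        return -1;

    /*
     * 3. Persist the directory entry change. An fsync/fdatasync on the
     * file alone is not guaranteed to do this (the ext4 case above).
     */
    int dfd = open(dirpath, O_RDONLY);
    if (dfd < 0)
        return -1;
    if (fsync(dfd) != 0) {
        close(dfd);
        return -1;
    }
    return close(dfd);
}
```

The order matters: the file is flushed before the rename and the directory after it. If the final directory fsync fails, the new name may already be visible but is not guaranteed to survive a crash, so a caller should treat that as an error rather than ignore it.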