Re: silent data loss with ext4 / all current versions
From | Tomas Vondra |
---|---|
Subject | Re: silent data loss with ext4 / all current versions |
Date | |
Msg-id | 565B0CBB.4090406@2ndquadrant.com |
In response to | Re: silent data loss with ext4 / all current versions (Craig Ringer <craig@2ndquadrant.com>) |
Responses | Re: silent data loss with ext4 / all current versions |
List | pgsql-hackers |
Hi,

On 11/29/2015 02:38 PM, Craig Ringer wrote:
> On 27 November 2015 at 21:28, Greg Stark <stark@mit.edu> wrote:
>
> > On Fri, Nov 27, 2015 at 11:17 AM, Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> > > I plan to do more power failure testing soon, with more complex test
> > > scenarios. I suspect there might be other similar issues (e.g. when we
> > > rename a file before a checkpoint and don't fsync the directory - then the
> > > rename won't be replayed and will be lost).
> >
> > I'm curious how you're doing this testing. The easiest way I can think
> > of would be to run a database on an LVM volume and take a large number
> > of LVM snapshots very rapidly and then see if the database can start
> > up from each snapshot. Bonus points for keeping track of the committed
> > transactions before each snapshot and ensuring they're still there I
> > guess.
>
> I've had a few tries at implementing a qemu-based crashtester where it
> hard kills the qemu instance at a random point then starts it back up.

I've tried to reproduce the issue by killing a qemu VM, and so far I've
been unsuccessful. On bare HW it was easily reproducible (I'd hit the
issue in 9 out of 10 attempts), so either I'm doing something wrong or
qemu somehow interacts with the I/O.

> I always got stuck on the validation part - actually ensuring that the
> DB state is how we expect. I think I could probably get that right now,
> it's been a while.

Well, I guess we can't really check all the details, but the checksums
should make checking the general consistency somewhat simpler. And then
you have to design the workload in a way that makes the check easier -
for example by remembering the committed values etc.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
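
To illustrate the "remember the committed values" idea, here is a minimal sketch in Python. It is not taken from any actual test harness. It assumes an insert-only workload where each transaction writes one unique id, and where the client records an id only after COMMIT has been acknowledged. The function name `check_recovery` and both of its arguments are hypothetical.

```python
def check_recovery(acknowledged_ids, recovered_ids):
    """Compare ids acknowledged before the crash with ids found after recovery.

    acknowledged_ids: ids the client recorded after COMMIT returned
                      (assumed to be kept somewhere that survives the crash).
    recovered_ids:    ids read back from the database after it restarted.

    Returns (lost, unacknowledged):
      lost           - acknowledged but missing after recovery; any entry
                       here means committed data was lost.
      unacknowledged - present after recovery but never acknowledged; these
                       are expected for commits that were in flight when
                       the power was cut, so they are not an error.
    """
    acked = set(acknowledged_ids)
    recovered = set(recovered_ids)
    lost = sorted(acked - recovered)
    unacknowledged = sorted(recovered - acked)
    return lost, unacknowledged
```

For example, `check_recovery([1, 2, 3, 4], [1, 2, 4, 5])` reports id 3 as lost and id 5 as an unacknowledged survivor. Only the first list indicates a durability problem.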