Re: O_DIRECT for WAL writes
From | Ron Mayer |
---|---|
Subject | Re: O_DIRECT for WAL writes |
Date | |
Msg-id | 429AC920.6080809@cheapcomplexdevices.com |
In reply to | Re: O_DIRECT for WAL writes (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-patches |
Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
>> is opening a file with O_DIRECT sufficient to ensure that
>> a write(2) does not return until the data has hit disk?
>
> Some googling suggests so, eg
> http://www.die.net/doc/linux/man/man2/open.2.html

Really? On that page I read:

  "O_DIRECT...at the completion of the read(2) or write(2) system
  call, data is guaranteed to have been transferred."

which sounds to me like transferred to the device's cache but not necessarily flushed through the device's cache. It says nothing about physical media.

That wording feels different to me from O_SYNC, which reads:

  "O_SYNC will block the calling process until the data has been
  physically written to the underlying hardware."

which does suggest to me that it writes to physical media.

Or am I reading that wrong?

PS: I've gotten way out of my depth here, but...

...attempting to browse the Linux source(!!)

Looking at the O_SYNC stuff in ext3:
  http://lxr.linux.no/source/fs/ext3/file.c#L67
it looks like in this conditional:

  if (file->f_flags & O_SYNC) {
      ...
      goto force_commit;
  }

the goto branch calls ext3_force_commit() in much the same way that it seems fsync() does here:
  http://lxr.linux.no/source/fs/ext3/fsync.c#L71
so I believe O_SYNC does at least as much as fsync().

However, I can't find O_DIRECT anywhere in the ext3 stuff, so if it does work it's less obvious how or if it could.

Moreover, I see O_SYNC used in lots of places:
  http://lxr.linux.no/ident?i=O_SYNC
such as fs/ext3/, and I don't see O_DIRECT in nearly as many places:
  http://lxr.linux.no/ident?i=O_DIRECT
It looks like reiserfs and xfs seem to look at O_DIRECT, but ext3 doesn't appear to, unless it's somewhere outside the fs/ext3 directory.
PPS: Of course, not even fsync() flushed correctly until very recent kernels:
  http://hardware.slashdot.org/comments.pl?sid=149349&cid=12519114
In that article Jeff Garzik (the Linux SATA driver guy) suggests that until very recent kernels ext3 did not have write barrier support that issues the FLUSH CACHE (IDE) or SYNCHRONIZE CACHE (SCSI) commands, even on fsync.

PPPS: No, I don't understand the kernel - I'm just showing what quick grep commands turned up, without any deep understanding.
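
To make the distinction above concrete, here is a minimal sketch in C, assuming Linux with glibc. It writes one block with O_DIRECT (bypassing the page cache) and then calls fsync() separately, since the man page wording quoted above only promises that the data has been "transferred". The function name durable_block_write, the 4096-byte BLOCK size, and the fallback to buffered I/O are assumptions made for illustration; they do not come from the patch under discussion. Whether fsync() actually reaches the physical media still depends on the kernel, filesystem, and drive cache behaviour described in the PPS.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK 4096              /* assumed alignment; the real requirement is device-specific */

/*
 * Hypothetical helper: write one BLOCK-sized buffer to `path`, preferring
 * O_DIRECT, then fsync(). Returns 0 on success, -1 on failure.
 */
int durable_block_write(const char *path)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC;

    /* Try direct I/O first; some filesystems reject it with EINVAL. */
    int fd = open(path, flags | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, flags, 0600);   /* fall back to buffered I/O */
    if (fd < 0)
        return -1;

    /* O_DIRECT generally needs an aligned buffer, length, and offset. */
    void *buf = NULL;
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0) {
        close(fd);
        return -1;
    }
    assert((uintptr_t)buf % BLOCK == 0);
    memset(buf, 'x', BLOCK);

    int rc = 0;
    if (write(fd, buf, BLOCK) != BLOCK)
        rc = -1;

    /* O_DIRECT alone makes no durability promise, so ask for a flush too. */
    if (rc == 0 && fsync(fd) != 0)
        rc = -1;

    free(buf);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Opening with O_DIRECT | O_SYNC instead would ask for the synchronous behaviour on every write(2) rather than through a separate fsync() call; which of the two is faster or safer for WAL is exactly the open question in this thread.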