Re: Use of O_DIRECT only for open_* sync options
From | Bruce Momjian |
---|---|
Subject | Re: Use of O_DIRECT only for open_* sync options |
Date | |
Msg-id | 201103111147.p2BBlLN29891@momjian.us |
In response to | Re: Use of O_DIRECT only for open_* sync options (Greg Smith <greg@2ndquadrant.com>) |
List | pgsql-hackers |
Greg Smith wrote:
> Bruce Momjian wrote:
> > xlogdefs.h says:
> >
> > /*
> >  * Because O_DIRECT bypasses the kernel buffers, and because we never
> >  * read those buffers except during crash recovery, it is a win to use
> >  * it in all cases where we sync on each write().  We could allow O_DIRECT
> >  * with fsync(), but because skipping the kernel buffer forces writes out
> >  * quickly, it seems best just to use it for O_SYNC.  It is hard to imagine
> >  * how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
> >  * Also, O_DIRECT is never enough to force data to the drives, it merely
> >  * tries to bypass the kernel cache, so we still need O_SYNC or fsync().
> >  */
> >
> > This seems wrong because fsync() can win if there are two writes before
> > the sync call.  Can kernels not issue fsync() if the write was O_DIRECT?
> > If that is the cause, we should document it.
> >
>
> The comment does look busted, because you did imagine exactly a case
> where they might be combined.  The only incompatibility that I'm aware
> of is that O_DIRECT requires reads and writes to be aligned properly, so
> you can't use it in random application code unless it's aware of that.
> O_DIRECT and fsync are compatible; for example, MySQL allows combining
> the two:  http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html

C comment updated in git head:

 * Because O_DIRECT bypasses the kernel buffers, and because we never
 * read those buffers except during crash recovery or if wal_level != minimal,
 * it is a win to use it in all cases where we sync on each write().  We could
 * allow O_DIRECT with fsync(), but it is unclear if fsync() could process
 * writes not buffered in the kernel.  Also, O_DIRECT is never enough to force
 * data to the drives, it merely tries to bypass the kernel cache, so we still
 * need O_SYNC/O_DSYNC.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
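For readers following the thread, the "two writes before the sync call" case can be sketched in a few lines of C. This is only an illustration, not PostgreSQL's xlog code: the function name write_two_blocks_then_fsync, the 4096-byte BLOCK_SIZE, and the fallback to a buffered open when the filesystem rejects O_DIRECT are all assumptions made for the example. It shows the alignment requirement Greg mentions (the buffer comes from posix_memalign, and each write is one full block) and a single fsync() covering both writes.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* platform without O_DIRECT: plain buffered I/O */
#endif

#define BLOCK_SIZE 4096         /* assumed alignment; real code should ask the device */

/*
 * Hypothetical helper: write two aligned blocks, then sync once.
 * Returns 0 on success, -1 on failure.
 */
int write_two_blocks_then_fsync(const char *path)
{
    void *buf = NULL;
    int fd;
    int rc = -1;

    /* O_DIRECT needs the buffer address and the I/O length to be aligned. */
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0)
        return -1;
    memset(buf, 'x', BLOCK_SIZE);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
    {
        /* Some filesystems reject O_DIRECT; fall back to a buffered open. */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    }
    if (fd < 0)
    {
        free(buf);
        return -1;
    }

    /* Two writes, one sync: the case where fsync() might beat O_SYNC. */
    if (write(fd, buf, BLOCK_SIZE) == BLOCK_SIZE &&
        write(fd, buf, BLOCK_SIZE) == BLOCK_SIZE &&
        fsync(fd) == 0)
        rc = 0;

    close(fd);
    free(buf);
    return rc;
}
```

With O_SYNC on the open() call instead, each write() would wait for the data to reach storage separately; here the wait happens once, in fsync(). Whether that is actually faster depends on the kernel and the storage, which is the open question in the thread.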