Re: [HACKERS] TODO item
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] TODO item |
Date | |
Msg-id | 24481.949852063@sss.pgh.pa.us |
In reply to | Re: [HACKERS] TODO item (Tatsuo Ishii <t-ishii@sra.co.jp>) |
Responses | Re: [HACKERS] TODO item; Re: [HACKERS] TODO item |
List | pgsql-hackers |
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>>> BTW, I have worked a little bit on this item. The idea is pretty
>>>> simple. Instead of doing a real fsync() in pg_fsync(), just marking it
>>>> so that we remember to do fsync() at the commit time. Following
>>>> patches illustrate the idea.

In the form you have shown it, it would be completely useless, for two reasons:

1. It doesn't guarantee that the right files are fsync'd. It would in fact fsync whichever files happen to be using the same kernel file descriptor numbers at the close of the transaction as the ones you really wanted to fsync were using at the time fsync was requested.

2. It doesn't guarantee that the files are fsync'd in the right order. Per my discussion a few days ago, the only reason for doing fsync at all is to guarantee that the data pages touched by a transaction get flushed to disk before the pg_log update claiming that the transaction is done gets flushed to disk. A change like this completely destroys that ordering, since pg_fsync_pending has no idea which fd is pg_log.

You could possibly fix #1 by logging fsync requests at the vfd level; then, whenever a vfd is closed to free up a kernel fd, check the fsync flag and execute the pending fsync before closing the file. You could possibly fix #2 by having transaction commit invoke the pg_fsync_pending scan before it updates pg_log (and then fsync pg_log itself afterward).

(Actually, you could probably eliminate the notion of an "fsync request" entirely and simply have each vfd get marked "dirty" automatically when it is written to. Both closing a vfd and the scan at xact commit would then check the dirty bit to decide whether to fsync.)

What would still need to be thought about is whether this scheme preserves the ordering guarantee when a group of concurrent backends is considered, rather than one backend in isolation.
(I believe that fsync() will apply to all dirty kernel buffers for a file, not just those dirtied by the requesting process, so each backend's fsyncs can affect the order in which other backends' writes hit the disk.) Offhand I do not see any problems there, but it's the kind of thing that requires more than offhand thought...

regards, tom lane
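A minimal C sketch of the dirty-bit variant suggested in the message, for a single backend only (the concurrency question is not modeled). It assumes a much-simplified vfd table: `Vfd`, `vfd_open`, `vfd_write`, `vfd_release`, `commit_transaction` and the `sync_order` trace are hypothetical names for illustration, not the real fd.c API, and every write is treated as an append. It shows the two fixes: a dirty vfd is fsync'd before its kernel fd is given up (so the flush cannot hit a different file that later reuses the same fd number), and commit flushes all dirty data files before writing and fsyncing the pg_log-like file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_VFDS 16

/* Hypothetical, much-simplified vfd entry (not the real fd.c layout). */
typedef struct {
    char path[256]; /* kept so the file can be reopened after release */
    int fd;         /* kernel fd, or -1 while no kernel fd is held */
    bool dirty;     /* set on every write, cleared by a successful fsync */
} Vfd;

static Vfd vfds[MAX_VFDS];
static int nvfds = 0;

/* Trace of fsync order, so the commit ordering can be inspected. */
static int sync_order[64];
static int nsynced = 0;

/* fsync one vfd through its current kernel fd and clear the dirty bit. */
static int vfd_sync(int v) {
    if (fsync(vfds[v].fd) != 0)
        return -1;
    vfds[v].dirty = false;
    if (nsynced < 64)
        sync_order[nsynced++] = v;
    return 0;
}

/* Open (creating if needed) a file and register it as a new vfd. */
static int vfd_open(const char *path) {
    if (nvfds >= MAX_VFDS)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;
    snprintf(vfds[nvfds].path, sizeof vfds[nvfds].path, "%s", path);
    vfds[nvfds].fd = fd;
    vfds[nvfds].dirty = false;
    return nvfds++;
}

/* Write through a vfd; the dirty bit is set automatically, so
 * callers never issue an explicit "fsync request". */
static ssize_t vfd_write(int v, const void *buf, size_t len) {
    assert(v >= 0 && v < nvfds);
    if (vfds[v].fd < 0) {
        /* Reacquire a kernel fd (sketch assumption: all writes append). */
        vfds[v].fd = open(vfds[v].path, O_WRONLY | O_APPEND);
        if (vfds[v].fd < 0)
            return -1;
    }
    ssize_t n = write(vfds[v].fd, buf, len);
    if (n > 0)
        vfds[v].dirty = true;
    return n;
}

/* Give up the kernel fd (e.g. to stay under the open-file limit).
 * A pending fsync runs first, while the fd still refers to this file. */
static int vfd_release(int v) {
    assert(v >= 0 && v < nvfds);
    if (vfds[v].fd < 0)
        return 0;
    if (vfds[v].dirty && vfd_sync(v) != 0)
        return -1;
    int rc = close(vfds[v].fd);
    vfds[v].fd = -1;
    return rc;
}

/* Commit: flush every dirty data vfd BEFORE writing the commit record,
 * then fsync the log file itself, so data reaches disk before the log
 * claims the transaction is done. */
static int commit_transaction(int log_vfd, const char *record) {
    for (int v = 0; v < nvfds; v++) {
        if (v != log_vfd && vfds[v].fd >= 0 && vfds[v].dirty &&
            vfd_sync(v) != 0)
            return -1;
    }
    size_t len = strlen(record);
    if (vfd_write(log_vfd, record, len) != (ssize_t)len)
        return -1;
    return vfd_sync(log_vfd);
}
```

In this sketch a vfd that was released mid-transaction is already clean when commit runs, so the commit-time scan only has to look at vfds that still hold a kernel fd; the log vfd is always synced last.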