Re: [HACKERS] TODO item
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] TODO item |
Date | |
Msg-id | 20018.949941617@sss.pgh.pa.us |
In response to | Re: [HACKERS] TODO item (Tatsuo Ishii <t-ishii@sra.co.jp>) |
List | pgsql-hackers |
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> possibly fix #2 by having transaction commit invoke the pg_fsync_pending
>> scan before it updates pg_log (and then fsyncing pg_log itself again
>> after).

> I do not understand #2. I call pg_fsync_pending twice in
> RecordTransactionCommit, one is after FlushBufferPool, and the other
> is after TransactionIdCommit and FlushBufferPool. Or am I missing
> something?

Oh, OK. That's what I meant. The snippet you posted didn't show where
you were calling the fsync routine from.

> I thought about that too. If the ordering was that important, a
> database managed by backends with -F on could be seriously
> corrupted. I've never heard of such disasters caused by -F.

This is why I think that fsync actually offers very little extra
protection ;-)

> BTW, Hiroshi has noticed me an excellent point #3:

>> This backend has to force the flush of a free buffer
>> page. Unfortunately the page was dirtied by the
>> above operation of Session-1 and calls pg_fsync()
>> for the table A. However fsync() is postponed until
>> commit of this backend.
>>
>> Session-1
>> commit;
>> There's no dirty buffer page for the table A.
>> So pg_fsync() isn't called for the table A.

Oooh, right. Backend A dirties the page, but leaves it sitting in
shared buffer. Backend B needs the buffer space, so it does the
fwrite of the page. Now if backend A wants to commit, it can fsync
everything it's written --- but does that guarantee the page that
was actually written by B will get flushed to disk? Not sure.

If the pending-fsync logic is based on either physical fds or vfds
then it definitely *won't* work; A might have found the desired page
sitting in buffer cache to begin with, and never have opened the
underlying file at all!

So it seems you would need to keep a list of all the relation files
(and segments) you've written to in the current xact, and open and
fsync each one just before writing/fsyncing pg_log.
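
A minimal sketch of that bookkeeping, in plain POSIX C rather than actual
backend code. It records files by path instead of by fd, so a backend that
dirtied a page without ever opening the file can still fsync it at commit.
The names remember_written_file, fsync_written_files, pending_paths and
pending_count, and the fixed-size limits, are all hypothetical and are not
taken from the PostgreSQL sources or from the patch under discussion. It does
not settle whether fsync on a freshly opened descriptor flushes data that was
written through another process's descriptor.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustrative limits only; a real implementation would size these differently. */
#define MAX_PENDING  64
#define MAX_PATH_LEN 256

/* Hypothetical per-transaction list of files touched, keyed by path (not by fd). */
static char pending_paths[MAX_PENDING][MAX_PATH_LEN];
static int  pending_count = 0;

/*
 * Record that the current transaction dirtied a page belonging to `path`.
 * Returns 0 on success (including when the path is already recorded),
 * -1 if the path is too long or the list is full.
 */
int remember_written_file(const char *path)
{
    size_t len = strlen(path);

    if (len >= MAX_PATH_LEN)
        return -1;
    for (int i = 0; i < pending_count; i++)
        if (strcmp(pending_paths[i], path) == 0)
            return 0;               /* already on the list */
    if (pending_count >= MAX_PENDING)
        return -1;
    memcpy(pending_paths[pending_count], path, len + 1);
    pending_count++;
    return 0;
}

/*
 * Open and fsync every recorded file, then clear the list.
 * Meant to run just before the commit record itself is written and fsynced.
 * Returns 0 if every file was flushed, -1 on the first failure
 * (the list is left intact in that case).
 */
int fsync_written_files(void)
{
    for (int i = 0; i < pending_count; i++)
    {
        int fd = open(pending_paths[i], O_RDWR);

        if (fd < 0)
            return -1;
        if (fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        if (close(fd) != 0)
            return -1;
    }
    pending_count = 0;
    return 0;
}
```

In this sketch a backend would call remember_written_file() whenever it
dirties a buffer page, and fsync_written_files() once at commit, before
touching pg_log.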

Even then, you're assuming that fsync applied to a file via an fd
belonging to one backend will flush disk buffers written to the same
file via *other* fds belonging to *other* processes. I'm not sure
that that is true on all Unixes... heck, I'm not sure it's true on
any. The fsync(2) man page here isn't real specific.

			regards, tom lane