Re: [HACKERS] TODO item
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] TODO item |
Date | |
Msg-id | 21589.949965409@sss.pgh.pa.us |
In response to | Re: [HACKERS] TODO item (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses | Re: [HACKERS] TODO item |
List | pgsql-hackers |
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I can't imagine how fsync could flush _only_ the file descriptor buffers
> modified by the current process. It would have to affect all buffers
> for the file descriptor.

Yeah, you're probably right. After thinking about it, I can't believe
that a disk block buffer inside the kernel has any record of which FD
it was written by (after all, it could have been dirtied through more
than one FD since it was last synced to disk). All it's got is a file
inode number and a block number within the file. Presumably fsync()
searches the buffer cache for blocks that match the FD's inode number
and schedules I/O for all the ones that are dirty.

> So, I think we are safe if we can either keep that file descriptor open
> until commit, or re-open it and fsync it on commit. That assumes a
> re-open is hitting the same file. My opinion is that we should just
> fsync it on close and not worry about a reopen.

There's still the problem that your backend might never have opened the
relation file at all, still less done a write through its fd or vfd.
I think we would need to have a separate data structure saying "these
relations were dirtied in the current xact" that is not tied to fd's or
vfd's. Maybe the relcache would be a good place to keep such a flag.

Transaction commit would look like:

* scan buffer cache for dirty buffers, fwrite each one that belongs to
  one of the relations I'm trying to commit;

* open and fsync each segment of each rel that I'm trying to commit
  (or maybe just the dirtied segments, if we want to do the bookkeeping
  at that level of detail);

* make pg_log entry;

* write and fsync pg_log.

fsync-on-close is probably a waste of cycles. The only way that would
matter is if someone else were doing a RENAME TABLE on the rel, thus
preventing you from reopening it. I think we could just put the
responsibility on the renamer to fsync the file while he's doing it
(in fact I think that's already in there, at least to the extent of
flushing the buffer cache).

			regards, tom lane
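
The commit sequence above can be sketched in plain POSIX C. This is only an illustration, not PostgreSQL code: `DirtyRelSet`, `dirty_set_add`, `fsync_by_path` and `commit_xact` are hypothetical names, the set is keyed by file path instead of living in the relcache, and the first step (writing dirty shared buffers out to the kernel) is left out. It also relies on the presumption discussed above, that fsync() on a freshly opened descriptor flushes every dirty kernel buffer for that file no matter which descriptor dirtied it.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAX_DIRTY_RELS 64
#define MAX_PATH_LEN   256

/* Hypothetical "relations dirtied in the current xact" set.
 * It is keyed by file path, so it is not tied to any fd or vfd. */
typedef struct {
    char paths[MAX_DIRTY_RELS][MAX_PATH_LEN];
    int  count;
} DirtyRelSet;

void dirty_set_init(DirtyRelSet *set)
{
    assert(set != NULL);
    set->count = 0;
}

/* Remember that a relation segment file was dirtied in this transaction.
 * Returns 0 on success (or if already recorded), -1 if it does not fit. */
int dirty_set_add(DirtyRelSet *set, const char *path)
{
    assert(set != NULL && path != NULL);
    for (int i = 0; i < set->count; i++)
        if (strcmp(set->paths[i], path) == 0)
            return 0;
    if (set->count >= MAX_DIRTY_RELS || strlen(path) >= MAX_PATH_LEN)
        return -1;
    strcpy(set->paths[set->count++], path);
    return 0;
}

/* Open a file by name, fsync it, and close it again. */
int fsync_by_path(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Commit: fsync every dirtied segment, then append the commit record
 * to the log file and fsync the log. Returns 0 on success, -1 on error. */
int commit_xact(DirtyRelSet *set, const char *log_path,
                const char *record, size_t len)
{
    assert(set != NULL && log_path != NULL && record != NULL);

    /* Writing dirty shared buffers to the kernel would happen here. */

    for (int i = 0; i < set->count; i++)
        if (fsync_by_path(set->paths[i]) != 0)
            return -1;          /* data not safely on disk: do not log */

    int fd = open(log_path, O_WRONLY | O_APPEND | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    int rc = 0;
    if (write(fd, record, len) != (ssize_t) len)
        rc = -1;
    if (rc == 0 && fsync(fd) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    if (rc == 0)
        set->count = 0;         /* transaction done, forget the set */
    return rc;
}
```

A caller would invoke `dirty_set_add()` whenever it dirties a page of a relation, and `commit_xact()` once at commit. The ordering is the point of the sketch: the log record is written only after every data file fsync has succeeded, so a crash can never leave a commit record pointing at data that did not reach disk.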