Re: O_DIRECT in freebsd
From | Sean Chittenden |
---|---|
Subject | Re: O_DIRECT in freebsd |
Date | |
Msg-id | 20030623003129.GH97131@perrin.int.nxad.com |
In response to | Re: O_DIRECT in freebsd (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses | Re: O_DIRECT in freebsd |
List | pgsql-hackers |
> Basically, we don't know when we read a buffer whether this is a
> read-only or read/write. In fact, we could read it in, and another
> backend could write it for us.

Um, wait. The cache is shared between backends? I don't think so,
but it shouldn't matter because there has to be a semaphore locking
the cache to prevent the coherency issue you describe. If PostgreSQL
didn't, it'd be having problems with this now. I'd also think that
MVCC would handle the case of updated data in the cache, as that has
to be a common case. At what point is the cached result invalidated
and fetched from the OS?

> The big issue is that when we do a write, we don't wait for it to
> get to disk.

Only in the case when fsync() is turned off, but again, it's up to
the OS to manage that can of worms, which I think BSD takes care of.
From conf/NOTES:

# Attempt to bypass the buffer cache and put data directly into the
# userland buffer for read operation when O_DIRECT flag is set on the
# file. Both offset and length of the read operation must be
# multiples of the physical media sector size.
#
#options DIRECTIO

The offsets and length bit kinda bothers me though, but I think
that's stuff that the kernel has to take into account, not the
userland calls. I wonder if that's actually accurate any more or
affects userland calls... seems like that'd be a bit too much work to
have the user do, esp. given the lack of documentation on the flag...
it should be just a drop-in additional flag, afaict.

> It seems to use O_DIRECT, we would have to read the buffer in a
> special way to mark it as read-only, which seems kind of strange. I
> see no reason we can't use free-behind in the PostgreSQL buffer
> cache to handle most of the benefits of O_DIRECT, without the
> read-only buffer restriction.

I don't see how this'd be an issue, as buffers populated via a
read(), that are updated, and then written out, would occupy a new
chunk of disk to satisfy MVCC. Why would we need to mark a buffer as
read-only and carry around/check its state?

-sc

--
Sean Chittenden
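
For illustration, here is roughly what it would look like if the "offset and length must be multiples of the sector size" rule from conf/NOTES did fall on the userland caller. This is only a minimal sketch under stated assumptions: the 512-byte SECTOR_SIZE is assumed rather than queried from the device, `direct_read()`, `align_down()` and `align_up()` are hypothetical helper names, and the sector-aligned memory buffer is a precaution that some systems require rather than something the NOTES text asks for.

```c
#define _GNU_SOURCE /* exposes O_DIRECT with glibc; harmless elsewhere */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 512-byte sectors. Real code should ask the device. */
#define SECTOR_SIZE 512

/* Round a file offset down to the nearest sector boundary. */
off_t align_down(off_t off)
{
    assert(off >= 0);
    return off - (off % SECTOR_SIZE);
}

/* Round a length up to a whole number of sectors. */
size_t align_up(size_t len)
{
    return (len + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}

/*
 * Hypothetical helper: copy up to `len` bytes starting at `off` in `path`
 * into `dst`, issuing one sector-aligned read with O_DIRECT.
 * Returns the number of bytes copied, or -1 on error.
 */
ssize_t direct_read(const char *path, off_t off, size_t len, void *dst)
{
    off_t   start = align_down(off);            /* aligned read offset   */
    size_t  skip  = (size_t)(off - start);      /* unwanted leading part */
    size_t  span  = align_up(skip + len);       /* aligned read length   */
    void   *buf   = NULL;
    ssize_t n     = -1;
    int     fd;

    /* Sector-aligned bounce buffer for the direct read. */
    if (posix_memalign(&buf, SECTOR_SIZE, span) != 0)
        return -1;

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0) {
        ssize_t got = pread(fd, buf, span, start);
        if (got >= 0) {
            /* Drop the leading padding and clamp to the requested length. */
            n = (got > (ssize_t)skip) ? got - (ssize_t)skip : 0;
            if ((size_t)n > len)
                n = (ssize_t)len;
            memcpy(dst, (char *)buf + skip, (size_t)n);
        }
        close(fd);
    }
    free(buf);
    return n;
}
```

A caller wanting 100 bytes at offset 1000 would end up reading 1024 bytes at offset 512 and copying out the middle, which is the extra bookkeeping that a plain read() never needs. Whether open() accepts O_DIRECT at all depends on the OS and filesystem, so the -1 return has to be handled.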