Re: Enough RAM for entire Database.. cost aside, is thi
From | Marco Colombo |
---|---|
Subject | Re: Enough RAM for entire Database.. cost aside, is thi |
Date | |
Msg-id | Pine.LNX.4.44.0407081244320.3962-100000@Megathlon.ESI |
In reply to | Re: Enough RAM for entire Database.. cost aside, is this (Shridhar Daithankar <shridhar@frodo.hserus.net>) |
List | pgsql-general |
On Thu, 8 Jul 2004, Shridhar Daithankar wrote:

> Hi,
>
> Andy B wrote:
> > 1. Postgresql is a two tiered cache mechanism. The first tier - the
> > postgresql shared buffer cache sits on the second, larger tier, the linux
> > buffer cache. So bits of the same data end up being in memory...twice, and
> > two cache mechanisms operate at the same time. (That's how I understand it).
>
> That is correct. But I would advise you to see shared buffers as workspace
> rather than cache.

Hmm, I'm not sure that's true. The first time you read the data, maybe it gets copied twice (but I don't know the details of the implementation of buffers in PostgreSQL, I'm making wild guesses here). Later, things are not so simple.

Since we're considering nested caches here, I think that whatever is "hot" in the PostgreSQL buffers will automatically be "cold" in the Linux page cache, and will be a good candidate for eviction. You don't access _both_ copies for sure. If you find the data in a buffer, Linux won't notice you accessed it, and will slowly mark its copy as "not recently used".

So, in the long run, I think that "hot" data stays (only) in some application buffer, "warm" data in the Linux cache, "cold" data on disk. Multiple copies occur rarely, and for a relatively short time.

Of course, I'm assuming there's some kind of memory pressure. If not, useless copies of data may stay in RAM "forever".

> > 2. Even if the linux buffer cache contains all the data required for an
> > execution of a plan, there is still a load of memory copying to do between
> > these two tiers. Though memory copying is faster than disk access, it is
> > still an overhead, and isn't there the real problem of thrashing between
> > these two tiers if the plan can't fit all the data into the top tier, even
> > if the thrashing is restricted to the memory system?
>
> That is certainly not correct. I don't have hard sources to back it up, but if
> you open a file that you just closed, linux does not go copying it from its
> cache to the process address space. It would rather juggle the page table to
> make memory pages available to your process.

I'm not familiar with recent kernel development. For sure, the kernel used copy_from/to_user() a lot in the past. You seem to overestimate the cost of a RAM-to-RAM copy vs. the cost of messing with VM mappings.

The open()/close() sequence won't copy anything, agreed. It's read() we're considering here.

> By that argument, there would be three caches. Postgresql shared buffers, files
> mapped into process address space and linux buffers. I think that defeats the
> purpose of caching.

[...]

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
Colombo@ESI.it
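
The nested-cache argument in the message (data that is hot in the application buffers goes cold in the OS cache and gets evicted there under memory pressure) can be tried out with a toy model. This is only a sketch under loud assumptions: both tiers are modelled as plain LRU caches, which is not a claim about how PostgreSQL or Linux actually evict pages, and the names `LRU`, `simulate`, the capacities and the page labels are all hypothetical.

```python
from collections import OrderedDict


class LRU:
    """Toy least-recently-used cache (an assumption, not the real policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest entry first

    def touch(self, page):
        """Return True on a hit; on a miss, insert and evict the oldest page."""
        hit = page in self.pages
        if hit:
            self.pages.move_to_end(page)
        else:
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)
        return hit


def simulate(app_capacity=4, os_capacity=8, rounds=50):
    app = LRU(app_capacity)      # stands in for the application buffers
    os_cache = LRU(os_capacity)  # stands in for the OS page cache
    hot = ["hot0", "hot1"]       # pages re-read in every round
    for i in range(rounds):
        # Two hot pages plus one never-repeated page to create pressure.
        for page in hot + [f"cold{i}"]:
            # The lower tier only sees an access when the upper tier misses.
            if not app.touch(page):
                os_cache.touch(page)
    return app, os_cache, hot


if __name__ == "__main__":
    app, os_cache, hot = simulate()
    print("app tier:", list(app.pages))
    print("os tier: ", list(os_cache.pages))
    print("in both: ", sorted(set(app.pages) & set(os_cache.pages)))
```

In this model the hot pages end up only in the upper tier, and the only pages held twice are the ones read from "disk" most recently, which matches the "rarely, and for a relatively short time" claim. With `os_capacity` larger than the total number of distinct pages (no memory pressure), the duplicates never leave the lower tier.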