Re: Enough RAM for entire Database.. cost aside, is thi

From	Marco Colombo
Subject	Re: Enough RAM for entire Database.. cost aside, is thi
Date	9 July 2004, 18:49:35
Msg-id	Pine.LNX.4.44.0407081244320.3962-100000@Megathlon.ESI
In reply to	Re: Enough RAM for entire Database.. cost aside, is this (Shridhar Daithankar <shridhar@frodo.hserus.net>)
List	pgsql-general

On Thu, 8 Jul 2004, Shridhar Daithankar wrote:

> Hi,
>
> Andy B wrote:
> > 1. Postgresql is a two tiered cache mechanism. The first tier - the
> > postgresql shared buffer cache sits on the second, larger tier, the linux
> > buffer cache. So bits of the same data end up being in memory...twice, and
> > two cache mechanisms operate at the same time. (That's how I understand it).
>
> That is correct. But I would advise you to see shared buffers as workspace
> rather than cache.

Hmm, I'm not sure that's true. The first time you read the data,
maybe it gets copied twice (but I don't know the details of the
implementation of buffers in PostgreSQL, I'm making wild guesses here).

Later, things are not so simple. Since we're considering nested caches
here, I think that whatever is "hot" in the PostgreSQL buffers will
automatically be "cold" in the Linux page cache, and will be a good
candidate for eviction. You don't access _both_ copies, for sure.
If you find the data in a buffer, Linux won't notice you accessed it,
and will slowly mark its copy as "not recently used".

So, in the long run, I think that "hot" data stays (only) in some
application buffer, "warm" data in the Linux cache, "cold" data on disk.
Multiple copies occur rarely, and for a relatively short time. Of course,
I'm assuming there's some kind of memory pressure. If not, useless
copies of data may stay in RAM "forever".
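
To make the nested-cache argument concrete, here is a toy simulation in
Python. It is only a sketch under loud assumptions: both tiers are modelled
as plain LRU caches (neither PostgreSQL nor Linux is exactly that), the
sizes are tiny, and the names LRU, read_block and simulate are made up for
the illustration. Two "hot" blocks are read over and over while a stream of
"cold" blocks passes through both tiers.

```python
from collections import OrderedDict


class LRU:
    """Minimal LRU cache holding block ids only (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        # A hit refreshes recency; a miss leaves the cache untouched.
        if key in self.items:
            self.items.move_to_end(key)
            return True
        return False

    def put(self, key):
        self.items[key] = True
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


def read_block(key, app_cache, os_cache):
    """Return where the block was found: 'app', 'os' or 'disk'."""
    if app_cache.get(key):
        return "app"  # the lower tier never sees this access
    if os_cache.get(key):
        app_cache.put(key)
        return "os"
    os_cache.put(key)  # read from disk goes through the lower tier
    app_cache.put(key)
    return "disk"


def simulate(rounds=10):
    app_cache = LRU(capacity=3)  # stands in for the application buffers
    os_cache = LRU(capacity=4)   # stands in for the OS page cache
    counts = {"app": 0, "os": 0, "disk": 0}
    for i in range(1, rounds + 1):
        # Two hot blocks every round, plus one new cold block.
        for key in ("h1", "h2", "c%d" % i):
            counts[read_block(key, app_cache, os_cache)] += 1
    return app_cache, os_cache, counts


if __name__ == "__main__":
    app_cache, os_cache, counts = simulate()
    print("app tier:", list(app_cache.items))
    print("os tier: ", list(os_cache.items))
    print("hits by source:", counts)
```

With these sizes the hot blocks end up only in the upper tier, and the lower
tier holds nothing but recently streamed cold blocks. Without memory
pressure (a lower tier large enough for everything), both copies would
simply stay around.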

> > 2. Even if the linux buffer cache contains all the data required for an
> > execution of a plan, there is still a load of memory copying to do between
> > these two tiers. Though memory copying is faster than disk access, it is
> > still an overhead, and isn't there the real problem of thrashing between
> > these two tiers if the plan can't fit all the data into the top tier, even
> > if the thrashing is restricted to the memory system?
>
> That is certainly not correct. I don't have hard sources to back it up, but if
> you open a file that you just closed, linux does not go copying it from its
> cache to the process address space. It would rather juggle the page table to
> make memory pages available to your process.

I'm not familiar with recent kernel development. For sure, the kernel
used copy_from/to_user() a lot in the past. You seem to overestimate
the cost of RAM-to-RAM copy vs. the cost of messing with VM mappings.

The open()/close() sequence won't copy anything, agreed. It's read()
we're considering here.
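
The distinction can be sketched in a few lines of Python. This is an
illustration only: the helper name read_vs_mmap and the 8 kB payload are
assumptions, and Python adds copies of its own on top of what the kernel
does. os.read() ends up in read(), which copies bytes from the page cache
into a buffer owned by the process; mmap.mmap() asks the kernel to map the
file's pages into the address space instead.

```python
import mmap
import os
import tempfile


def read_vs_mmap(payload=b"x" * 8192):
    """Fetch the same file contents via read() and via mmap()."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.close(fd)

        fd = os.open(path, os.O_RDONLY)
        try:
            # read(): the kernel copies the data into a user-space buffer.
            copied = os.read(fd, len(payload))

            # mmap(): the file's pages are mapped, no read() copy is made.
            with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mapped:
                # Slicing copies out of the mapping; done here only so
                # the two results can be compared.
                mapped_bytes = mapped[:len(payload)]
        finally:
            os.close(fd)
    finally:
        os.unlink(path)
    return copied == payload, mapped_bytes == payload


if __name__ == "__main__":
    print(read_vs_mmap())
```

Both paths return the same bytes; what differs is whether the data is
copied or the mappings are changed, which is the cost trade-off in question.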

> By that argument, there would be three caches. Postgresql shared buffers, files
> mapped into process address space and linux buffers. I think that defeats the
> purpose of caching.
[...]

.TM.
--
      ____/  ____/   /
     /      /       /            Marco Colombo
    ___/  ___  /   /              Technical Manager
   /          /   /             ESI s.r.l.
 _____/ _____/  _/               Colombo@ESI.it
