Re: Fast insertion indexes: why no developments

From: Jeff Janes
Subject: Re: Fast insertion indexes: why no developments
Date:
Msg-id CAMkU=1y10DrEihEi_i=0J=SiQ7fdq1K-7EP42rQXp2mquZw_xA@mail.gmail.com
In reply to: Re: Fast insertion indexes: why no developments  (Leonardo Francalanci <m_lists@yahoo.it>)
Responses: Re: Fast insertion indexes: why no developments  (Gavin Flower <GavinFlower@archidevsys.co.nz>)
Re: Fast insertion indexes: why no developments  (Leonardo Francalanci <m_lists@yahoo.it>)
List: pgsql-hackers
On Wed, Oct 30, 2013 at 9:54 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
Jeff Janes wrote:
> The index insertions should be fast until the size of the active part of
> the indexes being inserted into exceeds shared_buffers by some amount
> (what that amount is would depend on how much dirty data the kernel is
> willing to allow in the page cache before it starts suffering anxiety
> about it).  If you have enough shared_buffers to make that last for 15
> minutes, then you shouldn't have a problem inserting with live indexes.

Sooner or later you'll have to checkpoint those shared_buffers...

True, but that is also true of indexes created in bulk.  It all has to reach disk eventually--either the checkpointer writes it out and fsyncs it, or the background writer or user backends write it out and the checkpoint fsyncs it.  If bulk creation uses a ring buffer strategy (I don't know if it does), then it might kick the buffers to the kernel in more or less physical order, which would help the kernel get them to disk in long sequential writes.  Or not.  I think that this is where sorted checkpoints could really help.
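To make the "sorted checkpoint" idea concrete, here is a toy sketch, not PostgreSQL's checkpointer code. It assumes dirty index pages are identified by hypothetical contiguous block numbers and dirtied in random order (as random-key B-tree insertions tend to do), and it counts how many runs of consecutive blocks the write stream contains: each run could go to disk as one sequential write, so fewer runs means fewer seeks.

```python
import random


def count_runs(blocks):
    """Count maximal runs of consecutive block numbers in a write order.

    Each run could be issued as one sequential write; fewer runs means
    fewer seeks on a spinning disk.
    """
    if not blocks:
        return 0
    runs = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


def checkpoint_order(dirty, sort_writes):
    """Return the order in which a hypothetical checkpointer writes dirty blocks."""
    return sorted(dirty) if sort_writes else list(dirty)


# Hypothetical workload: 1000 contiguous index blocks dirtied in random order.
rng = random.Random(42)
dirty = list(range(1000))
rng.shuffle(dirty)

unsorted_runs = count_runs(checkpoint_order(dirty, sort_writes=False))
sorted_runs = count_runs(checkpoint_order(dirty, sort_writes=True))
print(unsorted_runs, sorted_runs)
```

Sorting collapses the write stream into a single run here only because the toy blocks are contiguous; real index files have gaps, and the kernel may already reorder writes on its own, which is why this is hedged as "could help".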

> and we are talking about GB of data (my understanding is that we change
> basically every btree page, resulting in re-writing of the whole index).

If the checkpoint interval is as long as the partitioning period, then hopefully the active index buffers get re-dirtied while protected in shared_buffers, and only get written to disk once.  If the buffers get read, dirtied, and evicted from a small shared_buffers over and over again, then you are almost guaranteed that they will get written to disk multiple times while they are still hot, unless your kernel is very aggressive about caching dirty data (which will cause other problems).
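The write-amplification argument above can be illustrated with a toy model, assuming a plain LRU buffer pool (PostgreSQL actually uses a clock-sweep replacement policy) and ignoring the kernel page cache entirely. The function name and parameters are hypothetical: `pool_size` stands in for shared_buffers, `active_pages` for the active part of the indexes, and `touches` for the insertions during one checkpoint interval.

```python
import random
from collections import OrderedDict


def simulate_writes(pool_size, active_pages, touches, seed=0):
    """Count page writes for a toy LRU buffer pool over one checkpoint interval.

    Every touch reads (if needed) and dirties a random active page, so every
    buffered page is dirty. Evicting a page costs one write, and the
    checkpoint at the end writes whatever is still in the pool.
    """
    rng = random.Random(seed)
    pool = OrderedDict()  # buffered pages in LRU order; all dirty in this model
    writes = 0
    for _ in range(touches):
        page = rng.randrange(active_pages)
        if page in pool:
            pool.move_to_end(page)  # re-dirtied while still protected in the pool
        else:
            if len(pool) >= pool_size:
                pool.popitem(last=False)  # evict least recently used dirty page
                writes += 1
            pool[page] = True
    return writes + len(pool)  # checkpoint flushes the remaining dirty pages


# Pool large enough for the active set vs. a pool one tenth that size.
large = simulate_writes(pool_size=1_000, active_pages=1_000, touches=50_000)
small = simulate_writes(pool_size=100, active_pages=1_000, touches=50_000)
print(large, small)
```

With the large pool each active page is written at most once, at the checkpoint; with the small pool most touches miss, so nearly every insertion turns into an eviction write. A kernel that caches dirty data aggressively would absorb some of those repeated writes, which this model deliberately leaves out.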

Cheers,

Jeff
