Re: Processing btree walks as a batch to parallelize IO

From: Greg Stark
Subject: Re: Processing btree walks as a batch to parallelize IO
Date:
Msg-id: CAM-w4HNJN7HvYkB+W_WB5BSQLEd=VjQe3rXd+1e08ZtqnD_a7g@mail.gmail.com
In reply to: Re: Processing btree walks as a batch to parallelize IO  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: Processing btree walks as a batch to parallelize IO  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Fri, 9 Apr 2021 at 16:58, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 4/9/21 7:33 PM, James Coleman wrote:

> > A specific area where this is particularly painful is btree index reads.
> > Walking the tree to leaf pages isn't naturally prefetchable, and so for
> > each level you pay the random page cost. Of course higher levels in the
> > tree will almost certainly exhibit emergent behavior such that they
> > (just by fact of the LRU caching) will be in the buffer cache, but for a
> > large index lower levels likely won't be.
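
Right, and the descent is inherently serial: you can't issue the read
for the next level until the current page tells you which child to
follow. Very roughly (illustrative names, not the actual nbtree code):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t BlockNumber;

    /* Hypothetical stand-ins for the real page-access routines. */
    extern bool        is_leaf(BlockNumber blkno);
    extern const void *read_page(BlockNumber blkno); /* sync random read */
    extern BlockNumber find_child(const void *page, int64_t key);

    /*
     * Walk from the root to the leaf that should contain 'key'.  Each
     * read must complete before the next block number is known, so the
     * per-level random-read latencies add up instead of overlapping.
     */
    static BlockNumber
    descend(BlockNumber root, int64_t key)
    {
        BlockNumber blkno = root;

        while (!is_leaf(blkno))
            blkno = find_child(read_page(blkno), key);
        return blkno;
    }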

We've talked before about buffering inserts even just for disk-based
indexes, much like how GIN buffers inserts and periodically flushes
them out. The idea was a local buffer in each session, since no other
session needs to see these buffered inserts until commit anyway. And
merging multiple keys into the tree at once is more efficient than
inserting them one by one.
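
A minimal standalone sketch of that shape (none of this is actual
PostgreSQL code; the buffer, its capacity, and
index_merge_sorted_batch() are all made up):

    #include <stdint.h>
    #include <stdlib.h>

    #define BUFFER_CAPACITY 1024

    typedef struct SessionInsertBuffer
    {
        int64_t keys[BUFFER_CAPACITY];
        int     nkeys;
    } SessionInsertBuffer;

    static int
    cmp_int64(const void *a, const void *b)
    {
        int64_t x = *(const int64_t *) a;
        int64_t y = *(const int64_t *) b;

        return (x > y) - (x < y);
    }

    /* Flush at commit, or when the buffer fills up. */
    static void
    flush_buffer(SessionInsertBuffer *buf)
    {
        qsort(buf->keys, buf->nkeys, sizeof(int64_t), cmp_int64);
        /*
         * index_merge_sorted_batch(buf->keys, buf->nkeys);
         * -- hypothetical: sorted keys hit each leaf page once per
         * batch, so the tree is descended per leaf, not per key.
         */
        buf->nkeys = 0;
    }

    static void
    buffered_insert(SessionInsertBuffer *buf, int64_t key)
    {
        if (buf->nkeys == BUFFER_CAPACITY)
            flush_buffer(buf);
        buf->keys[buf->nkeys++] = key;
    }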

But that was just for disk i/o. For something with longer latency it
would be an even bigger win: buffer the inserted keys in local memory
in case this same session looks them up, and start the i/o to insert
the rows into the index right away, but handle it in the background or
in a separate process so it doesn't block the transaction until commit.
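
Roughly this shape, using plain pthreads just for illustration -- in
practice it would presumably be a background worker, and do_index_io()
is a made-up placeholder for the actual btree insert:

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define QLEN 64

    typedef struct FlushQueue
    {
        pthread_mutex_t lock;
        pthread_cond_t  nonempty;
        pthread_cond_t  drained;
        int64_t         pending[QLEN];
        int             head, tail, count;
        bool            in_flight;
    } FlushQueue;

    extern void do_index_io(int64_t key); /* hypothetical btree insert */

    /* Foreground session: hand the key off and keep going. */
    static void
    enqueue_key(FlushQueue *q, int64_t key)
    {
        pthread_mutex_lock(&q->lock);
        q->pending[q->tail] = key;  /* backpressure when full omitted */
        q->tail = (q->tail + 1) % QLEN;
        q->count++;
        pthread_cond_signal(&q->nonempty);
        pthread_mutex_unlock(&q->lock);
    }

    /* Background worker: does the index i/o off the session's path. */
    static void *
    flush_worker(void *arg)
    {
        FlushQueue *q = arg;

        pthread_mutex_lock(&q->lock);
        for (;;)
        {
            while (q->count == 0)
                pthread_cond_wait(&q->nonempty, &q->lock);

            int64_t key = q->pending[q->head];

            q->head = (q->head + 1) % QLEN;
            q->count--;
            q->in_flight = true;
            pthread_mutex_unlock(&q->lock);

            do_index_io(key);       /* the slow part, off the session */

            pthread_mutex_lock(&q->lock);
            q->in_flight = false;
            if (q->count == 0)
                pthread_cond_signal(&q->drained);
        }
        return NULL;
    }

    /* At commit the session finally blocks, but only until drained. */
    static void
    wait_for_flush(FlushQueue *q)
    {
        pthread_mutex_lock(&q->lock);
        while (q->count > 0 || q->in_flight)
            pthread_cond_wait(&q->drained, &q->lock);
        pthread_mutex_unlock(&q->lock);
    }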

> What do you consider a large index level?
>
> Consider a 1TB table, with just a single UUID column - that's ~25B rows,
> give or take. Real tables will have more columns, so this seems like a
> reasonable model of the largest number of rows per relation. With ~32B
> per index tuple, that's about 100M leaf pages, and with ~256 branches
> per internal page, that's still only ~5 levels. I think it's quite rare
> to see indexes with more than 6 or 7 levels.

That's a good model for a well-designed schema with an efficient
index. There are plenty of less-than-optimal schemas with indexes on
longer column lists or fairly large text fields...
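
To put rough numbers on both cases (back-of-the-envelope, reusing the
8KB pages and ~25B rows above; the ~100B tuple is just an arbitrary
"fat key" stand-in):

    ~32B index tuples:  ~256 per page -> 25B / 256 = ~100M leaf pages
                        256^4 = ~4.3B > 100M: 4 internal levels, ~5 total

    ~100B index tuples: ~80 per page  -> 25B / 80  = ~310M leaf pages
                        80^4 = ~41M < 310M < 80^5: 5 internal levels, ~6 total

So even a fat-key index only adds a level or so, but each extra level
is another random read on an uncached descent.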

-- 
greg


