Parallel CREATE INDEX for BRIN indexes

From	Tomas Vondra
Subject	Parallel CREATE INDEX for BRIN indexes
Date	June 8, 2023, 12:55:09
Msg-id	c2ee7d69-ce17-43f2-d1a0-9811edbda6e6@enterprisedb.com
Replies	Re: Parallel CREATE INDEX for BRIN indexes
List	pgsql-hackers

Hi,

Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The
infrastructure (starting workers etc.) is "inspired" by the BTREE code
(i.e. copied from that and massaged a bit to call brin stuff).

 _bt_begin_parallel -> _brin_begin_parallel
 _bt_end_parallel -> _brin_end_parallel
 _bt_parallel_estimate_shared -> _brin_parallel_estimate_shared
 _bt_leader_participate_as_worker -> _brin_leader_participate_as_worker
 _bt_parallel_scan_and_sort -> _brin_parallel_scan_and_build

This is mostly mechanical stuff - setting up the parallel workers,
starting the scan etc.

The tricky part is how to divide the work between workers and how we
combine the partial results. For BTREE we simply let each worker read
a subset of the table (using a parallel scan), sort it and then do a
merge sort on the partial results.

For BRIN it's a bit different, because the index essentially splits
the table into smaller ranges and treats them independently. So the
easiest way is to organize the table scan so that each range gets
processed by exactly one worker. Each worker writes the index tuples
into a temporary file, and then when all workers are done we read them
back and write them into the index.

The problem is that a parallel scan assigns a mostly random subset of
the table to each worker - there's no guarantee that a BRIN page range
gets processed by a single worker.
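
To make the problem concrete, here is a small Python model (not
PostgreSQL code). It assumes chunks of a fixed size are handed out
round-robin, which is a simplification: a real parallel scan hands out
chunks on demand. The function name and parameters are made up for the
illustration.

```python
def split_ranges(nblocks, chunk_size, pages_per_range, nworkers):
    """Return start blocks of page ranges seen by more than one worker.

    Assumption: chunks of chunk_size blocks go to workers round-robin.
    """
    owners = {}  # range start block -> set of workers that read part of it
    for block in range(nblocks):
        worker = (block // chunk_size) % nworkers
        range_start = (block // pages_per_range) * pages_per_range
        owners.setdefault(range_start, set()).add(worker)
    return sorted(start for start, ws in owners.items() if len(ws) > 1)


if __name__ == "__main__":
    # chunk size 6 does not line up with 4-page ranges, so some ranges split
    print(split_ranges(nblocks=32, chunk_size=6, pages_per_range=4, nworkers=2))
    # chunk size 8 is a multiple of 4, so no range is split
    print(split_ranges(nblocks=32, chunk_size=8, pages_per_range=4, nworkers=2))
```

Whenever the chunk size is not a multiple of the range size, some
ranges straddle a chunk boundary and end up with two workers.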


0001 does that in a bit silly way - instead of doing a single large
scan, each worker does a sequence of TID range scans, one per page
range (see _brin_parallel_scan_and_build), and BrinShared has fields
used to track which ranges were already assigned to workers. A bit
cumbersome, but it works pretty well.
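
A rough Python model of the 0001 approach, for illustration only. The
SharedState class is a hypothetical stand-in for the tracking fields in
BrinShared (the actual field names are not shown here), and threads
stand in for the parallel workers.

```python
import threading


class SharedState:
    """Hypothetical stand-in for the range-tracking fields in BrinShared."""

    def __init__(self, nblocks, pages_per_range):
        self.nblocks = nblocks
        self.pages_per_range = pages_per_range
        self.next_start = 0            # first block of the next unassigned range
        self.lock = threading.Lock()

    def claim_range(self):
        """Return (first_block, last_block) of the next range, or None if done."""
        with self.lock:
            if self.next_start >= self.nblocks:
                return None
            first = self.next_start
            self.next_start += self.pages_per_range
        last = min(first + self.pages_per_range, self.nblocks) - 1
        return first, last


def worker(shared, claimed):
    # Each claimed range would be summarized with one TID range scan.
    while (bounds := shared.claim_range()) is not None:
        claimed.append(bounds)


def run(nblocks, pages_per_range, nworkers):
    shared = SharedState(nblocks, pages_per_range)
    results = [[] for _ in range(nworkers)]
    threads = [threading.Thread(target=worker, args=(shared, results[i]))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the counter only moves forward, every range is claimed exactly
once and each worker sees its ranges in increasing block order.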

0002 replaces the TID range scan sequence with a single parallel scan,
modified to assign "chunks" in multiples of pagesPerRange.
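
The chunk alignment can be sketched like this in Python. Rounding up
(rather than down) is an assumption made for the example, and the
helper names are made up; this is not the code from the patch.

```python
def align_chunk_size(chunk_size, pages_per_range):
    """Round chunk_size up to the nearest multiple of pages_per_range."""
    if chunk_size <= 0 or pages_per_range <= 0:
        raise ValueError("sizes must be positive")
    return -(-chunk_size // pages_per_range) * pages_per_range


def chunk_bounds(nblocks, chunk_size, pages_per_range):
    """List (first_block, last_block) for each aligned chunk of the table."""
    size = align_chunk_size(chunk_size, pages_per_range)
    return [(start, min(start + size, nblocks) - 1)
            for start in range(0, nblocks, size)]
```

With aligned chunks every chunk boundary is also a range boundary, so a
page range can never be shared by two workers.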


In both cases _brin_end_parallel then reads the summaries from worker
files, and adds them into the index. In 0001 this is fairly simple,
although we could do one more improvement and sort the ranges by range
start to make the index nicer (and possibly a bit more efficient). This
should be simple, because the per-worker results are already sorted like
that (so a merge sort in _brin_end_parallel would be enough).
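
A minimal sketch of that merge in Python, assuming each worker's output
is a list of (range_start, summary) pairs already sorted by
range_start. The pair layout is invented for the example.

```python
import heapq


def merge_worker_results(per_worker):
    """k-way merge of per-worker lists, each sorted by range start."""
    return list(heapq.merge(*per_worker, key=lambda item: item[0]))
```

Since every input is already sorted, the merge only needs to look at
the head of each worker's list, so no full sort is required.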

For 0002 it's a bit more complicated, because with a single parallel
scan brinbuildCallbackParallel can't decide if a range is assigned to a
different worker or empty. And we want to generate summaries for empty
ranges in the index. We could either skip such ranges during index
build, and then add empty summaries in _brin_end_parallel (if needed),
or add them and then merge them using "union".
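
The first option (skip during build, fill in at the end) might look
roughly like this Python sketch. The (range_start, summary) layout and
the use of None as the "empty summary" marker are assumptions for the
example only.

```python
EMPTY = None  # hypothetical marker for an empty-range summary


def fill_missing_ranges(summaries, nblocks, pages_per_range):
    """Return one entry per page range, adding EMPTY where no worker wrote one."""
    by_start = dict(summaries)
    return [(start, by_start.get(start, EMPTY))
            for start in range(0, nblocks, pages_per_range)]
```

The leader knows the table size and pagesPerRange, so it can tell which
ranges are missing from the combined worker output without any help
from the workers.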


I just realized there's a third option to do this - we could just do a
regular parallel scan (with no particular regard to pagesPerRange), and
then do "union" when merging results from workers. It doesn't require
the sequence of TID scans, and the union would also handle the empty
ranges. The per-worker results might be much larger, though, because
each worker might produce up to the "full" BRIN index.
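
For a minmax-style summary the union step could be modeled like this in
Python. It is only a sketch: summaries are (min, max) pairs, None stands
for an empty range, and other opclasses would go through their own
union support procedure instead.

```python
def union_minmax(a, b):
    """Combine two (min, max) summaries; None means 'no values seen'."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def merge_with_union(per_worker, nblocks, pages_per_range):
    """Union partial summaries from all workers, one entry per page range."""
    merged = {start: None for start in range(0, nblocks, pages_per_range)}
    for results in per_worker:
        for start, summary in results:
            merged[start] = union_minmax(merged[start], summary)
    return sorted(merged.items())
```

Ranges no worker touched simply stay empty, which is how the union
covers the empty-range case too.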


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
