Parallel CREATE INDEX for BRIN indexes
From | Tomas Vondra |
---|---|
Subject | Parallel CREATE INDEX for BRIN indexes |
Date | |
Msg-id | c2ee7d69-ce17-43f2-d1a0-9811edbda6e6@enterprisedb.com |
Replies | Re: Parallel CREATE INDEX for BRIN indexes |
List | pgsql-hackers |
Hi,

Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The infrastructure (starting workers etc.) is "inspired" by the BTREE code (i.e. copied from that and massaged a bit to call brin stuff).

  _bt_begin_parallel -> _brin_begin_parallel
  _bt_end_parallel -> _brin_end_parallel
  _bt_parallel_estimate_shared -> _brin_parallel_estimate_shared
  _bt_leader_participate_as_worker -> _brin_leader_participate_as_worker
  _bt_parallel_scan_and_sort -> _brin_parallel_scan_and_build

This is mostly mechanical stuff - setting up the parallel workers, starting the scan etc.

The tricky part is how to divide the work between workers and how to combine the partial results. For BTREE we simply let each worker read a subset of the table (using a parallel scan), sort it, and then do a merge sort on the partial results.

For BRIN it's a bit different, because the index essentially splits the table into smaller ranges and treats them independently. So the easiest way is to organize the table scan so that each range gets processed by exactly one worker. Each worker writes the index tuples into a temporary file, and when all workers are done we read the files and write the tuples into the index.

The problem is that a parallel scan assigns a mostly random subset of the table to each worker - there's no guarantee that a BRIN page range gets processed by a single worker.

0001 does that in a bit silly way - instead of doing a single large scan, each worker does a sequence of TID range scans (see _brin_parallel_scan_and_build), and BrinShared has fields used to track which ranges were already assigned to workers. A bit cumbersome, but it works pretty well.

0002 replaces the TID range scan sequence with a single parallel scan, modified to assign "chunks" in multiples of pagesPerRange.

In both cases _brin_end_parallel then reads the summaries from worker files, and adds them into the index.
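
To illustrate the idea of handing out work in multiples of pagesPerRange, here is a minimal standalone sketch. It is not code from either patch: RangeAllocator and its functions are hypothetical names standing in for the range-tracking fields in BrinShared, and C11 atomics are used in place of whatever synchronization the patch actually uses. The only point is that every claimed chunk starts and ends on a page-range boundary, so no range is split between two workers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the range-tracking fields in BrinShared. */
typedef struct RangeAllocator
{
    uint32_t    nblocks;            /* total heap blocks in the table */
    uint32_t    pages_per_range;    /* BRIN pages_per_range setting */
    uint32_t    ranges_per_chunk;   /* page ranges handed out per claim */
    atomic_uint_fast64_t next_block;    /* first block not yet assigned */
} RangeAllocator;

static void
range_allocator_init(RangeAllocator *ra, uint32_t nblocks,
                     uint32_t pages_per_range, uint32_t ranges_per_chunk)
{
    assert(pages_per_range > 0 && ranges_per_chunk > 0);
    ra->nblocks = nblocks;
    ra->pages_per_range = pages_per_range;
    ra->ranges_per_chunk = ranges_per_chunk;
    atomic_init(&ra->next_block, 0);
}

/*
 * Claim the next chunk of blocks [*start, *end). The chunk size is a
 * multiple of pages_per_range, so chunk boundaries always coincide with
 * page-range boundaries. Returns false once the table is fully assigned.
 */
static bool
range_allocator_next(RangeAllocator *ra, uint32_t *start, uint32_t *end)
{
    uint64_t    chunk = (uint64_t) ra->pages_per_range * ra->ranges_per_chunk;
    uint64_t    first = atomic_fetch_add(&ra->next_block, chunk);
    uint64_t    last = first + chunk;

    if (first >= ra->nblocks)
        return false;

    *start = (uint32_t) first;
    /* The final chunk may be shorter than the others. */
    *end = (last > ra->nblocks) ? ra->nblocks : (uint32_t) last;
    return true;
}
```

Each worker would call range_allocator_next in a loop and scan only the blocks it was given. With a 1000-block table, pages_per_range = 128 and two ranges per chunk, the claims come out as [0, 256), [256, 512), [512, 768) and [768, 1000).
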
In 0001 this is fairly simple, although we could do one more improvement and sort the ranges by range start to make the index nicer (and possibly a bit more efficient). This should be simple, because the per-worker results are already sorted like that (so a merge sort in _brin_end_parallel would be enough).

For 0002 it's a bit more complicated, because with a single parallel scan brinbuildCallbackParallel can't decide if a range is assigned to a different worker or empty. And we want to generate summaries for empty ranges in the index. We could either skip such ranges during the index build and then add empty summaries in _brin_end_parallel (if needed), or add them and then merge them using "union".

I just realized there's a third option to do this - we could just do a regular parallel scan (with no particular regard to pagesPerRange), and then do "union" when merging results from workers. It doesn't require the sequence of TID scans, and the union would also handle the empty ranges. The per-worker results might be much larger, though, because each worker might produce up to a "full" BRIN index.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
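
As a rough illustration of the merge-plus-"union" idea discussed above, the sketch below merges two per-worker result sets that are already sorted by range start, and unions the entries when both workers produced a summary for the same range. It is not taken from the patch. RangeSummary is a hypothetical, minmax-like summary over plain ints (real BRIN summaries are opclass-specific and would be combined by the opclass's own union procedure), and each input array is assumed to contain at most one entry per range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minmax-like summary of one page range. */
typedef struct RangeSummary
{
    uint32_t    range_start;    /* first heap block of the page range */
    bool        empty;          /* worker saw no tuples in this range */
    int         min;            /* valid only if !empty */
    int         max;            /* valid only if !empty */
} RangeSummary;

/* Fold src into dst; an empty summary contributes nothing. */
static void
summary_union(RangeSummary *dst, const RangeSummary *src)
{
    assert(dst->range_start == src->range_start);

    if (src->empty)
        return;
    if (dst->empty)
    {
        *dst = *src;
        return;
    }
    if (src->min < dst->min)
        dst->min = src->min;
    if (src->max > dst->max)
        dst->max = src->max;
}

/*
 * Merge two arrays sorted by range_start into out, which must have room
 * for na + nb entries. Ranges present in both inputs are unioned.
 * Returns the number of entries written.
 */
static size_t
merge_summaries(const RangeSummary *a, size_t na,
                const RangeSummary *b, size_t nb,
                RangeSummary *out)
{
    size_t      i = 0, j = 0, n = 0;

    while (i < na || j < nb)
    {
        if (j >= nb || (i < na && a[i].range_start < b[j].range_start))
            out[n++] = a[i++];          /* range only in a (so far) */
        else if (i >= na || b[j].range_start < a[i].range_start)
            out[n++] = b[j++];          /* range only in b (so far) */
        else
        {
            out[n] = a[i++];            /* same range in both: union */
            summary_union(&out[n], &b[j++]);
            n++;
        }
    }
    return n;
}
```

The output stays sorted by range start, and an empty summary from one worker is simply absorbed by a non-empty summary for the same range from another worker, which is the property that makes the empty-range case harmless.
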