Re: Parallel CREATE INDEX for BRIN indexes
From | Tomas Vondra |
---|---|
Subject | Re: Parallel CREATE INDEX for BRIN indexes |
Date | |
Msg-id | 2d7c64de-716e-346e-b01e-03db0c2bc5ac@enterprisedb.com |
In reply to | Re: Parallel CREATE INDEX for BRIN indexes (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses | Re: Parallel CREATE INDEX for BRIN indexes |
List | pgsql-hackers |
On 7/5/23 16:33, Matthias van de Meent wrote:
> ...
>
>> Maybe. I wasn't that familiar with what parallel tuplesort can and can't
>> do, and the little I knew I managed to forget since I wrote this patch.
>> Which similar features do you have in mind?
>
> I was referring to the feature that is "emitting a single sorted run
> of tuples at the leader backend based on data gathered in parallel
> worker backends". It manages the sort state, on-disk runs etc. so that
> you don't have to manage that yourself.
>
> Adding a new storage format for what is effectively a logical tape
> (logtape.{c,h}) and manually merging it seems like a lot of changes if
> that functionality is readily available, standardized and optimized in
> sortsupport; and adds an additional place to manually go through for
> disk-related changes like TDE.
>

Here's a new version of the patch, with three main changes:

1) Adoption of the parallel scan approach, instead of the homegrown
solution with a sequence of TID scans. This is mostly what the 0002
patch did, except for fixing a bug - parallel scan has a "rampdown"
close to the end, and this needs to consider the chunk size too.

2) Switches to the parallel tuplesort, as proposed. This turned out to
be easier than I expected - most of the work was in adding methods to
tuplesortvariants.c to allow reading/writing BrinTuple items. The main
limitation is that we need to pass around the length of the tuple
(AFAICS it's not in the BrinTuple itself). I'm not entirely sure about
the memory management aspect of this, and maybe there's a more elegant
solution.

Overall it seems to work - the brin.c code is heavily based on how
nbtsort.c does parallel builds for btree, so hopefully it's fine. At
some point I got a bit confused about which spool to create/use, but it
seems to work.

3) Handling of empty ranges - I ended up ignoring empty ranges in
workers (i.e. those are not written to the tuplesort), and instead the
leader fills them in when reading data from the shared tuplesort.

One thing I was wondering about is whether it might be better to allow
the workers to process overlapping ranges, and then let the leader
merge the summaries. That would mean we might not need the tableam.c
changes at all, but the leader would need to do more work (although
BRIN indexes tend to be fairly small). The main reason that got me
thinking about this is that we have pretty much no tests for the union
procedures, because triggering that is really difficult. But for
parallel index builds that'd be much more common.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
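
Editorial sketch of point 3 above, not code from the patch: workers skip empty ranges, so the sorted stream the leader reads has gaps, and the leader emits a placeholder for every range that is missing. The sketch below is a simplified, standalone model of that idea. The names RangeSummary and fill_empty_ranges are hypothetical, the struct is a stand-in rather than the real BrinTuple, and the input is assumed to be already sorted by starting block (which is what the shared tuplesort would provide).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;	/* same width as the PostgreSQL typedef */

/* Hypothetical stand-in for a range summary; NOT the real BrinTuple. */
typedef struct RangeSummary
{
	BlockNumber start_block;	/* first heap block covered by the range */
	int			is_empty;		/* 1 if synthesized by the leader */
} RangeSummary;

/*
 * Leader-side pass over the summaries produced by the workers.
 *
 * "sorted" must be ordered by start_block and contains no entries for
 * empty ranges. One entry per range of the table is written to "out";
 * ranges missing from the input get an empty placeholder.
 *
 * Returns the number of entries written to "out".
 */
size_t
fill_empty_ranges(const RangeSummary *sorted, size_t nsorted,
				  BlockNumber nblocks, BlockNumber pages_per_range,
				  RangeSummary *out, size_t out_cap)
{
	size_t		nout = 0;
	size_t		i = 0;
	uint64_t	next;			/* 64-bit so the loop cannot wrap around */

	assert(pages_per_range > 0);

	for (next = 0; next < nblocks; next += pages_per_range)
	{
		assert(nout < out_cap);

		if (i < nsorted && sorted[i].start_block == next)
		{
			/* a worker summarized this range, pass it through */
			out[nout++] = sorted[i++];
		}
		else
		{
			/* no worker wrote this range, emit an empty summary */
			out[nout].start_block = (BlockNumber) next;
			out[nout].is_empty = 1;
			nout++;
		}
	}

	/* every input entry must have matched a range boundary */
	assert(i == nsorted);

	return nout;
}
```

For example, with a 500-block table, 128 pages per range, and worker summaries for the ranges starting at blocks 0 and 256, the function writes four entries: the two summaries from the workers plus empty placeholders for the ranges starting at blocks 128 and 384. A single forward pass suffices because the input is sorted, so the leader never has to buffer more than one summary at a time.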