Re: Parallel CREATE INDEX for GIN indexes
From        | Andy Fan
Subject     | Re: Parallel CREATE INDEX for GIN indexes
Msg-id      | 87y18ektdn.fsf@163.com
In reply to | Re: Parallel CREATE INDEX for GIN indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses   | Re: Parallel CREATE INDEX for GIN indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List        | pgsql-hackers
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

>>> 7) v20240502-0007-Detect-wrap-around-in-parallel-callback.patch
>>>
>>> There's one more efficiency problem - the parallel scans are required to
>>> be synchronized, i.e. the scan may start half-way through the table, and
>>> then wrap around. Which however means the TID list will have a very wide
>>> range of TID values, essentially the min and max for the key.
>>
>> I have two questions here, and both of them are general GIN index
>> questions rather than questions about this patch.
>>
>> 1. What does "wrap around" mean in "the scan may start half-way
>> through the table, and then wrap around"? Searching for "wrap" in
>> gin/README turns up nothing.
>
> The "wrap around" is about the scan used to read data from the table
> when building the index. A "sync scan" may start e.g. at TID (1000,0)
> and read till the end of the table, and then wraps and returns the
> remaining part at the beginning of the table for blocks 0-999.
>
> This means the callback would not see a monotonically increasing
> sequence of TIDs.
>
> Which is why the serial build disables sync scans, allowing simply
> appending values to the sorted list, and even with regular flushes of
> data into the index we can simply append data to the posting lists.

Thanks for the hints; I see now that the sync strategy comes from
syncscan.c.

>>> Without 0006 this would cause frequent failures of the index build, with
>>> the error I already mentioned:
>>>
>>> ERROR: could not split GIN page; all old items didn't fit
>>
>> 2. I can't understand the below error.
>>
>>> ERROR: could not split GIN page; all old items didn't fit
>
> if (!append || ItemPointerCompare(&maxOldItem, &remaining) >= 0)
>     elog(ERROR, "could not split GIN page; all old items didn't fit");
>
> It can fail simply because of the !append part.

Got it, thanks!

>> If we split the blocks among workers 1 block at a time, we will have a
>> serious issue like the one here. If we could distribute them N blocks
>> at a time, with N chosen so the blocks roughly fill work_mem and thus a
>> dedicated temp file, we could make things much better, couldn't we?
>
> I don't understand the question. The blocks are distributed to workers
> by the parallel table scan, and it certainly does not do that block by
> block. But even if it did, that's not a problem for this code.

OK, I see that ParallelBlockTableScanWorkerData.phsw_chunk_size is
designed for this.

> The problem is that if the scan wraps around, then one of the TID lists
> for a given worker will have the min TID and max TID, so it will overlap
> with every other TID list for the same key in that worker. And when the
> worker does the merging, this list will force a "full" merge sort for
> all TID lists (for that key), which is very expensive.

OK. Thanks for all the answers; they are pretty instructive!

--
Best Regards
Andy Fan