Re: Parallel index build during COPY

From	Toru SHIMOGAKI
Subject	Re: Parallel index build during COPY
Date	15 June 2006, 23:46:05
Msg-id	44921B51.5050103@oss.ntt.co.jp
In reply to	Parallel index build during COPY ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses	Re: Parallel index build during COPY
List	pgsql-hackers

NTT has some ideas about index creation when loading a large amount of data.
Our approach is the following: index tuples are created at the same time as heap
tuples and fed into a heapsort. In addition, if the target table already has data,
we use the old index tuples as an already-sorted list, so the data loader does not
need to sort all the index tuples, including the old ones. Once only the new index
tuples have been sorted, the two sorted lists are merged and the whole index is
built. This can save both CPU time and disk accesses dramatically, especially if
the target table already has many tuples.
This approach needs to acquire a table lock, unlike the lock mode COPY uses,
so we have developed it as a separate bulk load tool. We will talk about it at the
PostgreSQL Anniversary Conference in Toronto. Thank you to Josh for coordinating.
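
To make the idea above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the code of our tool: the names bulk_load and IndexEntry, and the (key, row position) layout, are hypothetical, and Python's built-in sort stands in for the heapsort. Index entries are created in the same pass that appends rows to the heap, only the new entries are sorted, and the result is merged with the old index, which is assumed to be sorted already.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical index entry layout: (key value, position of the row in the heap).
IndexEntry = Tuple[int, int]


def bulk_load(
    heap: List[dict],
    old_index: List[IndexEntry],
    new_rows: Iterable[dict],
    key_column: str,
) -> List[IndexEntry]:
    """Append new_rows to heap and return the rebuilt index.

    old_index must already be sorted by key; it is never re-sorted.
    """
    new_entries: List[IndexEntry] = []
    for row in new_rows:
        position = len(heap)
        heap.append(row)  # write the heap tuple
        # Create the index tuple in the same pass, so the table
        # does not have to be read again afterwards.
        new_entries.append((row[key_column], position))

    # Sort only the new entries (built-in sort used here in place of heapsort).
    new_entries.sort()

    # Merge the two sorted lists to build the whole index.
    return list(heapq.merge(old_index, new_entries))


heap = [{"id": 10}, {"id": 30}]
old_index = [(10, 0), (30, 1)]
index = bulk_load(heap, old_index, [{"id": 20}, {"id": 5}], "id")
print(index)  # [(5, 3), (10, 0), (20, 2), (30, 1)]
```

With n old entries and m new ones, the sort costs O(m log m) and the merge O(n + m), rather than O((n + m) log(n + m)) for sorting everything again.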

Best regards,

Jim C. Nasby wrote:
> It's not uncommon for index creation to take a substantial amount of
> time for loading data, even when using the 'trick' of loading the data
> before building the indexes. On fast RAID arrays, it's also possible for
> this to be a CPU-bound operation, so I've been wondering if there was
> some reasonable way to parallelize it in the context of a restore from
> pg_dump. Needless to say, that's a non-trivial proposition.
> 
> But the thought occurred to me: why read from the table we just loaded
> multiple times to create the indexes on it? If we're loading into an
> empty table, we could feed newly created pages (or tuples) into sort
> processes, one for each index. After the entire table is loaded, each
> sort could then be finalized, and the appropriate index written out.
> It's unclear if this would be a win on a small table, but not needing to
> make multiple read passes over a large table would almost certainly be a
> win.
> 
> If someone wants to hack up a patch to allow testing this, I can get
> some benchmark numbers.

-- 
Toru SHIMOGAKI
NTT Opensource Software Center <shimogaki.toru@oss.ntt.co.jp>
