Batch update of indexes on data loading

Поиск
Список
Период
Сортировка
От ITAGAKI Takahiro
Тема Batch update of indexes on data loading
Дата
Msg-id 20080221130554.B2BA.52131E4D@oss.ntt.co.jp
обсуждение исходный текст
Ответы Re: Batch update of indexes on data loading  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Batch update of indexes on data loading  (Simon Riggs <simon@2ndquadrant.com>)
Re: Batch update of indexes on data loading  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-hackers
This is a proposal of fast data loading using batch update of indexes for 8.4.
It is a part of pg_bulkload (http://pgbulkload.projects.postgresql.org/) and
I'd like to integrate it in order to cooperate with other parts of postgres.

The basic concept is spooling new coming data, and merge the spool and
the existing indexes into a new index at the end of data loading. It is 
5-10 times faster than index insertion per-row, that is the way in 8.3.


One of the problem is locking; Index building in bulkload is similar to
REINDEX rather than INSERT, so we need ACCESS EXCLUSIVE LOCK during it.
Bulkloading is not a upper compatible method, so I'm thinking about
adding a new "WITH LOCK" option for COPY command.
 COPY tbl FROM 'datafile' WITH LOCK;

If the LOCK option is specified, the behavior of COPY will be changed
as follows:

1. Lock the target table in ACCESS EXCLUSIVE mode instead of ROW EXCLUSIVE.
2. Prepare spooler (BTSpool) for each indexes.
3. For each new row, put index entries into the spools (_bt_spool)  instead of index_insert.
4. At the end of COPY, merge the spool and the existing indexes into a new  index file. The relfilenode of the index is
changedlike REINDEX.
 

However, there might be better interfaces for bulk index creation.
For example, if we want to use it with pgloader, we might need
"bulkload mode" for indexes. pgloader commits every 10000 rows,
so the index spooler must keep alive until end of the session
over transactions. (or end of the transaction over sub-transactions)

I'm working toward the simple "COPY WITH LOCK" approach for now,
but if there are other better ideas, I want to use them.
Advices and suggestions welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Joshua D. Drake"
Дата:
Сообщение: Re: Permanent settings
Следующее
От: ITAGAKI Takahiro
Дата:
Сообщение: Re: ANALYZE to be ignored by VACUUM