Re: Batch update of indexes on data loading

Поиск
Список
Период
Сортировка
От ITAGAKI Takahiro
Тема Re: Batch update of indexes on data loading
Дата
Msg-id 20080226135216.60CF.52131E4D@oss.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Batch update of indexes on data loading  (Simon Riggs <simon@2ndquadrant.com>)
Ответы Re: Batch update of indexes on data loading
Список pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> wrote:

> One of the reasons why I hadn't wanted to pursue earlier ideas to use
> LOCK was that applying a lock will prevent running in parallel, which
> ultimately may prevent further performance gains.
> 
> Is there a way of doing this that will allow multiple concurrent COPYs?

I think there is same difficulty as parallel queries. It requires tighter
communication among COPY threads whether we will use multi-process model
or multi-thread model.

We have independent concurrent COPYs now; COPYs are not aware of each
other because no intermediate status during COPY. However, COPY will
have "phases" if we use bulkbuild. Therefore, we will need joining
COPY threads and passing each working memories between threads.

Here is a possible multi-threaded workload:
 A. For each row:     1. Parsing new coming data     2. Add the row into the heap.     3. Spool index entries to each
indexspooler. B. Wait for all threads. C. Merge spools and corresponding existing indexes into new ones.
 

Phase A could be concurrently as same as now. A1 and A2 are independent
jobs. We could have shared spooler or per-thread spooler.
Phase B is needed to build indexes at once, or it will be double work.
Phase C could be concurrently for each indexes. A thread is responsible
to build one index. It merges the existing index and one shared spool
or multiple spools if we use per-thread spooler.

One of the issues is how to pass or share spoolers between COPY threads.
Another is how to make it transaction safe. If one of the thread fails to
build its index, all thread should be rollback.
I'm not sure how to do them...

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: pg_dump additional options for performance
Следующее
От: "Joshua D. Drake"
Дата:
Сообщение: Re: 8.3 / 8.2.6 restore comparison