Re: Benchmark Data requested
From | Jignesh K. Shah |
---|---|
Subject | Re: Benchmark Data requested |
Date | |
Msg-id | 47A8CC1B.1050000@sun.com |
In reply to | Re: Benchmark Data requested ("Heikki Linnakangas" <heikki@enterprisedb.com>) |
Replies | Re: Benchmark Data requested; Re: Benchmark Data requested |
List | pgsql-performance |
Hi Heikki,

Is there a way such an operation can be spawned as a worker process? During such loading, which most people will do during "off-peak" hours, I expect additional CPU resources to be available. By delegating the additional work to worker processes, we should be able to capitalize on the additional cores in the system.

Even on a single core, the loading process will eventually wait on a read from the input file, which cannot be non-blocking, so the OS can timeslice well enough for the second process to use those wait times for the index population work.

What do you think?

Regards,
Jignesh

Heikki Linnakangas wrote:
> Dimitri Fontaine wrote:
>> On Tuesday 05 February 2008, Simon Riggs wrote:
>>> I'll look at COPY FROM internals to make this faster. I'm looking at
>>> this now to refresh my memory; I already had some plans on the shelf.
>>
>> Maybe stealing some ideas from pg_bulkload could somewhat help here?
>>
>> http://pgfoundry.org/docman/view.php/1000261/456/20060709_pg_bulkload.pdf
>>
>> IIRC it's mainly about how to optimize index updating while loading
>> data, and I've heard complaints on the line "this external tool has
>> to know too much about PostgreSQL internals to be trustworthy as
>> non-core code"... so...
>
> I've been thinking of looking into that as well. The basic trick
> pg_bulkload is using is to populate the index as the data is being
> loaded. There's no fundamental reason why we couldn't do that
> internally in COPY. Triggers or constraints that access the table
> being loaded would make it impossible, but we should be able to detect
> that and fall back to what we have now.
>
> What I'm basically thinking about is to modify the indexam API of
> building a new index, so that COPY would feed the tuples to the
> indexam, instead of the indexam opening and scanning the heap. The
> b-tree indexam would spool the tuples into a tuplesort as the COPY
> progresses, and build the index from that at the end as usual.
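
To make the idea in this exchange concrete, here is a minimal sketch in Python, not PostgreSQL code. It assumes a tab-delimited input and uses a thread as a stand-in for the proposed worker process. The loader appends rows to an in-memory "heap" and hands (key, row id) pairs to the worker, which spools them and sorts once at the end, loosely mirroring the "spool into a tuplesort, build the index at the end" approach. All names (`load_with_index`, `index_worker`, `key_column`) are hypothetical.

```python
import io
import queue
import threading

_DONE = object()  # sentinel telling the worker that the load has finished


def index_worker(spool_queue, index):
    """Spool (key, row_id) pairs as they arrive, then sort once at the end.

    This only imitates the idea of spooling into a sort and building the
    index after the load; it is not how PostgreSQL's tuplesort works.
    """
    spool = []
    while True:
        item = spool_queue.get()
        if item is _DONE:
            break
        spool.append(item)
    spool.sort()          # single sort after all rows have been received
    index.extend(spool)   # the "index": pairs ordered by key


def load_with_index(lines, key_column=0, delimiter="\t"):
    """Load rows into a list while a worker collects index entries."""
    heap = []    # loaded rows, in input order
    index = []   # filled in by the worker
    spool_queue = queue.Queue(maxsize=1024)  # bounded, so the loader cannot run far ahead
    worker = threading.Thread(target=index_worker, args=(spool_queue, index))
    worker.start()
    try:
        # Reading `lines` may block on I/O; the worker can run during those waits.
        for row_id, line in enumerate(lines):
            fields = line.rstrip("\n").split(delimiter)
            heap.append(fields)
            spool_queue.put((fields[key_column], row_id))
    finally:
        spool_queue.put(_DONE)
        worker.join()
    return heap, index


if __name__ == "__main__":
    heap, index = load_with_index(io.StringIO("b\t1\na\t2\nc\t3\n"))
    print(index)
```

A separate process rather than a thread would be needed in Python to use additional cores for CPU-bound work; the thread here only illustrates the overlap of input waits with index spooling.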