Re: Benchmark Data requested
From | Heikki Linnakangas
---|---
Subject | Re: Benchmark Data requested
Date |
Msg-id | 47A8D910.1030607@enterprisedb.com
In reply to | Re: Benchmark Data requested ("Jignesh K. Shah" <J.K.Shah@Sun.COM>)
List | pgsql-performance
Jignesh K. Shah wrote:
> Is there a way such an operation can be spawned as a worker process?
> Generally during such loading - which most people will do during
> "offpeak" hours I expect additional CPU resources available. By
> delegating such additional work to worker processes, we should be able
> to capitalize on additional cores in the system.

Hmm. You do need access to shared memory, locks, catalogs, and to run
functions etc., so I don't think it's significantly easier than using
multiple cores for COPY itself.

> Even if it is a single core, the mere fact that the loading process will
> eventually wait for a read from the input file which cannot be
> non-blocking, the OS can timeslice it well for the second process to use
> those wait times for the index population work.

That's an interesting point.

> What do you think?
>
> Regards,
> Jignesh
>
> Heikki Linnakangas wrote:
>> Dimitri Fontaine wrote:
>>> On Tuesday 05 February 2008, Simon Riggs wrote:
>>>> I'll look at COPY FROM internals to make this faster. I'm looking at
>>>> this now to refresh my memory; I already had some plans on the shelf.
>>>
>>> Maybe stealing some ideas from pg_bulkload could somewhat help here?
>>>
>>> http://pgfoundry.org/docman/view.php/1000261/456/20060709_pg_bulkload.pdf
>>>
>>> IIRC it's mainly about how to optimize index updating while loading
>>> data, and I've heard complaints along the lines of "this external tool
>>> has to know too much about PostgreSQL internals to be trustworthy as
>>> non-core code"... so...
>>
>> I've been thinking of looking into that as well. The basic trick
>> pg_bulkload is using is to populate the index as the data is being
>> loaded. There's no fundamental reason why we couldn't do that
>> internally in COPY. Triggers or constraints that access the table
>> being loaded would make it impossible, but we should be able to detect
>> that and fall back to what we have now.
>>
>> What I'm basically thinking about is to modify the indexam API for
>> building a new index, so that COPY would feed the tuples to the
>> indexam, instead of the indexam opening and scanning the heap. The
>> b-tree indexam would spool the tuples into a tuplesort as the COPY
>> progresses, and build the index from that at the end as usual.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com