Re: why postgresql over other RDBMS
From: Stefan Kaltenbrunner
Subject: Re: why postgresql over other RDBMS
Date:
Msg-id: 46580CF6.1000909@kaltenbrunner.cc
In reply to: Re: why postgresql over other RDBMS (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: why postgresql over other RDBMS
List: pgsql-general
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Erik Jones wrote:
>>> And, to finish up, is there any reason that pg_restore couldn't
>>> already work with separate processes working in parallel?
>
>> The problem is that the ordering of objects in the dump is the only
>> thing that makes the dump consistent with regards to the dependencies of
>> objects. So pg_restore cannot make any assumptions of parallelisability
>> of the restoration process of objects in the dump.
>
> That's true at the level of DDL operations, but AFAIK we could
> parallelize table-loading and index-creation steps pretty effectively
> --- and that's where all the time goes.

Yes, loading the data and creating the indexes is the most time-consuming part of a large dump-and-reload cycle.

> A more interesting question is what sort of hardware you need for that
> actually to be a win, though. Loading a few tables in parallel sounds
> like an ideal recipe for oversaturating your disk bandwidth...

You don't actually need that much disk bandwidth. Both COPY and CREATE INDEX are CPU-bottlenecked on modern boxes with reasonable disk subsystems, so spreading their work over multiple cores/processes can give big benefits.

For example, I have managed to load ~2B rows (5 integer columns, no indexes) at a rate of about 320000 rows/s on a modern (but already 1.5 years old) 4-core 2.6GHz Opteron box (with 12 disks and a BBWC, iirc). Using 4 concurrent processes to load the data resulted in about 930000 rows/s (hitting the disk-IO limit at that rate).

So having the ability to parallelize those operations at both the dump and the restore level would be a huge win. A manual experiment I did a while back, doing that by hand (i.e. splitting the dump manually and feeding it in parallel with a concurrency of 2) on a copy of a production database, brought the restore time down from 3h+ to a bit less than 2 hours.

Stefan
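
A minimal sketch of the kind of parallel load described above, assuming the input data has already been split into one file per process; the database name, table name and file names here are placeholders, not anything from the benchmark:

    #!/bin/sh
    # Load four pre-split data files concurrently.  Each psql session gets
    # its own server backend, so the COPY work can use a separate CPU core
    # instead of being bound to a single one.
    for f in part1.dat part2.dat part3.dat part4.dat; do
        psql -d mydb -c "\copy lineitems from '$f'" &
    done
    wait    # return only once all four loads have finished

One way to do the manual split on the restore side is similar: pg_restore -l lists a custom-format dump's table of contents, and the data-loading entries can be divided into several list files that are then fed to concurrent pg_restore -L runs.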