Re: why postgresql over other RDBMS
From: Stefan Kaltenbrunner
Subject: Re: why postgresql over other RDBMS
Date:
Msg-id: 46580CF6.1000909@kaltenbrunner.cc
In reply to: Re: why postgresql over other RDBMS (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: why postgresql over other RDBMS
List: pgsql-general
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Erik Jones wrote:
>>> And, to finish up, is there any reason that pg_restore couldn't
>>> already work with separate processes working in parallel?
>
>> The problem is that the ordering of objects in the dump is the only
>> thing that makes the dump consistent with regards to the dependencies of
>> objects. So pg_restore cannot make any assumptions of parallelisability
>> of the restoration process of objects in the dump.
>
> That's true at the level of DDL operations, but AFAIK we could
> parallelize table-loading and index-creation steps pretty effectively
> --- and that's where all the time goes.

Yes, loading the data and creating the indexes is the most time-consuming part of a large dump-and-reload cycle.

> A more interesting question is what sort of hardware you need for that
> actually to be a win, though. Loading a few tables in parallel sounds
> like an ideal recipe for oversaturating your disk bandwidth...

You don't actually need that much disk bandwidth. Both COPY and CREATE INDEX are CPU-bottlenecked on modern boxes with reasonable disk subsystems, so spreading their work over multiple cores/processes can give big benefits.

For example, I have managed to load ~2B rows (5 integer columns, no indexes) at a rate of about 320000 rows/s on a modern (but already 1.5 years old) 4-core 2.6GHz Opteron box (with 12 disks and a BBWC, iirc). Using 4 concurrent processes to load the data resulted in about 930000 rows/s (hitting the disk-IO limit at that rate).

So having the ability to parallelize those operations at both the dump and the restore level would be a huge win. A manual experiment I did a while back, doing that by hand (i.e. splitting the dump manually and feeding it in parallel with a concurrency of 2) on a copy of a production database, brought the restore time down from 3h+ to a bit less than 2 hours.

Stefan
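
A minimal sketch of the kind of parallel load described above, assuming the input data has already been split into one file per process; the database name, table name and file names here are placeholders, not anything from the benchmark:

    #!/bin/sh
    # Load four pre-split data files concurrently.  Each psql session gets
    # its own server backend, so the COPY work can use a separate CPU core
    # instead of being bound to a single one.
    for f in part1.dat part2.dat part3.dat part4.dat; do
        psql -d mydb -c "\copy lineitems from '$f'" &
    done
    wait    # return only once all four loads have finished

One way to do the manual split on the restore side is similar: pg_restore -l lists a custom-format dump's table of contents, and the data-loading entries can be divided into several list files that are then fed to concurrent pg_restore -L runs.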