Re: 8.3 / 8.2.6 restore comparison

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: 8.3 / 8.2.6 restore comparison
Дата	24 февраля 2008 г. 13:32:42
Msg-id	5097.1203874351@sss.pgh.pa.us обсуждение исходный текст
Ответ на	Re: 8.3 / 8.2.6 restore comparison ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Ответы	Re: 8.3 / 8.2.6 restore comparison
Список	pgsql-hackers

Дерево обсуждения

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> At some point, I think we have to bite the bullet and find a way to use
> multiple CPUs for a single load. I don't have any good ideas or plans
> for that, but hopefully someone does.

As already mentioned upthread, we could do that today, with zero backend
changes, by making pg_restore drive multiple sessions. Now there are
scenarios where this wouldn't help too much --- eg, a database with only
one enormous table. We couldn't parallelize the loading of that table,
although we could parallelize creation of its indexes. But for an
example such as JD is complaining about, we ought to have no trouble
thrashing his disks into the ground ;-)

What you would need for this is an -Fc or -Ft dump, because a plain
script output from pg_dump doesn't carry any dependency information,
much less any index of where in the file different bits of data are.
Just armwaving, I envision a multiprocess pg_restore like this:
* one controller process that tracks the not-yet-loaded TOC items in the dump* N child processes that just transmit a
selectedTOC item to a connected backend, and then ask the controller what to do next

The controller would need to use the dependency information to avoid,
eg, handing out a CREATE INDEX command before the parent table was
created and loaded.

One issue is that this couldn't use "-1" single-transaction restoring,
since obviously each child would need its own transaction, and
furthermore would have to commit before going back to the controller
for more work (since dependent TOC items might well get loaded by
a different child later). That defeats a couple of optimizations that
Simon put in recently. The one for no XLOG during COPY is not too
hard to see how to re-enable, but I'm not sure what else there was.

Most likely, the bottleneck with this sort of thing would be multiple
parallel reads from the pg_dump archive file. Possibly the controller
process could be taught to schedule COPY and CREATE INDEX operations
so that not too many processes are trying to read lots of archive
data at the same time.
regards, tom lane

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: 8.3 / 8.2.6 restore comparison