Re: pg_dump and thousands of schemas

From: Jeff Janes
Subject: Re: pg_dump and thousands of schemas
Date:
Msg-id: CAMkU=1yjA-xQXnFTQVWa5jq-JPXyRdm5RXQQtADDm4T7vKhHoA@mail.gmail.com
In response to: Re: pg_dump and thousands of schemas  (Bruce Momjian <bruce@momjian.us>)
Responses: Re: pg_dump and thousands of schemas  (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: pg_dump and thousands of schemas  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
On Thu, May 24, 2012 at 8:54 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, May 24, 2012 at 08:20:34PM -0700, Jeff Janes wrote:

>> pg_dump itself seems to have a lot of quadratic portions (plus another
>> one on the server which it hits pretty heavily), and it's hard to know
>> where to start addressing them.  It seems like addressing the overall
>> quadratic nature might be a globally better option, but addressing
>> just the problem with dumping one schema might be easier to kluge
>> together.
>
> Postgres 9.2 will have some speedups for pg_dump scanning large
> databases --- that might help.

Those speedups don't seem to apply here, though.  I get the same
performance in 9.0.7 as in 9.2beta1.

There is an operation in pg_dump which is O(#_of_schemata_in_db *
#_of_tables_in_db), or something like that.

The attached very crude patch reduces that to
O(log(#_of_schemata_in_db) * #_of_tables_in_db).

I was hoping this would be a general improvement.  It doesn't seem to
be.  But it is a very substantial improvement in the specific case of
dumping one small schema out of a very large database.
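To make the complexity claim concrete: a cost of O(#schemas * #tables)
is what you get if each table's namespace is resolved by scanning the
full namespace list, while O(log(#schemas) * #tables) is what a
binary search over a sorted array gives you.  Below is a minimal,
self-contained sketch of that kind of lookup; the struct and function
names are illustrative only, not the actual pg_dump code or the
attached patch.

#include <stdio.h>
#include <stdlib.h>

typedef unsigned int Oid;

/* Illustrative stand-in for pg_dump's per-namespace metadata. */
typedef struct NamespaceInfo
{
    Oid         nspoid;
    const char *nspname;
} NamespaceInfo;

static int
oid_cmp(const void *a, const void *b)
{
    Oid oa = ((const NamespaceInfo *) a)->nspoid;
    Oid ob = ((const NamespaceInfo *) b)->nspoid;

    return (oa > ob) - (oa < ob);
}

/*
 * With the namespace array sorted by OID once up front, each of the
 * m tables resolves its namespace in O(log n) steps instead of O(n),
 * so the total drops from O(n * m) to O(m log n).
 */
static NamespaceInfo *
find_namespace_by_oid(NamespaceInfo *ns, size_t n, Oid target)
{
    NamespaceInfo key = {target, NULL};

    return bsearch(&key, ns, n, sizeof(NamespaceInfo), oid_cmp);
}

int
main(void)
{
    NamespaceInfo ns[] = {
        {16385, "app_b"}, {2200, "public"}, {16384, "app_a"}
    };
    size_t n = sizeof(ns) / sizeof(ns[0]);

    qsort(ns, n, sizeof(NamespaceInfo), oid_cmp);

    NamespaceInfo *hit = find_namespace_by_oid(ns, n, 16384);

    printf("%s\n", hit ? hit->nspname : "(not found)");
    return 0;
}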

It seems like dumping one schema would be better optimized by not
loading up the entire database catalog, but rather by restricting to
just that schema at the catalog stage.  But I haven't dug into those
details.
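As a rough illustration of what restricting at the catalog stage could
look like, here is a small libpq program that fetches only the tables
of one schema instead of reading every table in the database and
filtering afterwards.  The query and program are hand-written for this
sketch; they are not what pg_dump actually issues.

#include <stdio.h>
#include <libpq-fe.h>

int
main(int argc, char **argv)
{
    const char *schema = (argc > 1) ? argv[1] : "public";
    const char *params[1] = {schema};

    /* Connection parameters come from the usual PG* environment variables. */
    PGconn *conn = PQconnectdb("");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Pull only the one schema's tables out of the catalog. */
    PGresult *res = PQexecParams(conn,
        "SELECT c.oid, c.relname"
        "  FROM pg_catalog.pg_class c"
        "  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace"
        " WHERE n.nspname = $1 AND c.relkind = 'r'",
        1, NULL, params, NULL, NULL, 0);

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        for (int i = 0; i < PQntuples(res); i++)
            printf("%s\t%s\n", PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
    }
    else
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));

    PQclear(res);
    PQfinish(conn);
    return 0;
}

Whether pg_dump could cheaply do its catalog reads this way without
breaking dependency tracking is exactly the part I haven't dug into.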

For dumping entire databases, it looks like the biggest problem is
going to be LockReassignCurrentOwner in the server.  And that doesn't
seem to be easy to fix, as any change to it to improve pg_dump will
risk degrading normal use cases.

If we want to be able to efficiently dump entire databases in a
scalable way, it seems like there should be some way to obtain a
database-wide AccessShare lock, which blocks AccessExclusive locks on
any object in the database, and turns ordinary object-level
AccessShare lock requests into no-ops.  I don't think you can get
hundreds of thousands of simultaneously held and individually recorded
AccessShare locks without causing bad things to happen.
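For scale, what pg_dump does today is roughly the pattern below: one
LOCK TABLE ... IN ACCESS SHARE MODE statement per table to be dumped,
all held in one transaction, so a database with hundreds of thousands
of tables means hundreds of thousands of individually recorded locks.
This is a simplified sketch (trivial error handling, no pg_dump
internals), not the real pg_dump code.

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Take an AccessShareLock on every user table, one LOCK TABLE statement
 * per table.  Every lock is individually recorded in the server's lock
 * tables, which is the part that stops scaling at hundreds of thousands
 * of tables.
 */
static void
lock_all_tables(PGconn *conn)
{
    PGresult *tables = PQexec(conn,
        "SELECT quote_ident(n.nspname) || '.' || quote_ident(c.relname)"
        "  FROM pg_catalog.pg_class c"
        "  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace"
        " WHERE c.relkind = 'r'"
        "   AND n.nspname NOT IN ('pg_catalog', 'information_schema')");

    for (int i = 0; i < PQntuples(tables); i++)
    {
        char sql[1024];

        snprintf(sql, sizeof(sql), "LOCK TABLE %s IN ACCESS SHARE MODE",
                 PQgetvalue(tables, i, 0));
        PQclear(PQexec(conn, sql));     /* one recorded lock per table */
    }
    PQclear(tables);
}

int
main(void)
{
    PGconn *conn = PQconnectdb("");     /* PG* environment variables */

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    PQclear(PQexec(conn, "BEGIN"));
    lock_all_tables(conn);
    /* ... the dump itself would run here, with every lock still held ... */
    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}

A database-wide AccessShare lock of the kind described above would
collapse all of that into a single lock table entry.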

Cheers,

Jeff

Attachments
