Re: Experimenting with hash tables inside pg_dump
From | Andres Freund |
---|---|
Subject | Re: Experimenting with hash tables inside pg_dump |
Date | |
Msg-id | 20211022055939.z6fihsm7hdzbjttf@alap3.anarazel.de |
In reply to | Re: Experimenting with hash tables inside pg_dump (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Experimenting with hash tables inside pg_dump |
List | pgsql-hackers |
Hi,

On 2021-10-21 22:13:22 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I wonder though if for some of them we should instead replace the
> > per-object queries with one query returning the information for all
> > objects of a type. It doesn't make all that much sense that we build
> > and send one query for each table and index.
>
> The trick is the problem I alluded to in another thread: it's not safe to
> do stuff like pg_get_expr() on tables we don't have lock on.

I was looking at getTableAttrs() - sending one query instead of #tables
queries yields quite a substantial speedup in a quick prototype. And I
don't think it changes anything around locking semantics.

> I've thought about doing something like
>
> SELECT unsafe-functions FROM pg_class WHERE oid IN (someoid, someoid, ...)
>
> but in cases with tens of thousands of tables, it seems unlikely that
> that's going to behave all that nicely.

That's kinda what I'm doing in the quick hack. But instead of using
IN(...) I made it unnest('{oid, oid, ...}'), which scales much better.

A pg_dump --schema-only of the regression database goes from

real	0m0.675s
user	0m0.039s
sys	0m0.029s

to

real	0m0.477s
user	0m0.037s
sys	0m0.020s

which isn't half-bad.

There are a few more cases like this, I think. But most are harder
because the dumping happens one-by-one from dumpDumpableObject(). The
relatively easy but substantial cases I could find quickly were
getIndexes(), getConstraints(), getTriggers().

To see where it's worth putting in time, it'd be useful if
getSchemaData() in verbose mode printed timing information...

> The *real* fix, I suppose, would be to fix all those catalog-inspection
> functions so that they operate with respect to the query's snapshot.
> But that's not a job I'm volunteering for. Besides which, pg_dump
> still has to cope with back-rev servers where it wouldn't be safe.

Yea, that's not a small change :(.
I suspect that we'd need a bunch of new caching infrastructure to make
that reasonably performant, since this presumably couldn't use syscache
etc.

Greetings,

Andres Freund
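[Editor's note: the batched-query shape discussed in this message can be sketched as follows. This is a hypothetical illustration, not the actual prototype (which would be C code inside pg_dump's getTableAttrs()); the column list and join are invented for the example. The point is the query shape: one round trip keyed by unnest() of an OID array instead of one query per table.]

```python
def per_table_query(oid: int) -> str:
    """The pattern being replaced: one catalog query (and one round trip)
    per table. Column list is illustrative only."""
    return (
        "SELECT a.attname, a.atttypid "
        "FROM pg_catalog.pg_attribute a "
        f"WHERE a.attrelid = {oid} AND a.attnum > 0"
    )


def batched_query(oids: list[int]) -> str:
    """A single query for all tables at once: join pg_attribute against
    unnest() of an OID array literal, which (per the message) scales
    better than a long IN (...) list."""
    oid_array = "{" + ",".join(str(o) for o in oids) + "}"
    return (
        "SELECT src.tbloid, a.attname, a.atttypid "
        f"FROM unnest('{oid_array}'::pg_catalog.oid[]) "
        "WITH ORDINALITY AS src(tbloid, ord) "
        "JOIN pg_catalog.pg_attribute a ON a.attrelid = src.tbloid "
        "WHERE a.attnum > 0 "
        "ORDER BY src.ord, a.attnum"
    )


# With N tables, the first form costs N round trips; the second costs one.
print(batched_query([16384, 16390, 16397]))
```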