Discussion: pg_dump object sorting
I have been looking at refining the sorting of objects in pg_dump to make it take advantage of buffering and synchronised scanning, and possibly make parallel restoration simpler and more efficient.

My first thought was to sort indexes by <namespace, tablename, indexname> instead of by <namespace, indexname>. However, I don't think that goes far enough. Is there any reason we can't do all of a table's indexes and non-FK constraints together? Will that affect anything other than PK and UNIQUE constraints, given that NOT NULL and CHECK constraints are included in table definitions?

cheers

andrew
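To make the two orderings concrete, here is a minimal Python sketch. It is only an illustration and not pg_dump's actual sorting code (pg_dump is written in C); the `IndexEntry` type, the function names, and the sample index names are all hypothetical.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical stand-in for one index in a dump's table of contents."""
    namespace: str
    table: str
    name: str


def sort_by_index_name(entries):
    # Ordering described as the starting point: <namespace, indexname>.
    return sorted(entries, key=lambda e: (e.namespace, e.name))


def sort_by_table(entries):
    # Proposed ordering: <namespace, tablename, indexname>,
    # which keeps every index of a table adjacent in the output.
    return sorted(entries, key=lambda e: (e.namespace, e.table, e.name))


# Sample data (made up for illustration).
entries = [
    IndexEntry("public", "orders", "idx_date"),
    IndexEntry("public", "customers", "idx_email"),
    IndexEntry("public", "orders", "idx_status"),
    IndexEntry("public", "customers", "idx_name"),
]

if __name__ == "__main__":
    for label, ordering in (
        ("by index name", sort_by_index_name(entries)),
        ("by table", sort_by_table(entries)),
    ):
        print(label)
        for e in ordering:
            print(f"  {e.namespace}.{e.table}: {e.name}")
```

With this sample data, sorting by index name interleaves the indexes of `orders` and `customers`, while sorting by table name first keeps each table's indexes together.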

On Mon, 2008-04-14 at 11:18 -0400, Andrew Dunstan wrote:
> I have been looking at refining the sorting of objects in pg_dump to
> make it take advantage of buffering and synchronised scanning, and
> possibly make parallel restoration simpler and more efficient.

Synchronized scanning is explicitly disabled in pg_dump. That was a last-minute change to answer Greg Stark's complaint about dumping a clustered table:

http://archives.postgresql.org/pgsql-hackers/2008-01/msg00987.php

That hopefully won't be a permanent solution, because I think synchronized scans are useful for pg_dump.

However, I'm not clear on how the pg_dump order would be able to better take advantage of synchronized scans anyway. What did you have in mind?

Regards,
Jeff Davis

Jeff Davis wrote:
> On Mon, 2008-04-14 at 11:18 -0400, Andrew Dunstan wrote:
>> I have been looking at refining the sorting of objects in pg_dump to
>> make it take advantage of buffering and synchronised scanning, and
>> possibly make parallel restoration simpler and more efficient.
>
> Synchronized scanning is explicitly disabled in pg_dump. That was a
> last-minute change to answer Greg Stark's complaint about dumping a
> clustered table:
>
> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00987.php
>
> That hopefully won't be a permanent solution, because I think
> synchronized scans are useful for pg_dump.
>
> However, I'm not clear on how the pg_dump order would be able to better
> take advantage of synchronized scans anyway. What did you have in mind?

I should have expressed it better. The idea is to have pg_dump emit the objects in an order that allows the restore to take advantage of sync scans. So sync scans being disabled in pg_dump would not at all matter.

cheers

andrew

Andrew Dunstan <andrew@dunslane.net> writes:
> I should have expressed it better. The idea is to have pg_dump emit the
> objects in an order that allows the restore to take advantage of sync
> scans. So sync scans being disabled in pg_dump would not at all matter.

Unless you do something to explicitly parallelize the operations, how will a different ordering improve matters?

I thought we had a paper design for this, and it involved teaching pg_restore how to use multiple connections. In that context it's entirely up to pg_restore to manage the ordering and ensure dependencies are met. So I'm not seeing how it helps to have a different sort rule at pg_dump time --- it won't really make pg_restore's task any easier.

regards, tom lane
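The point that a restorer working from dependencies does not rely on the dump's sort order can be sketched as follows. This is a hypothetical illustration, not the paper design mentioned above (which the thread does not describe) and not pg_restore's code. It uses Python's standard `graphlib.TopologicalSorter` to group items into batches whose members could run on separate connections; the function name, item labels, and dependency map are invented for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def restore_batches(deps):
    """Group restore items into batches that could run concurrently.

    `deps` maps each item to the set of items it depends on.
    Every item in a batch has all of its dependencies in earlier batches.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical dependency map; the insertion order here is irrelevant
# to the resulting batches.
deps = {
    "FK orders_customer_fk": {"TABLE orders", "INDEX customers_pkey"},
    "INDEX idx_date": {"TABLE orders"},
    "INDEX customers_pkey": {"TABLE customers"},
    "INDEX idx_email": {"TABLE customers"},
    "TABLE orders": set(),
    "TABLE customers": set(),
}

if __name__ == "__main__":
    for number, batch in enumerate(restore_batches(deps), start=1):
        print(number, batch)
```

In this sketch the tables land in the first batch, the indexes in the second, and the foreign key in the third, whatever order the items were listed in. A real scheduler would also have to weigh costs such as lock conflicts and which tables are likely to be cached, which this example ignores.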

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I should have expressed it better. The idea is to have pg_dump emit the
>> objects in an order that allows the restore to take advantage of sync
>> scans. So sync scans being disabled in pg_dump would not at all matter.
>
> Unless you do something to explicitly parallelize the operations,
> how will a different ordering improve matters?
>
> I thought we had a paper design for this, and it involved teaching
> pg_restore how to use multiple connections. In that context it's
> entirely up to pg_restore to manage the ordering and ensure dependencies
> are met. So I'm not seeing how it helps to have a different sort rule
> at pg_dump time --- it won't really make pg_restore's task any easier.

Well, what actually got me going on this initially was that I got annoyed by having indexes not grouped by table when I dumped out the schema of a database, because it seemed a bit illogical. Then I started thinking about it and it seemed to me that even without synchronised scanning or parallel restoration, we might benefit from building all the indexes of a given table together, especially if the whole table could fit in either our cache or the OS cache.

cheers

andrew