Re: parallel pg_restore blocks on heavy random read I/O on all children processes
From | Hannu Krosing |
---|---|
Subject | Re: parallel pg_restore blocks on heavy random read I/O on all children processes |
Date | |
Msg-id | CAMT0RQTz7Zi99C66U2160Mmcxj+fBJ0OpD6Eq=aLZsdYaFZwBg@mail.gmail.com |
In response to | Re: parallel pg_restore blocks on heavy random read I/O on all children processes (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-performance |
You may be interested in a patch "Adding pg_dump flag for parallel export to
pipes"[1], which allows using pipes in directory-format parallel dump and
restore. There the offsets are implicitly taken care of by the file system.

[1] https://www.postgresql.org/message-id/CAH5HC97p4kkpikar%2BswuC0Lx4YTVkE30sTsFX94tyzih7Cc_%3Dw%40mail.gmail.com

On Sun, Mar 23, 2025 at 4:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dimitrios Apostolou <jimis@gmx.net> writes:
> > On Thu, 20 Mar 2025, Tom Lane wrote:
> >> I am betting that the problem is that the dump's TOC (table of
> >> contents) lacks offsets to the actual data of the database objects,
> >> and thus the readers have to reconstruct that information by scanning
> >> the dump file. Normally, pg_dump will back-fill offset data in the
> >> TOC at completion of the dump, but if it's told to write to an
> >> un-seekable output file then it cannot do that.
>
> > Further questions:
>
> > * Does the same happen in an uncompressed dump? Or maybe the offsets are
> > pre-filled because they are predictable without compression?
>
> Yes; no. We don't know the size of a table's data as-dumped until
> we've dumped it.
>
> > * Should pg_dump print some warning for generating a lower quality format?
>
> I don't think so. In many use-cases this is irrelevant and the
> warning would just be an annoyance.
>
> > * The seeking pattern in pg_restore seems non-sensical to me: reading 4K,
> > jumping 8-12K, repeat for the whole file? Consuming 15K IOPS for an
> > hour. /Maybe/ something to improve there... Where can I read more about
> > the format?
>
> It's reading data blocks (or at least the headers thereof), which have
> a limited size. I don't think that size has changed since circa 1999,
> so maybe we could consider increasing it; but I doubt we could move
> the needle very far that way.
>
> > * Why doesn't it happen in single-process pg_restore?
>
> A single-process restore is going to restore all the data in the order
> it appears in the archive file, so no seeking is required. Of course,
> as soon as you ask for parallelism, that doesn't work too well.
>
> Hypothetically, maybe the algorithm for handing out tables-to-restore
> to parallel workers could pay attention to the distance to the data
> ... except that in the problematic case we don't have that
> information. I don't recall for sure, but I think that the order of
> the TOC entries is not necessarily a usable proxy for the order of the
> data entries. It's unclear to me that overriding the existing
> heuristic (biggest tables first, I think) would be a win anyway.
>
> 			regards, tom lane
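
For anyone trying to visualize what the missing offsets cost, here is a minimal Python sketch of the idea (a made-up block layout and made-up names, not the real pg_dump custom format): when the output is seekable, the writer can remember where each entry's data starts and back-fill that into a TOC, so a reader jumps straight to its table; without offsets, each reader has to walk the block headers and skip over payloads, and with several parallel workers doing that at once you get exactly the small-read-then-jump pattern reported upthread.

```python
# Toy illustration only -- not the actual pg_dump custom archive format.
import io
import struct

HEADER = struct.Struct("<ii")  # per data block: (entry_id, payload_length)

def write_archive(out, payloads):
    """Write blocks; return {entry_id: offset} only if `out` is seekable."""
    offsets = {}
    for entry_id, payload in payloads.items():
        if out.seekable():
            offsets[entry_id] = out.tell()      # back-fillable into a TOC
        out.write(HEADER.pack(entry_id, len(payload)))
        out.write(payload)
    return offsets if out.seekable() else None  # a pipe yields no offsets

def read_with_offset(f, offsets, entry_id):
    """Fast path: one seek straight to the wanted block."""
    f.seek(offsets[entry_id])
    eid, length = HEADER.unpack(f.read(HEADER.size))
    assert eid == entry_id
    return f.read(length)

def read_by_scanning(f, entry_id):
    """Slow path: read every header and skip payloads until we hit ours.
    Many parallel readers doing this at once is what shows up as heavy
    random I/O: small header reads followed by jumps, over and over."""
    f.seek(0)
    while True:
        hdr = f.read(HEADER.size)
        if len(hdr) < HEADER.size:
            raise KeyError(entry_id)
        eid, length = HEADER.unpack(hdr)
        if eid == entry_id:
            return f.read(length)
        f.seek(length, io.SEEK_CUR)             # jump over the payload

if __name__ == "__main__":
    payloads = {i: bytes([i]) * 10_000 for i in range(1, 6)}
    buf = io.BytesIO()
    toc_offsets = write_archive(buf, payloads)   # BytesIO is seekable
    assert read_with_offset(buf, toc_offsets, 4) == read_by_scanning(buf, 4)
```

pg_restore's real scanning logic is of course more involved; this is only meant to show why the directory format (or a custom-format dump written to a seekable file) avoids the problem entirely.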