Обсуждение: Improvements in pg_dump/pg_restore toc format and performances
Hi Following the thread "Inefficiency in parallel pg_restore with many tables", I started digging into why the toc.dat files are that big and where time is spent when parsing them. I ended up writing several patches that shaved some time for pg_restore -l, and reduced the toc.dat size. First patch is "finishing" the job of removing has oids support. When this support was removed, instead of dropping the field from the dumps and increasing the dump versions, the field was kept as is. This field stores a boolean as a string, "true" or "false". This is not free, and requires 10 bytes per toc entry. The second patch removes calls to sscanf and replaces them with strtoul. This was the biggest speedup for pg_restore -l. The third patch changes the dump format further to remove these strtoul calls and store the integers as is instead. The fourth patch is dirtier and does more changes to the dump format. Instead of storing the owner, tablespace, table access method and schema of each object as a string, pg_dump builds an array of these, stores them at the beginning of the file and replaces the strings with integer fields in the dump. This reduces the file size further, and removes a lot of calls to ReadStr, thus saving quite some time. Toc has 453999 entries. Patch Toc size Dump -s duration pg_restore -l duration HEAD 214M 23.1s 1.27s #1 (has oid) 210M 22.9s 1.26s #2 (scanf) 210M 22.9s 1.07s #3 (no strtoul) 202M 22.8s 0.94s #4 (string list) 181M 23.1s 0.87s Patch four is likely to require more changes. I don't know PostgreSQL code enough to do better than calling pgmalloc/pgrealloc and maintaining a char** manually, I guess there are structs and functions that do that in a better way. And the location of string tables in the file and in the structures is probably not acceptable, I suppose these should go to the toc header instead. I still submit these for comments and first review. Best regards Pierre Ducroquet
Вложения
On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote: > I ended up writing several patches that shaved some time for pg_restore -l, > and reduced the toc.dat size. I've only just started taking a look at these patches, and I intend to do a more thorough review in the hopefully-not-too-distant future. > First patch is "finishing" the job of removing has oids support. When this > support was removed, instead of dropping the field from the dumps and > increasing the dump versions, the field was kept as is. This field stores a > boolean as a string, "true" or "false". This is not free, and requires 10 > bytes per toc entry. This sounds reasonable to me. I wonder why this wasn't done when WITH OIDS was removed in v12. > The second patch removes calls to sscanf and replaces them with strtoul. This > was the biggest speedup for pg_restore -l. Nice. > The third patch changes the dump format further to remove these strtoul calls > and store the integers as is instead. Do we need to worry about endianness here? > The fourth patch is dirtier and does more changes to the dump format. Instead > of storing the owner, tablespace, table access method and schema of each > object as a string, pg_dump builds an array of these, stores them at the > beginning of the file and replaces the strings with integer fields in the dump. > This reduces the file size further, and removes a lot of calls to ReadStr, thus > saving quite some time. This sounds promising. > Patch Toc size Dump -s duration pg_restore -l duration > HEAD 214M 23.1s 1.27s > #1 (has oid) 210M 22.9s 1.26s > #2 (scanf) 210M 22.9s 1.07s > #3 (no strtoul) 202M 22.8s 0.94s > #4 (string list) 181M 23.1s 0.87s At a glance, the size improvements in 0004 look the most interesting to me. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Mon, Sep 18, 2023 at 02:52:47PM -0700, Nathan Bossart wrote: > On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote: >> I ended up writing several patches that shaved some time for pg_restore -l, >> and reduced the toc.dat size. > > I've only just started taking a look at these patches, and I intend to do a > more thorough review in the hopefully-not-too-distant future. Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this to waiting-on-author. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Monday, September 18, 2023 11:52:47 PM CEST Nathan Bossart wrote: > On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote: > > I ended up writing several patches that shaved some time for pg_restore > > -l, > > and reduced the toc.dat size. > > I've only just started taking a look at these patches, and I intend to do a > more thorough review in the hopefully-not-too-distant future. Thank you very much. > Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this > to waiting-on-author. I did not notice anything running meson test -v, I'll look further into it in the next days. > > First patch is "finishing" the job of removing has oids support. When this > > support was removed, instead of dropping the field from the dumps and > > increasing the dump versions, the field was kept as is. This field stores > > a > > boolean as a string, "true" or "false". This is not free, and requires 10 > > bytes per toc entry. > > This sounds reasonable to me. I wonder why this wasn't done when WITH OIDS > was removed in v12. I suppose it is an oversight, or not wanting to increase the dump version for no reason. > > The second patch removes calls to sscanf and replaces them with strtoul. > > This was the biggest speedup for pg_restore -l. > > Nice. > > > The third patch changes the dump format further to remove these strtoul > > calls and store the integers as is instead. > > Do we need to worry about endianness here? I used the ReadInt/WriteInt functions already defined in pg_dump that take care of this issue, so there should be no need to worry. > > The fourth patch is dirtier and does more changes to the dump format. > > Instead of storing the owner, tablespace, table access method and schema > > of each object as a string, pg_dump builds an array of these, stores them > > at the beginning of the file and replaces the strings with integer fields > > in the dump. This reduces the file size further, and removes a lot of > > calls to ReadStr, thus saving quite some time. > > This sounds promising. > > > Patch Toc size Dump -s duration pg_restore -l duration > > HEAD 214M 23.1s 1.27s > > #1 (has oid) 210M 22.9s 1.26s > > #2 (scanf) 210M 22.9s 1.07s > > #3 (no strtoul) 202M 22.8s 0.94s > > #4 (string list) 181M 23.1s 0.87s > > At a glance, the size improvements in 0004 look the most interesting to me. Yes it is, and the speed benefits are interesting too (at least for my usecase)
On Monday, September 18, 2023 11:54:42 PM CEST Nathan Bossart wrote: > On Mon, Sep 18, 2023 at 02:52:47PM -0700, Nathan Bossart wrote: > > On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote: > >> I ended up writing several patches that shaved some time for pg_restore > >> -l, > >> and reduced the toc.dat size. > > > > I've only just started taking a look at these patches, and I intend to do > > a > > more thorough review in the hopefully-not-too-distant future. > > Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this > to waiting-on-author. FYI, the failures are related to patch 0004, I said it was dirty, but it was apparently an understatement. Patches 0001 to 0003 don't exhibit any regression.
On Monday, September 18, 2023 11:54:42 PM CEST Nathan Bossart wrote: > On Mon, Sep 18, 2023 at 02:52:47PM -0700, Nathan Bossart wrote: > > On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote: > >> I ended up writing several patches that shaved some time for pg_restore > >> -l, > >> and reduced the toc.dat size. > > > > I've only just started taking a look at these patches, and I intend to do > > a > > more thorough review in the hopefully-not-too-distant future. > > Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this > to waiting-on-author. Attached updated patches fix this regression, I'm sorry I missed that.
Вложения
On Tue, Sep 19, 2023 at 12:15:55PM +0200, Pierre Ducroquet wrote: > Attached updated patches fix this regression, I'm sorry I missed that. Thanks for the new patches. 0001 and 0002 look reasonable to me. This is a nitpick, but we might want to use atooid() in 0002, which is just shorthand for the strtoul() call you are using. > - WriteStr(AH, NULL); /* Terminate List */ > + WriteInt(AH, -1); /* Terminate List */ I think we need to be cautious about using WriteInt and ReadInt here. OIDs are unsigned, so we probably want to use InvalidOid (0) instead. > + if (AH->version >= K_VERS_1_16) > { > - depSize *= 2; > - deps = (DumpId *) pg_realloc(deps, sizeof(DumpId) * depSize); > + depId = ReadInt(AH); > + if (depId == -1) > + break; /* end of list */ > + if (depIdx >= depSize) > + { > + depSize *= 2; > + deps = (DumpId *) pg_realloc(deps, sizeof(DumpId) * depSize); > + } > + deps[depIdx] = depId; > + } > + else > + { > + tmp = ReadStr(AH); > + if (!tmp) > + break; /* end of list */ > + if (depIdx >= depSize) > + { > + depSize *= 2; > + deps = (DumpId *) pg_realloc(deps, sizeof(DumpId) * depSize); > + } > + deps[depIdx] = strtoul(tmp, NULL, 10); > + free(tmp); > } I would suggest making the resizing logic common. if (AH->version >= K_VERS_1_16) { depId = ReadInt(AH); if (depId == InvalidOid) break; /* end of list */ } else { tmp = ReadStr(AH); if (!tmp) break; /* end of list */ depId = strtoul(tmp, NULL, 10); free(tmp); } if (depIdx >= depSize) { depSize *= 2; deps = (DumpId *) pg_realloc(deps, sizeof(DumpId) * depSize); } deps[depIdx] = depId; Also, can we make depId more locally scoped? I have yet to look into 0004 still. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Tue, 19 Sept 2023 at 15:46, Pierre Ducroquet <p.psql@pinaraf.info> wrote: > > On Monday, September 18, 2023 11:54:42 PM CEST Nathan Bossart wrote: > > On Mon, Sep 18, 2023 at 02:52:47PM -0700, Nathan Bossart wrote: > > > On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote: > > >> I ended up writing several patches that shaved some time for pg_restore > > >> -l, > > >> and reduced the toc.dat size. > > > > > > I've only just started taking a look at these patches, and I intend to do > > > a > > > more thorough review in the hopefully-not-too-distant future. > > > > Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this > > to waiting-on-author. > > Attached updated patches fix this regression, I'm sorry I missed that. Few comments: 1) These printf statements are not required: + /* write the list of am */ + printf("%d tableams to save\n", tableam_count); + WriteInt(AH, tableam_count); + for (i = 0 ; i < tableam_count ; i++) + { + printf("%d is %s\n", i, tableams[i]); + WriteStr(AH, tableams[i]); + } + + /* write the list of namespaces */ + printf("%d namespaces to save\n", namespace_count); + WriteInt(AH, namespace_count); + for (i = 0 ; i < namespace_count ; i++) + { + printf("%d is %s\n", i, namespaces[i]); + WriteStr(AH, namespaces[i]); + } + + /* write the list of owners */ + printf("%d owners to save\n", owner_count); 2) We generally use pg_malloc in client tools, we should change palloc to pg_malloc: + /* prepare dynamic arrays */ + tableams = palloc(sizeof(char*) * 1); + tableams[0] = NULL; + tableam_count = 0; + namespaces = palloc(sizeof(char*) * 1); + namespaces[0] = NULL; + namespace_count = 0; + owners = palloc(sizeof(char*) * 1); + owners[0] = NULL; + owner_count = 0; + tablespaces = palloc(sizeof(char*) * 1); 3) This similar code is repeated few times, will it be possible to do it in a common function: + if (namespace_count < 128) + { + te->namespace_idx = AH->ReadBytePtr(AH); + invalid_entry = 255; + } + else + { + te->namespace_idx = ReadInt(AH); + invalid_entry = -1; + } + if (te->namespace_idx == invalid_entry) + te->namespace = ""; + else + te->namespace = namespaces[te->namespace_idx]; 4) Can the initialization of tableam_count, namespace_count, owner_count and tablespace_count be done at declaration and these initialization code can be removed: + /* prepare dynamic arrays */ + tableams = palloc(sizeof(char*) * 1); + tableams[0] = NULL; + tableam_count = 0; + namespaces = palloc(sizeof(char*) * 1); + namespaces[0] = NULL; + namespace_count = 0; + owners = palloc(sizeof(char*) * 1); + owners[0] = NULL; + owner_count = 0; + tablespaces = palloc(sizeof(char*) * 1); + tablespaces[0] = NULL; + tablespace_count = 0; 4) There are some whitespace issues in the patch: Applying: move static strings to arrays at beginning .git/rebase-apply/patch:24: trailing whitespace. .git/rebase-apply/patch:128: trailing whitespace. .git/rebase-apply/patch:226: trailing whitespace. .git/rebase-apply/patch:232: trailing whitespace. warning: 4 lines add whitespace errors. Regards, Vignesh
On Tue, Oct 03, 2023 at 03:17:57PM +0530, vignesh C wrote: > Few comments: Pierre, do you plan to submit a new revision of this patch set for the November commitfest? If not, the commitfest entry may be marked as returned-with-feedback soon. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Fri, 10 Nov 2023 at 23:20, Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Tue, Oct 03, 2023 at 03:17:57PM +0530, vignesh C wrote: > > Few comments: > > Pierre, do you plan to submit a new revision of this patch set for the > November commitfest? If not, the commitfest entry may be marked as > returned-with-feedback soon. I have changed the status of commitfest entry to "Returned with Feedback" as the comments have not yet been resolved. Please handle the comments and update the commitfest entry accordingly. Regards, Vignesh