Обсуждение: pg_upgrade: transfer pg_largeobject_metadata's files when possible
(new thread) On Fri, Jul 18, 2025 at 11:05:04AM -0500, Nathan Bossart wrote: > I'm cautiously optimistic that we can find some better gains for upgrades > from v16 and newer. That would involve dumping lo_create() commands for > all LOs with comments/seclabels, dumping the relevant pg_shdepend rows, and > then copying/linking the pg_largeobject_metadata files like we did prior to > v12. Here is a patch. For background, the reason this is limited to upgrades from v16 and newer is because the aclitem data type (needed by pg_largeobject_metadata.lomacl) changed its storage format in v16 (see commit 7b378237aa). Note that the patch is essentially a revert of commit 12a53c732c, but there are enough differences that it should be considered a fresh effort. Something I hadn't anticipated is that we need to take special care to transfer the relfilenode of pg_largeobject_metadata and its index, as was done for pg_largeobject in commits d498e052b4 and bbe08b8869. In fact, the majority of the patch is dedicated to that. My testing showed some decent, but not earth-shattering performance improvements from this patch. For upgrades with many large objects with NULL lomacl/lomowner columns, pg_upgrade was 50% faster. With non-NULL lomacl/lomowner, that dropped to 25%. When each large object had a comment, there was no change. I'm assuming that its rare to have lots of large objects with comments or security labels, so I don't see any need to expend energy trying to optimize that use-case. I am a bit concerned that we'll forget to add checks for new types of dependencies similar to comments and security labels. If we do, pg_upgrade should just fail to restore the schema, and fixing the code should be easy enough. Also, we'll need to remember to revisit this code if there's another storage format change for one of pg_largeobject_metadata's columns, but that seems unlikely to happen anytime soon. On the whole, I'm not too worried about either of these points. -- nathan
Вложения
Have you considered re-creating pg_shdepend from pg_largeobject_metadata directly instead of having special cases for dumping it ? It would also be useful in cases of old (pg_upgraded since before pg 12) databases which might be missing it anyway. On Thu, Aug 14, 2025 at 5:22 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > (new thread) > > On Fri, Jul 18, 2025 at 11:05:04AM -0500, Nathan Bossart wrote: > > I'm cautiously optimistic that we can find some better gains for upgrades > > from v16 and newer. That would involve dumping lo_create() commands for > > all LOs with comments/seclabels, dumping the relevant pg_shdepend rows, and > > then copying/linking the pg_largeobject_metadata files like we did prior to > > v12. > > Here is a patch. For background, the reason this is limited to upgrades > from v16 and newer is because the aclitem data type (needed by > pg_largeobject_metadata.lomacl) changed its storage format in v16 (see > commit 7b378237aa). Note that the patch is essentially a revert of commit > 12a53c732c, but there are enough differences that it should be considered a > fresh effort. > > Something I hadn't anticipated is that we need to take special care to > transfer the relfilenode of pg_largeobject_metadata and its index, as was > done for pg_largeobject in commits d498e052b4 and bbe08b8869. In fact, the > majority of the patch is dedicated to that. > > My testing showed some decent, but not earth-shattering performance > improvements from this patch. For upgrades with many large objects with > NULL lomacl/lomowner columns, pg_upgrade was 50% faster. With non-NULL > lomacl/lomowner, that dropped to 25%. When each large object had a > comment, there was no change. I'm assuming that its rare to have lots of > large objects with comments or security labels, so I don't see any need to > expend energy trying to optimize that use-case. > > I am a bit concerned that we'll forget to add checks for new types of > dependencies similar to comments and security labels. If we do, pg_upgrade > should just fail to restore the schema, and fixing the code should be easy > enough. Also, we'll need to remember to revisit this code if there's > another storage format change for one of pg_largeobject_metadata's columns, > but that seems unlikely to happen anytime soon. On the whole, I'm not too > worried about either of these points. > > -- > nathan
On Tue, Aug 19, 2025 at 09:49:26AM +0200, Hannu Krosing wrote: > Have you considered re-creating pg_shdepend from > pg_largeobject_metadata directly instead of having special cases for > dumping it ? I considered it when you last brought up the idea [0], but my testing indicated that it's measurably slower. > It would also be useful in cases of old (pg_upgraded since before pg > 12) databases which might be missing it anyway. We only use COPY for upgrades from v12 and newer, and the patch at hand only applies to v16 and newer. There should be no need to repair pg_shdepend for any such upgrades because we haven't copied/linked the files since before v12. [0] https://postgr.es/m/CAMT0RQTXiqH7zdQEVSVd2L7_Cw4wQ1eHOD8hfZ%2B0vecMXJWc-w%40mail.gmail.com -- nathan
On Thu, Aug 14, 2025 at 10:22:02AM -0500, Nathan Bossart wrote: > Here is a patch. For background, the reason this is limited to upgrades > from v16 and newer is because the aclitem data type (needed by > pg_largeobject_metadata.lomacl) changed its storage format in v16 (see > commit 7b378237aa). Note that the patch is essentially a revert of commit > 12a53c732c, but there are enough differences that it should be considered a > fresh effort. Noted. > Something I hadn't anticipated is that we need to take special care to > transfer the relfilenode of pg_largeobject_metadata and its index, as was > done for pg_largeobject in commits d498e052b4 and bbe08b8869. In fact, the > majority of the patch is dedicated to that. > > My testing showed some decent, but not earth-shattering performance > improvements from this patch. For upgrades with many large objects with > NULL lomacl/lomowner columns, pg_upgrade was 50% faster. With non-NULL > lomacl/lomowner, that dropped to 25%. When each large object had a > comment, there was no change. I'm assuming that its rare to have lots of > large objects with comments or security labels, so I don't see any need to > expend energy trying to optimize that use-case. I highly doubt that there are a lot of comments assigned to LOs, so these numbers are pretty cool IMO. Security labels are a pain to test in the upgrade path, or test_dummy_label could be extended with a new TAP test and a pg_upgrade command.. There is some coverage with comments on LOs in src/bin/pg_dump's 002, so that would be enough for the comment part, at least. > I am a bit concerned that we'll forget to add checks for new types of > dependencies similar to comments and security labels. If we do, pg_upgrade > should just fail to restore the schema, and fixing the code should be easy > enough. Also, we'll need to remember to revisit this code if there's > another storage format change for one of pg_largeobject_metadata's columns, > but that seems unlikely to happen anytime soon. On the whole, I'm not too > worried about either of these points. This part does not worry me much, TBH. This stuff would require dump/restore support and the pg_dump test suite would catch that for commands with --binary-upgrade. So it should be hard to miss. - /* - * pg_largeobject - */ if (fout->remoteVersion >= 90300) appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, relminmxid, relfilenode, oid\n" "FROM pg_catalog.pg_class\n" - "WHERE oid IN (%u, %u);\n", - LargeObjectRelationId, LargeObjectLOidPNIndexId); + "WHERE oid IN (%u, %u, %u, %u);\n", + LargeObjectRelationId, LargeObjectLOidPNIndexId, + LargeObjectMetadataRelationId, LargeObjectMetadataOidIndexId); [...] appendPQExpBufferStr(loHorizonQry, "\n-- For binary upgrade, set pg_largeobject relfrozenxid and relminmxid\n"); + appendPQExpBufferStr(lomHorizonQry, "\n-- For binary upgrade, set pg_largeobject_metadata relfrozenxid and relminmxid\n"); appendPQExpBufferStr(loOutQry, "\n-- For binary upgrade, preserve pg_largeobject and index relfilenodes\n"); + appendPQExpBufferStr(lomOutQry, "\n-- For binary upgrade, preserve pg_largeobject_metadata and index relfilenodes\n"); Is all that really required when upgrading from a cluster in the 9.3~15 range? -- Michael
Вложения
On Mon, Sep 01, 2025 at 03:19:46PM +0900, Michael Paquier wrote: > I highly doubt that there are a lot of comments assigned to LOs, so > these numbers are pretty cool IMO. Security labels are a pain to test > in the upgrade path, or test_dummy_label could be extended with a new > TAP test and a pg_upgrade command.. There is some coverage with > comments on LOs in src/bin/pg_dump's 002, so that would be enough for > the comment part, at least. Do you think a new pg_upgrade test for security labels is worth the trouble? It seems doable, but it'd be an awfully expensive test for this. On the other hand, I'm not sure there's any coverage for pg_upgrade with security labels, so perhaps this is a good time to establish some tests. > - /* > - * pg_largeobject > - */ > if (fout->remoteVersion >= 90300) > appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, relminmxid, relfilenode, oid\n" > "FROM pg_catalog.pg_class\n" > - "WHERE oid IN (%u, %u);\n", > - LargeObjectRelationId, LargeObjectLOidPNIndexId); > + "WHERE oid IN (%u, %u, %u, %u);\n", > + LargeObjectRelationId, LargeObjectLOidPNIndexId, > + LargeObjectMetadataRelationId, LargeObjectMetadataOidIndexId); > [...] > appendPQExpBufferStr(loHorizonQry, "\n-- For binary upgrade, set pg_largeobject relfrozenxid and relminmxid\n"); > + appendPQExpBufferStr(lomHorizonQry, "\n-- For binary upgrade, set pg_largeobject_metadata relfrozenxid and relminmxid\n"); > appendPQExpBufferStr(loOutQry, "\n-- For binary upgrade, preserve pg_largeobject and index relfilenodes\n"); > + appendPQExpBufferStr(lomOutQry, "\n-- For binary upgrade, preserve pg_largeobject_metadata and index relfilenodes\n"); > > Is all that really required when upgrading from a cluster in the > 9.3~15 range? No, that stuff is discarded for upgrades from those versions. My intent was to maintain readability by avoiding lots of version checks. FWIW I originally put all of the pg_large_object_metadata stuff in a separate block, but that resulted in a lot of copy/pasted code. I'm happy to adjust it as you see fit. -- nathan
On Tue, Sep 02, 2025 at 09:43:40AM -0500, Nathan Bossart wrote: > Do you think a new pg_upgrade test for security labels is worth the > trouble? It seems doable, but it'd be an awfully expensive test for this. > On the other hand, I'm not sure there's any coverage for pg_upgrade with > security labels, so perhaps this is a good time to establish some tests. I would argue in favor of these additions. Security labels are not the most popular thing ever, AFAIK, but your patch makes the need more relevant to have. The cheapest approach would be to add a LO creation pattern in src/bin/pg_dump/t/002_pg_dump.pl, with an EXTRA_INSTALL pointing at src/test/modules/dummy_seclabel/ to be able to create the security label (we already do that in pg_upgrade and pg_basebackup so the trick works). That should be enough to make sure that the binary upgrade dumps have the seclabel data included. It's a bit funky, I agree. So if you think that this is not worth the test cycles, I won't push hard on this point, either. > No, that stuff is discarded for upgrades from those versions. My intent > was to maintain readability by avoiding lots of version checks. FWIW I > originally put all of the pg_large_object_metadata stuff in a separate > block, but that resulted in a lot of copy/pasted code. I'm happy to adjust > it as you see fit. + if (fout->remoteVersion >= 160000) + ArchiveEntry(fout, nilCatalogId, createDumpId(), + ARCHIVE_OPTS(.tag = "pg_largeobject_metadata", + .description = "pg_largeobject_metadata", + .section = SECTION_PRE_DATA, + .createStmt = lomOutQry->data)); So it's filtered out depending on the old cluster version here. I have managed to miss this part. Some tests with LOs and across multiple backend versions (pg_dump -C --binary-upgrade) are showing me that I am wrong and that you are right, with diffs showing up properly in the binary upgrade dumps for pg_largeobject_metadata. This logic looks OK to me, even if it's a waste to fill lomOutQry when upgrading from an instance in the 9.3~15 range, discarding ArchiveEntry() at the end. It's not a big deal. -- Michael
Вложения
On Thu, Sep 04, 2025 at 01:59:36PM +0900, Michael Paquier wrote: > On Tue, Sep 02, 2025 at 09:43:40AM -0500, Nathan Bossart wrote: >> Do you think a new pg_upgrade test for security labels is worth the >> trouble? It seems doable, but it'd be an awfully expensive test for this. >> On the other hand, I'm not sure there's any coverage for pg_upgrade with >> security labels, so perhaps this is a good time to establish some tests. > > I would argue in favor of these additions. Security labels are not > the most popular thing ever, AFAIK, but your patch makes the need more > relevant to have. The cheapest approach would be to add a LO creation > pattern in src/bin/pg_dump/t/002_pg_dump.pl, with an EXTRA_INSTALL > pointing at src/test/modules/dummy_seclabel/ to be able to create the > security label (we already do that in pg_upgrade and pg_basebackup so > the trick works). That should be enough to make sure that the binary > upgrade dumps have the seclabel data included. It's a bit funky, I > agree. So if you think that this is not worth the test cycles, I > won't push hard on this point, either. Ah, I'd forgotten about EXTRA_INSTALL. That simplifies things. There's enough special handling for large objects in pg_upgrade that I think we ought to test it end-to-end, so I sneaked it into 006_tranfer_modes.pl. WDYT? -- nathan
Вложения
On Thu, Sep 04, 2025 at 08:23:58PM -0500, Nathan Bossart wrote: > Ah, I'd forgotten about EXTRA_INSTALL. That simplifies things. There's > enough special handling for large objects in pg_upgrade that I think we > ought to test it end-to-end, so I sneaked it into 006_tranfer_modes.pl. > WDYT? Neat. That works for me. + $old->safe_psql('postgres', q| + CREATE EXTENSION dummy_seclabel; Still I think that this bit is going to fail with installcheck, because src/test/modules/ is not installed by default :) You can make that conditional with check_extension(), like the tricks for injection_points in other TAP tests. The security label creation and test also need to be made conditional. The work is already done with the check on oldinstall, it just needs an extra tweak, so I would store the condition in a separate variable at the top of test_mode() in 006_transfer_modes.pl, or something equivalent. -- Michael
Вложения
On Fri, Sep 05, 2025 at 03:35:21PM +0900, Michael Paquier wrote: > + $old->safe_psql('postgres', q| > + CREATE EXTENSION dummy_seclabel; > > Still I think that this bit is going to fail with installcheck, > because src/test/modules/ is not installed by default :) > > You can make that conditional with check_extension(), like the tricks > for injection_points in other TAP tests. The security label creation > and test also need to be made conditional. The work is already done > with the check on oldinstall, it just needs an extra tweak, so I would > store the condition in a separate variable at the top of test_mode() > in 006_transfer_modes.pl, or something equivalent. How does this look? -- nathan
Вложения
On Fri, Sep 05, 2025 at 01:12:49PM -0500, Nathan Bossart wrote: > How does this look? + # We can only test security labels if both the old and new installations + # have dummy_seclabel. + my $test_seclabel = 1; + $old->start; + if (!$old->check_extension('dummy_seclabel')) + { + $test_seclabel = 0; + } + $old->stop; + $new->start; + if (!$new->check_extension('dummy_seclabel')) + { + $test_seclabel = 0; + } + $new->stop; Yep. This plan is safe to rely on. A tiny comment I may have is that the LO numbers are hardcoded and duplicated. I would have used a variable to store these numbers. Please feel free to ignore my picky-ism. -- Michael
Вложения
On Sat, Sep 06, 2025 at 10:12:11AM +0900, Michael Paquier wrote: > Yep. This plan is safe to rely on. Committed, thanks for reviewing! -- nathan
On Mon, Sep 08, 2025 at 02:20:33PM -0500, Nathan Bossart wrote: > Committed, thanks for reviewing! Thanks! -- Michael