Discussion: pg_upgrade: transfer pg_largeobject_metadata's files when possible
(new thread)

On Fri, Jul 18, 2025 at 11:05:04AM -0500, Nathan Bossart wrote:
> I'm cautiously optimistic that we can find some better gains for upgrades
> from v16 and newer.  That would involve dumping lo_create() commands for
> all LOs with comments/seclabels, dumping the relevant pg_shdepend rows, and
> then copying/linking the pg_largeobject_metadata files like we did prior to
> v12.

Here is a patch.  For background, the reason this is limited to upgrades
from v16 and newer is because the aclitem data type (needed by
pg_largeobject_metadata.lomacl) changed its storage format in v16 (see
commit 7b378237aa).  Note that the patch is essentially a revert of commit
12a53c732c, but there are enough differences that it should be considered a
fresh effort.

Something I hadn't anticipated is that we need to take special care to
transfer the relfilenode of pg_largeobject_metadata and its index, as was
done for pg_largeobject in commits d498e052b4 and bbe08b8869.  In fact, the
majority of the patch is dedicated to that.

My testing showed some decent, but not earth-shattering, performance
improvements from this patch.  For upgrades with many large objects with
NULL lomacl/lomowner columns, pg_upgrade was 50% faster.  With non-NULL
lomacl/lomowner, that dropped to 25%.  When each large object had a
comment, there was no change.  I'm assuming that it's rare to have lots of
large objects with comments or security labels, so I don't see any need to
expend energy trying to optimize that use-case.

I am a bit concerned that we'll forget to add checks for new types of
dependencies similar to comments and security labels.  If we do, pg_upgrade
should just fail to restore the schema, and fixing the code should be easy
enough.  Also, we'll need to remember to revisit this code if there's
another storage format change for one of pg_largeobject_metadata's columns,
but that seems unlikely to happen anytime soon.  On the whole, I'm not too
worried about either of these points.

--
nathan
Attachments
Have you considered re-creating pg_shdepend from pg_largeobject_metadata
directly instead of having special cases for dumping it?

It would also be useful in cases of old (pg_upgraded since before pg 12)
databases which might be missing it anyway.

On Thu, Aug 14, 2025 at 5:22 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> (new thread)
>
> On Fri, Jul 18, 2025 at 11:05:04AM -0500, Nathan Bossart wrote:
> > I'm cautiously optimistic that we can find some better gains for upgrades
> > from v16 and newer.  That would involve dumping lo_create() commands for
> > all LOs with comments/seclabels, dumping the relevant pg_shdepend rows, and
> > then copying/linking the pg_largeobject_metadata files like we did prior to
> > v12.
>
> Here is a patch.  [...]
On Tue, Aug 19, 2025 at 09:49:26AM +0200, Hannu Krosing wrote:
> Have you considered re-creating pg_shdepend from
> pg_largeobject_metadata directly instead of having special cases for
> dumping it?

I considered it when you last brought up the idea [0], but my testing
indicated that it's measurably slower.

> It would also be useful in cases of old (pg_upgraded since before pg
> 12) databases which might be missing it anyway.

We only use COPY for upgrades from v12 and newer, and the patch at hand
only applies to v16 and newer.  There should be no need to repair
pg_shdepend for any such upgrades because we haven't copied/linked the
files since before v12.

[0] https://postgr.es/m/CAMT0RQTXiqH7zdQEVSVd2L7_Cw4wQ1eHOD8hfZ%2B0vecMXJWc-w%40mail.gmail.com

--
nathan
On Thu, Aug 14, 2025 at 10:22:02AM -0500, Nathan Bossart wrote:
> Here is a patch. For background, the reason this is limited to upgrades
> from v16 and newer is because the aclitem data type (needed by
> pg_largeobject_metadata.lomacl) changed its storage format in v16 (see
> commit 7b378237aa). Note that the patch is essentially a revert of commit
> 12a53c732c, but there are enough differences that it should be considered a
> fresh effort.
Noted.
> Something I hadn't anticipated is that we need to take special care to
> transfer the relfilenode of pg_largeobject_metadata and its index, as was
> done for pg_largeobject in commits d498e052b4 and bbe08b8869. In fact, the
> majority of the patch is dedicated to that.
>
> My testing showed some decent, but not earth-shattering performance
> improvements from this patch. For upgrades with many large objects with
> NULL lomacl/lomowner columns, pg_upgrade was 50% faster. With non-NULL
> lomacl/lomowner, that dropped to 25%. When each large object had a
> comment, there was no change. I'm assuming that its rare to have lots of
> large objects with comments or security labels, so I don't see any need to
> expend energy trying to optimize that use-case.
I highly doubt that there are a lot of comments assigned to LOs, so
these numbers are pretty cool IMO.  Security labels are a pain to test
in the upgrade path, though dummy_seclabel could be extended with a new
TAP test and a pg_upgrade command.  There is some coverage of comments
on LOs in src/bin/pg_dump's 002, so that would be enough for the
comment part, at least.
> I am a bit concerned that we'll forget to add checks for new types of
> dependencies similar to comments and security labels. If we do, pg_upgrade
> should just fail to restore the schema, and fixing the code should be easy
> enough. Also, we'll need to remember to revisit this code if there's
> another storage format change for one of pg_largeobject_metadata's columns,
> but that seems unlikely to happen anytime soon. On the whole, I'm not too
> worried about either of these points.
This part does not worry me much, TBH. This stuff would require
dump/restore support and the pg_dump test suite would catch that for
commands with --binary-upgrade. So it should be hard to miss.
- /*
- * pg_largeobject
- */
if (fout->remoteVersion >= 90300)
appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, relminmxid, relfilenode, oid\n"
"FROM pg_catalog.pg_class\n"
- "WHERE oid IN (%u, %u);\n",
- LargeObjectRelationId, LargeObjectLOidPNIndexId);
+ "WHERE oid IN (%u, %u, %u, %u);\n",
+ LargeObjectRelationId, LargeObjectLOidPNIndexId,
+ LargeObjectMetadataRelationId, LargeObjectMetadataOidIndexId);
[...]
    appendPQExpBufferStr(loHorizonQry, "\n-- For binary upgrade, set pg_largeobject relfrozenxid and relminmxid\n");
+   appendPQExpBufferStr(lomHorizonQry, "\n-- For binary upgrade, set pg_largeobject_metadata relfrozenxid and relminmxid\n");
    appendPQExpBufferStr(loOutQry, "\n-- For binary upgrade, preserve pg_largeobject and index relfilenodes\n");
+   appendPQExpBufferStr(lomOutQry, "\n-- For binary upgrade, preserve pg_largeobject_metadata and index relfilenodes\n");
Is all that really required when upgrading from a cluster in the
9.3~15 range?
--
Michael
Attachments
On Mon, Sep 01, 2025 at 03:19:46PM +0900, Michael Paquier wrote:
> I highly doubt that there are a lot of comments assigned to LOs, so
> these numbers are pretty cool IMO.  [...]

Do you think a new pg_upgrade test for security labels is worth the
trouble?  It seems doable, but it'd be an awfully expensive test for
this.  On the other hand, I'm not sure there's any coverage for
pg_upgrade with security labels, so perhaps this is a good time to
establish some tests.

> [...]
>
> Is all that really required when upgrading from a cluster in the
> 9.3~15 range?

No, that stuff is discarded for upgrades from those versions.  My intent
was to maintain readability by avoiding lots of version checks.  FWIW I
originally put all of the pg_largeobject_metadata stuff in a separate
block, but that resulted in a lot of copy/pasted code.  I'm happy to
adjust it as you see fit.

--
nathan
On Tue, Sep 02, 2025 at 09:43:40AM -0500, Nathan Bossart wrote:
> Do you think a new pg_upgrade test for security labels is worth the
> trouble?  It seems doable, but it'd be an awfully expensive test for
> this.  On the other hand, I'm not sure there's any coverage for
> pg_upgrade with security labels, so perhaps this is a good time to
> establish some tests.

I would argue in favor of these additions.  Security labels are not the
most popular thing ever, AFAIK, but your patch makes the need for
coverage more relevant.  The cheapest approach would be to add a LO
creation pattern in src/bin/pg_dump/t/002_pg_dump.pl, with an
EXTRA_INSTALL pointing at src/test/modules/dummy_seclabel/ to be able
to create the security label (we already do that in pg_upgrade and
pg_basebackup, so the trick works).  That should be enough to make sure
that the binary upgrade dumps have the seclabel data included.  It's a
bit funky, I agree.  So if you think that this is not worth the test
cycles, I won't push hard on this point, either.

> No, that stuff is discarded for upgrades from those versions.  My intent
> was to maintain readability by avoiding lots of version checks.  FWIW I
> originally put all of the pg_largeobject_metadata stuff in a separate
> block, but that resulted in a lot of copy/pasted code.  I'm happy to
> adjust it as you see fit.

+   if (fout->remoteVersion >= 160000)
+       ArchiveEntry(fout, nilCatalogId, createDumpId(),
+                    ARCHIVE_OPTS(.tag = "pg_largeobject_metadata",
+                                 .description = "pg_largeobject_metadata",
+                                 .section = SECTION_PRE_DATA,
+                                 .createStmt = lomOutQry->data));

So it's filtered out depending on the old cluster version here.  I had
managed to miss this part.  Some tests with LOs across multiple backend
versions (pg_dump -C --binary-upgrade) are showing me that I am wrong
and that you are right, with the diffs showing up properly in the
binary upgrade dumps for pg_largeobject_metadata.  This logic looks OK
to me, even if it's a waste to fill lomOutQry when upgrading from an
instance in the 9.3~15 range, only to discard the ArchiveEntry() at the
end.  It's not a big deal.

--
Michael
Attachments
On Thu, Sep 04, 2025 at 01:59:36PM +0900, Michael Paquier wrote:
> On Tue, Sep 02, 2025 at 09:43:40AM -0500, Nathan Bossart wrote:
>> Do you think a new pg_upgrade test for security labels is worth the
>> trouble?  [...]
>
> I would argue in favor of these additions.  [...]

Ah, I'd forgotten about EXTRA_INSTALL.  That simplifies things.
There's enough special handling for large objects in pg_upgrade that I
think we ought to test it end-to-end, so I sneaked it into
006_transfer_modes.pl.  WDYT?

--
nathan
Attachments
On Thu, Sep 04, 2025 at 08:23:58PM -0500, Nathan Bossart wrote:
> Ah, I'd forgotten about EXTRA_INSTALL. That simplifies things. There's
> enough special handling for large objects in pg_upgrade that I think we
> ought to test it end-to-end, so I sneaked it into 006_transfer_modes.pl.
> WDYT?
Neat. That works for me.
+ $old->safe_psql('postgres', q|
+ CREATE EXTENSION dummy_seclabel;
Still I think that this bit is going to fail with installcheck,
because src/test/modules/ is not installed by default :)
You can make that conditional with check_extension(), like the tricks
for injection_points in other TAP tests. The security label creation
and test also need to be made conditional. The work is already done
with the check on oldinstall, it just needs an extra tweak, so I would
store the condition in a separate variable at the top of test_mode()
in 006_transfer_modes.pl, or something equivalent.
--
Michael
Attachments
On Fri, Sep 05, 2025 at 03:35:21PM +0900, Michael Paquier wrote:
> + $old->safe_psql('postgres', q|
> + CREATE EXTENSION dummy_seclabel;
>
> Still I think that this bit is going to fail with installcheck,
> because src/test/modules/ is not installed by default :)
>
> You can make that conditional with check_extension(), like the tricks
> for injection_points in other TAP tests. The security label creation
> and test also need to be made conditional. The work is already done
> with the check on oldinstall, it just needs an extra tweak, so I would
> store the condition in a separate variable at the top of test_mode()
> in 006_transfer_modes.pl, or something equivalent.
How does this look?
--
nathan
Attachments
On Fri, Sep 05, 2025 at 01:12:49PM -0500, Nathan Bossart wrote:
> How does this look?
+ # We can only test security labels if both the old and new installations
+ # have dummy_seclabel.
+ my $test_seclabel = 1;
+ $old->start;
+ if (!$old->check_extension('dummy_seclabel'))
+ {
+ $test_seclabel = 0;
+ }
+ $old->stop;
+ $new->start;
+ if (!$new->check_extension('dummy_seclabel'))
+ {
+ $test_seclabel = 0;
+ }
+ $new->stop;
Yep. This plan is safe to rely on.
A tiny comment I may have is that the LO numbers are hardcoded and
duplicated. I would have used a variable to store these numbers.
Please feel free to ignore my picky-ism.
--
Michael
Attachments
On Sat, Sep 06, 2025 at 10:12:11AM +0900, Michael Paquier wrote:
> Yep.  This plan is safe to rely on.

Committed, thanks for reviewing!

--
nathan
On Mon, Sep 08, 2025 at 02:20:33PM -0500, Nathan Bossart wrote:
> Committed, thanks for reviewing!

Thanks!

--
Michael
Attachments
Hi,
On 2025-09-08 14:20:33 -0500, Nathan Bossart wrote:
> On Sat, Sep 06, 2025 at 10:12:11AM +0900, Michael Paquier wrote:
> > Yep. This plan is safe to rely on.
>
> Committed, thanks for reviewing!
I just had occasion to look at this code and I found the way this works quite
confusing:
When upgrading from a new enough server, we migrate
pg_largeobject_metadata.  So far so good.  But we *also* do a COPY FROM:

    COPY "pg_catalog"."pg_largeobject_metadata" ("oid", "lomowner", "lomacl") FROM stdin;
    ...
for all the large objects that have a description or a security label.
For a while I was somewhat baffled, because that sure looks like it ought to
lead to uniqueness violations. But it doesn't.
The reason, I think, is that the COPY is happening into a relfilenode that
will be overwritten later, it doesn't yet contain the contents of the old
cluster.
Presumably we do this because we need the temporary pg_largeobject_metadata to
make COMMENT ON and security label commands not fail.
If this is the reasoning / how it works, shouldn't there be a comment in the
code or the commit message explaining that? Because it sure seems non-obvious
to me.
It's also not entirely obvious to me that this is safe: after all,
commit bbe08b8869bd (revised in 0e758ae89) appears to have taken some
pains to ensure that the file gets unlinked immediately during the
"binary upgrade mode" TRUNCATE.  But now we are actually filling that
file again, after the relation had been truncated?
Separately, if we don't handle large objects that don't have comments or
labels via pg_dump, why do we do dependency tracking for all LOs in getLOs(),
rather than just the ones that have a comment / label? Given how much memory
both the query results and the dependency tracking take...
Greetings,
Andres Freund
Hi,
On 2026-02-03 18:16:17 -0500, Andres Freund wrote:
> On 2025-09-08 14:20:33 -0500, Nathan Bossart wrote:
> > On Sat, Sep 06, 2025 at 10:12:11AM +0900, Michael Paquier wrote:
> > > Yep. This plan is safe to rely on.
> >
> > Committed, thanks for reviewing!
>
> I just had occasion to look at this code and I found the way this works quite
> confusing:
>
> When upgrading from a new enough server, we migrate
> pg_largeobject_metadata. So far so good. But we *also* do a COPY FROM
> COPY "pg_catalog"."pg_largeobject_metadata" ("oid", "lomowner", "lomacl") FROM stdin;
> ...
>
> for all the large objects that have a description or a security label.
>
>
> For a while I was somewhat baffled, because that sure looks like it ought to
> lead to uniqueness violations. But it doesn't.
>
> The reason, I think, is that the COPY is happening into a relfilenode that
> will be overwritten later, it doesn't yet contain the contents of the old
> cluster.
>
> Presumably we do this because we need the temporary pg_largeobject_metadata to
> make COMMENT ON and security label commands not fail.
>
>
> If this is the reasoning / how it works, shouldn't there be a comment in the
> code or the commit message explaining that? Because it sure seems non-obvious
> to me.
>
> It's also not entirely obvious to me that this is safe - after all
> (bbe08b8869bd, revised in 0e758ae89) appeared to have taken some pains to
> ensure that the file gets unlinked immediately during the "binary upgrade
> mode" TRUNCATE. But now we are actually filling that file again, after the
> relation had been truncated?
An example of what could go wrong:
Imagine that we add support for freezing pages on-access (as we are
discussing).  If pg_largeobject_metadata was *not* frozen / vacuumed on
the old server, there might not be a _vm (visibility map) fork on the
old server, but due to the new on-access freezing, one might be created
for the data in the ephemeral pg_largeobject_metadata during the
accesses from COMMENT ON.
Because there's no VM on the old server, transfer_relfile() would just return
false:
    /* Is it an extent, fsm, or vm file? */
    if (type_suffix[0] != '\0' || segno != 0)
    {
        /* Did file open fail? */
        if (stat(old_file, &statbuf) != 0)
        {
            /* File does not exist?  That's OK, just return */
            if (errno == ENOENT)
                return;
therefore not reaching the:
unlink(new_file);
Leaving a completely wrong visibilitymap in place.
There are other scary possibilities. Imagine that the ephemeral
pg_largeobject_metadata ends up bigger than on the old server, e.g. because we
are a bit more aggressive about bulk relation extension in the new version
(which we should be!). If that size difference ends up with a different
segment count between the old server and the ephemeral relation in the new
server, we're in trouble: transfer_relfile() wouldn't unlink the additional
segments on the target system.
Greetings,
Andres Freund
On Tue, Feb 03, 2026 at 06:46:25PM -0500, Andres Freund wrote:
>> The reason, I think, is that the COPY is happening into a relfilenode that
>> will be overwritten later, it doesn't yet contain the contents of the old
>> cluster.
>>
>> Presumably we do this because we need the temporary pg_largeobject_metadata to
>> make COMMENT ON and security label commands not fail.
>>
>> If this is the reasoning / how it works, shouldn't there be a comment in the
>> code or the commit message explaining that? Because it sure seems non-obvious
>> to me.
Right, the COPY for LOs with comments and security labels is solely meant
to avoid failure when restoring the comments and security labels, since we
won't have transferred the relation files yet. This was the case before
commit 12a53c732c, where we had this comment in getBlobs():
* We *do* dump out the definition of the blob because we need that to
* make the restoration of the comments, and anything else, work since
* pg_upgrade copies the files behind pg_largeobject and
* pg_largeobject_metadata after the dump is restored.
Commit 3bcfcd815e (mine) added this one to pg_dump.c:
* If upgrading from v16 or newer, only dump large objects with
* comments/seclabels. For these upgrades, pg_upgrade can copy/link
* pg_largeobject_metadata's files (which is usually faster) but we
* still need to dump LOs with comments/seclabels here so that the
* subsequent COMMENT and SECURITY LABEL commands work. pg_upgrade
* can't copy/link the files from older versions because aclitem
* (needed by pg_largeobject_metadata.lomacl) changed its storage
* format in v16.
IIUC your critique is that this doesn't explain the overwriting behavior
like the older comment does. I'll work on adding that.
>> It's also not entirely obvious to me that this is safe - after all
>> (bbe08b8869bd, revised in 0e758ae89) appeared to have taken some pains to
>> ensure that the file gets unlinked immediately during the "binary upgrade
>> mode" TRUNCATE. But now we are actually filling that file again, after the
>> relation had been truncated?
>
> An example of what could go wrong:
>
> [... examples of what could go wrong ...]
I'm considering a couple of options here, but it seems like the easiest
thing to do is to move the TRUNCATE commands to the end of the dump file.
At least, that seems to be sufficient for our existing tests. If that
seems okay to you, I can work on putting together a patch.
--
nathan
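[Editor's note: to illustrate the proposed reordering, a binary-upgrade dump would then look roughly like the sketch below. This is hand-written; the OID and comment text are invented, and the real commands are generated by pg_dump in --binary-upgrade mode.]

```sql
-- Ephemeral pg_largeobject_metadata rows, restored early so that the
-- COMMENT/SECURITY LABEL commands below can find their large objects:
COPY "pg_catalog"."pg_largeobject_metadata" ("oid", "lomowner", "lomacl") FROM stdin;
...
\.

COMMENT ON LARGE OBJECT 16384 IS 'illustrative comment';

-- Moved to the end of the dump: discard the ephemeral contents (and
-- their files) so pg_upgrade can transfer the old cluster's files
-- afterwards without leftover forks or segments getting in the way.
TRUNCATE pg_catalog.pg_largeobject, pg_catalog.pg_largeobject_metadata;
```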
On Wed, Feb 04, 2026 at 10:06:29AM -0600, Nathan Bossart wrote:
> IIUC your critique is that this doesn't explain the overwriting behavior
> like the older comment does.  I'll work on adding that.
>
> [...]
>
> I'm considering a couple of options here, but it seems like the easiest
> thing to do is to move the TRUNCATE commands to the end of the dump file.
> At least, that seems to be sufficient for our existing tests.  If that
> seems okay to you, I can work on putting together a patch.

Here is a rough first draft of a patch that does this.

--
nathan
Attachments
Hi,

On 2026-02-04 10:06:29 -0600, Nathan Bossart wrote:
> IIUC your critique is that this doesn't explain the overwriting behavior
> like the older comment does.  I'll work on adding that.

I think even the old comment was woefully under-documenting that the
commands are all just make-work that's going to be thrown out almost
immediately after.  Thanks for addressing that!

On 2026-02-04 14:08:47 -0600, Nathan Bossart wrote:
> Here is a rough first draft of a patch that does this.

It certainly seems better than what we do now.  Still feels pretty
grotty and error-prone to me that we fill the catalog table and then
throw the contents out.

> @@ -1157,7 +1158,10 @@ main(int argc, char **argv)
>  	 * subsequent COMMENT and SECURITY LABEL commands work.  pg_upgrade
>  	 * can't copy/link the files from older versions because aclitem
>  	 * (needed by pg_largeobject_metadata.lomacl) changed its storage
> -	 * format in v16.
> +	 * format in v16.  At the end of the dump, we'll generate a TRUNCATE
> +	 * command for pg_largeobject_metadata so that its contents are
> +	 * cleared in preparation for the subsequent file transfer by
> +	 * pg_upgrade.
>  	 */

I'd move that comment to earlier in the paragraph; it sounds a bit like
it applies to the v16-specific bits.

>  	if (fout->remoteVersion >= 160000)
>  		lo_metadata->dataObj->filtercond = "WHERE oid IN "
> @@ -1243,6 +1247,13 @@ main(int argc, char **argv)
>  	for (i = 0; i < numObjs; i++)
>  		dumpDumpableObject(fout, dobjs[i]);
>
> +	/*
> +	 * For binary upgrades, set relfrozenxids, relminmxids, and relfilenodes
> +	 * of pg_largeobject and maybe pg_largeobject_metadata, and remove all
> +	 * their files.  We will transfer them from the old cluster as needed.
> +	 */
> +	dumpLOTruncation(fout);
> +
>  	/*
>  	 * Set up options info to ensure we dump what we want.
>  	 */

Seems good to move this to a dedicated function, regardless of anything
else.  Do I see correctly that we just rely on the ordering in the
file, rather than dependencies?  That's not a complaint, I just don't
know that code very well.

Greetings,

Andres Freund
On Thu, Feb 05, 2026 at 11:19:46AM -0500, Andres Freund wrote:
> It certainly seems better than what we do now.  Still feels pretty
> grotty and error-prone to me that we fill the catalog table and then
> throw the contents out.

Before I go any further with this approach, I thought of something else
we could do that I believe is worth considering...

As of commit 3bcfcd815e, the only reason we are dumping any of
pg_largeobject_metadata at all is to avoid an ERROR during COMMENT ON
or SECURITY LABEL ON because the call to LargeObjectExists() in
get_object_address() returns false.  If we bypass that check in
binary-upgrade mode, we can skip dumping pg_largeobject_metadata
entirely.  The attached patch passes our existing tests, and it seems
to create the expected binary-upgrade-mode dump files, too.  I haven't
updated any of the comments yet.

--
nathan
Attachments
On Thu, Feb 05, 2026 at 11:36:00AM -0600, Nathan Bossart wrote:
> @@ -1046,7 +1046,7 @@ get_object_address(ObjectType objtype, Node *object,
> address.classId = LargeObjectRelationId;
> address.objectId = oidparse(object);
> address.objectSubId = 0;
> - if (!LargeObjectExists(address.objectId))
> + if (!LargeObjectExists(address.objectId) && !IsBinaryUpgrade)
> {
> if (!missing_ok)
> ereport(ERROR,
(Probably better to set missing_ok for only the specific commands we want
to bypass this check, but you get the idea...)
--
nathan
Hi,

On 2026-02-05 11:36:00 -0600, Nathan Bossart wrote:
> As of commit 3bcfcd815e, the only reason we are dumping any of
> pg_largeobject_metadata at all is to avoid an ERROR during COMMENT ON
> or SECURITY LABEL ON because the call to LargeObjectExists() in
> get_object_address() returns false.  If we bypass that check in
> binary-upgrade mode, we can skip dumping pg_largeobject_metadata
> entirely.

Yea, I think that's worth considering.  As you say downthread, the
check for binary upgrade should probably be moved, but that's details.

Upthread I was also wondering why we do all the work in getLOs() if we
don't actually need most of it (only if there are comments or labels).
Right now that's a very slow and very memory-intensive part of
upgrading a system with a lot of large objects.  Do we need *any* of
that if we go the path you suggest?

Greetings,

Andres Freund
On Thu, Feb 05, 2026 at 01:02:17PM -0500, Andres Freund wrote:
> Upthread I was also wondering why we do all the work in getLOs() if we
> don't actually need most of it (only if there are comments or labels).
> Right now that's a very slow and very memory-intensive part of
> upgrading a system with a lot of large objects.  Do we need *any* of
> that if we go the path you suggest?

AFAICT we only need it for the comments and security labels later on.
Commit a45c78e3 did batch 1000 large objects into each ArchiveEntry,
but of course there can still be a ton of entries.  In theory, we could
update the pg_largeobject_metadata query to only retrieve LOs with
comments and security labels.  I'm not sure it's worth trying to
optimize further than that; we've long operated under the assumption
that comments/seclabels on LOs are pretty rare.

--
nathan
Hi,
On 2026-02-05 13:31:23 -0600, Nathan Bossart wrote:
> On Thu, Feb 05, 2026 at 01:02:17PM -0500, Andres Freund wrote:
> > Upthread I also wondering why we do all the work in getLOs() if we don't
> > actually need most of it (only if there are comments or labels). Right now
> > that's a very slow and very memory intensive part of doing an upgrade of a
> > system with a lot of binary upgrades. Do we need *any* of that if we go the
> > path you suggest?
>
> AFAICT we only need it for the comments and security labels later on.
And in binary upgrade mode not even for that, if we do the thing we talked
about re not checking references in binary upgrade mode? Right?
> Commit a45c78e3 did batch 1000 large objects into each ArchiveEntry, but of
> course there can still be a ton of entries.
Entries and then also the PGresult (almost as large as all the entries). The
peak memory usage is pretty bad. We could address the PGresult memory usage
with PQsetChunkedRowsMode() or explicit cursor use, but it doesn't seem
entirely trivial to combine with the batching.
> In theory, we could update the pg_largeobject_metadata query to only
> retrieve LOs with comments and security labels. I'm not sure it's worth
> trying to optimize further than that; we've long operated under the
> assumption that comments/seclabels on LOs are pretty rare.
I think that'd be a huge improvement. Right now it's not hard to get into a
situation where you have too many LOs to not have enough memory to do a
pg_upgrade.
Memory usage aside, it's also slow and expensive from the query execution and
data transfer side. Because of the ORDER BY that the batching requires, the
server needs to sort all of pg_largeobject_metadata before any rows can be
returned.
For 5M LOs:
                                                              QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Sort  (cost=715978.34..728478.39 rows=5000020 width=72) (actual time=10292.252..10652.950 rows=5000020.00 loops=1)
   Sort Key: pg_largeobject_metadata.lomowner, ((pg_largeobject_metadata.lomacl)::text), pg_largeobject_metadata.oid
   Sort Method: quicksort  Memory: 509110kB
   ->  Seq Scan on pg_largeobject_metadata  (cost=0.00..159638.55 rows=5000020 width=72) (actual time=0.034..2284.442 rows=5000020.00 loops=1)
   SubPlan expr_1
     ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.000 rows=1.00 loops=5000020)
 Planning Time: 0.117 ms
 Serialization: time=3961.343 ms output=218686kB format=text
 Execution Time: 14930.747 ms
(9 rows)
/usr/bin/time -v pg_dump --binary-upgrade --no-data --quote-all-identifiers --no-statistics --format=custom -f /tmp/dumplo_5m
	Command being timed: "pg_dump --binary-upgrade --no-data --quote-all-identifiers --no-statistics --format=custom -f /tmp/dumplo_5m"
User time (seconds): 1.85
System time (seconds): 0.54
Percent of CPU this job got: 16%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:14.55
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 935084
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 133379
Voluntary context switches: 30486
Involuntary context switches: 6
Swaps: 0
File system inputs: 0
File system outputs: 16
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Peak memory usage ~1GB.
And unfortunately 5M LOs is not a whole lot.
Filtering the LOs in the query to only return ones that have a comment / label
would typically make the query much faster and pg_dump consume a lot less
memory.
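To illustrate, a filtered query could look roughly like the sketch below. This is only a sketch of the idea, not the query pg_dump actually ships; it assumes that LO comments are keyed in pg_description and LO security labels in pg_seclabel, each by classoid = pg_largeobject, which matches how COMMENT ON LARGE OBJECT and SECURITY LABEL ON LARGE OBJECT store their data:

```sql
-- Sketch: fetch metadata only for large objects that actually have a
-- comment or a security label, instead of all of pg_largeobject_metadata.
SELECT l.oid, l.lomowner, l.lomacl,
       acldefault('L', l.lomowner) AS acldefault
FROM pg_largeobject_metadata l
WHERE EXISTS (SELECT 1 FROM pg_description d
              WHERE d.classoid = 'pg_catalog.pg_largeobject'::regclass
                AND d.objoid = l.oid)
   OR EXISTS (SELECT 1 FROM pg_seclabel s
              WHERE s.classoid = 'pg_catalog.pg_largeobject'::regclass
                AND s.objoid = l.oid)
ORDER BY l.lomowner, l.lomacl::pg_catalog.text, l.oid;
```

With comments/seclabels on LOs being rare, the sort input typically shrinks from millions of rows to a handful, which addresses both the memory and the serialization cost shown above.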
Greetings,
Andres Freund
On Thu, Feb 05, 2026 at 03:23:38PM -0500, Andres Freund wrote:
> On 2026-02-05 13:31:23 -0600, Nathan Bossart wrote:
>> On Thu, Feb 05, 2026 at 01:02:17PM -0500, Andres Freund wrote:
>> > Upthread I also wondering why we do all the work in getLOs() if we don't
>> > actually need most of it (only if there are comments or labels). Right now
>> > that's a very slow and very memory intensive part of doing an upgrade of a
>> > system with a lot of binary upgrades. Do we need *any* of that if we go the
>> > path you suggest?
>>
>> AFAICT we only need it for the comments and security labels later on.
>
> And in binary upgrade mode not even for that, if we do the thing we talked
> about re not checking references in binary upgrade mode? Right?

Well, we need _something_ to do what dumpLO() does today, i.e., call
dumpComment() and dumpSecLabel() for each large object that has one or the
other. I'm sure we could decouple that from the getLOs() machinery if we
tried hard enough, but I'm not seeing any great advantage in doing so if we
teach getLOs() to only get what it needs.

>> In theory, we could update the pg_largeobject_metadata query to only
>> retrieve LOs with comments and security labels. I'm not sure it's worth
>> trying to optimize further than that; we've long operated under the
>> assumption that comments/seclabels on LOs are pretty rare.
>
> I think that'd be a huge improvement. Right now it's not hard to get into a
> situation where you have so many LOs that there isn't enough memory to do a
> pg_upgrade.

It seems to work. I'm working on a nicer patch for all this stuff.

--
nathan
Hi,
On 2026-02-05 15:23:38 -0500, Andres Freund wrote:
> Memory usage aside, it's also slow and expensive from the query execution and
> data transfer side. Because of the ORDER BY that the batching requires, the
> server needs to sort all of pg_largeobject_metadata before any rows can be
> returned.
>
> For 5M LOs:
>
>                                                               QUERY PLAN
> ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
>  Sort  (cost=715978.34..728478.39 rows=5000020 width=72) (actual time=10292.252..10652.950 rows=5000020.00 loops=1)
>    Sort Key: pg_largeobject_metadata.lomowner, ((pg_largeobject_metadata.lomacl)::text), pg_largeobject_metadata.oid
>    Sort Method: quicksort  Memory: 509110kB
>    ->  Seq Scan on pg_largeobject_metadata  (cost=0.00..159638.55 rows=5000020 width=72) (actual time=0.034..2284.442 rows=5000020.00 loops=1)
>    SubPlan expr_1
>      ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.000 rows=1.00 loops=5000020)
>  Planning Time: 0.117 ms
>  Serialization: time=3961.343 ms output=218686kB format=text
>  Execution Time: 14930.747 ms
> (9 rows)
This isn't quite the right query, sorry for that. I just tried moving the
acldefault() in a subselect, because I was wondering whether that'd prevent it
from being computed below the sort. Unfortunately no. It's a bit faster
without that, but not much:
EXPLAIN (ANALYZE, SERIALIZE, BUFFERS OFF, VERBOSE)
SELECT oid, lomowner, lomacl::pg_catalog.text, acldefault('L', lomowner) AS acldefault
FROM pg_largeobject_metadata
ORDER BY lomowner, lomacl::pg_catalog.text, oid;
                                                              QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Sort  (cost=665978.14..678478.19 rows=5000020 width=72) (actual time=8887.007..9243.088 rows=5000020.00 loops=1)
   Output: oid, lomowner, ((lomacl)::text), (acldefault('L'::"char", lomowner))
   Sort Key: pg_largeobject_metadata.lomowner, ((pg_largeobject_metadata.lomacl)::text), pg_largeobject_metadata.oid
   Sort Method: quicksort  Memory: 509110kB
   ->  Seq Scan on pg_catalog.pg_largeobject_metadata  (cost=0.00..109638.35 rows=5000020 width=72) (actual time=0.029..823.895 rows=5000020.00 loops=1)
         Output: oid, lomowner, (lomacl)::text, acldefault('L'::"char", lomowner)
 Planning Time: 0.087 ms
 Serialization: time=3965.649 ms output=218686kB format=text
 Execution Time: 13516.925 ms
(9 rows)
I might start a separate thread about this misoptimization...
Greetings,
Andres Freund
On Thu, Feb 05, 2026 at 02:30:28PM -0600, Nathan Bossart wrote:
> On Thu, Feb 05, 2026 at 03:23:38PM -0500, Andres Freund wrote:
>> On 2026-02-05 13:31:23 -0600, Nathan Bossart wrote:
>>> In theory, we could update the pg_largeobject_metadata query to only
>>> retrieve LOs with comments and security labels. I'm not sure it's worth
>>> trying to optimize further than that; we've long operated under the
>>> assumption that comments/seclabels on LOs are pretty rare.
>>
>> I think that'd be a huge improvement. Right now it's not hard to get into a
>> situation where you have so many LOs that there isn't enough memory to do a
>> pg_upgrade.
>
> It seems to work. I'm working on a nicer patch for all this stuff.

Here is what I have so far.

--
nathan
Attachments
On Thu, Feb 05, 2026 at 03:29:55PM -0600, Nathan Bossart wrote:
> Here is what I have so far.

BTW I ran 006_transfer_modes.pl (which tests LOs with comments and security
labels) for upgrades from v10, v13, v16, v17, and v18, and all tests
succeeded.

--
nathan
Parts of this discussion are very much related to
https://commitfest.postgresql.org/patch/5774/ - "Extending skipping FK
checks on replicas to include ADD FK and TRUNCATE"

I have seen cases where adding a foreign key after copying data takes
several days for no benefit at all.

We should be allowing more "skip unuseful work" options for all the cases
where we can be pretty sure the data is ok - upgrades, replica setups, and
restoring backups.

We probably should also have an option to tolerate slight corruption, like
duplicate PK entries if the source is an existing database. 99 times out of
100 the user would rather have a slight corruption than no database at all,
for example in case they had to do an emergency restore and it fails after
running 3 hours because ADD PRIMARY KEY hits a duplicate row.

On Fri, Feb 6, 2026 at 8:40 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Thu, Feb 05, 2026 at 03:29:55PM -0600, Nathan Bossart wrote:
> > Here is what I have so far.
>
> BTW I ran 006_transfer_modes.pl (which tests LOs with comments and security
> labels) for upgrades from v10, v13, v16, v17, and v18, and all tests
> succeeded.
>
> --
> nathan
Hi,
On 2026-02-05 15:29:55 -0600, Nathan Bossart wrote:
> Here is what I have so far.
Thanks!
> --- a/src/bin/pg_dump/pg_dump.c
> +++ b/src/bin/pg_dump/pg_dump.c
> @@ -214,12 +214,6 @@ static int nbinaryUpgradeClassOids = 0;
> static SequenceItem *sequences = NULL;
> static int nsequences = 0;
>
> -/*
> - * For binary upgrade, the dump ID of pg_largeobject_metadata is saved for use
> - * as a dependency for pg_shdepend and any large object comments/seclabels.
> - */
> -static DumpId lo_metadata_dumpId;
> -
> /* Maximum number of relations to fetch in a fetchAttributeStats() call. */
> #define MAX_ATTR_STATS_RELS 64
>
> @@ -1121,27 +1115,20 @@ main(int argc, char **argv)
> getTableData(&dopt, tblinfo, numTables, RELKIND_SEQUENCE);
>
> /*
> - * For binary upgrade mode, dump pg_largeobject_metadata and the
> - * associated pg_shdepend rows. This is faster to restore than the
> - * equivalent set of large object commands. We can only do this for
> - * upgrades from v12 and newer; in older versions, pg_largeobject_metadata
> - * was created WITH OIDS, so the OID column is hidden and won't be dumped.
> + * For binary upgrade mode, dump the pg_shdepend rows for large objects
> + * and maybe even pg_largeobject_metadata (see comment below for details).
> + * This is faster to restore than the equivalent set of large object
> + * commands. We can only do this for upgrades from v12 and newer; in
> + * older versions, pg_largeobject_metadata was created WITH OIDS, so the
> + * OID column is hidden and won't be dumped.
> */
It's not really related to this change, but what is that WITH OIDS bit about?
Sure, they aren't shown by default, but all it takes to change that is to
explicitly add the output column? I'm not saying we have to do that, I just
don't understand the reasoning as written here.
> @@ -3979,7 +3964,25 @@ getLOs(Archive *fout)
> appendPQExpBufferStr(loQry,
> "SELECT oid, lomowner, lomacl, "
> "acldefault('L', lomowner) AS acldefault "
> - "FROM pg_largeobject_metadata "
> + "FROM pg_largeobject_metadata ");
> +
> + /*
> + * For upgrades from v12 or newer, we transfer pg_largeobject_metadata via
> + * binary upgrades.
The improvements in pg_upgrade time and memory in a cluster with 50M LOs [1]
are really quite impressive, even if upgrading from < 16. It's rare to improve
memory usage by several orders of magnitude.
Greetings,
Andres Freund
[1] c=20;pgbench -n -c$c -j$c -P1 -t $((50000000 / $c)) -f <(echo -e "SELECT lo_create(0) AS loid\n") 50m_los
On Sun, Feb 08, 2026 at 04:26:41PM -0500, Andres Freund wrote:
> On 2026-02-05 15:29:55 -0600, Nathan Bossart wrote:
>> + * commands.  We can only do this for upgrades from v12 and newer; in
>> + * older versions, pg_largeobject_metadata was created WITH OIDS, so the
>> + * OID column is hidden and won't be dumped.
>> */
>
> It's not really related to this change, but what is that WITH OIDS bit about?
> Sure, they aren't shown by default, but all it takes to change that is to
> explicitly add the output column? I'm not saying we have to do that, I just
> don't understand the reasoning as written here.

IIRC the issue is that getTableAttrs() won't pick up the OID column on
older versions. It might be easy to fix that by adjusting its query for
binary upgrades from <v12. That could be worth doing, if for no other
reason than to simplify some of the pg_dump code. I'll make a note of it.

> The improvements in pg_upgrade time and memory in a cluster with 50M LOs [1]
> are really quite impressive, even if upgrading from < 16. It's rare to improve
> memory usage by several orders of magnitude.

Nice!

--
nathan
Here is what I have staged for commit, which I'm planning to do this afternoon. -- nathan
Attachments
On Mon, Feb 09, 2026 at 11:05:50AM -0600, Nathan Bossart wrote:
> Here is what I have staged for commit, which I'm planning to do this
> afternoon.

Committed.

--
nathan
On Sun, Feb 08, 2026 at 04:00:40PM -0600, Nathan Bossart wrote:
> On Sun, Feb 08, 2026 at 04:26:41PM -0500, Andres Freund wrote:
>> On 2026-02-05 15:29:55 -0600, Nathan Bossart wrote:
>>> + * commands.  We can only do this for upgrades from v12 and newer; in
>>> + * older versions, pg_largeobject_metadata was created WITH OIDS, so the
>>> + * OID column is hidden and won't be dumped.
>>> */
>>
>> It's not really related to this change, but what is that WITH OIDS bit about?
>> Sure, they aren't shown by default, but all it takes to change that is to
>> explicitly add the output column? I'm not saying we have to do that, I just
>> don't understand the reasoning as written here.
>
> IIRC the issue is that getTableAttrs() won't pick up the OID column on
> older versions. It might be easy to fix that by adjusting its query for
> binary upgrades from <v12. That could be worth doing, if for no other
> reason than to simplify some of the pg_dump code. I'll make a note of it.

This was a little more painful than I expected, but this seems to be what
is required to allow COPY-ing pg_largeobject_metadata during binary
upgrades from < v12.

--
nathan
Attachments
On Wed, Feb 11, 2026 at 03:00:51PM -0600, Nathan Bossart wrote:
> This was a little more painful than I expected, but this seems to be what
> is required to allow COPY-ing pg_largeobject_metadata during binary
> upgrades from < v12.

I don't think there's anything terribly controversial here, so I'd like to
proceed with committing this in the next couple of days if there are no
objections.

--
nathan
Hi,
On 2026-02-11 15:00:51 -0600, Nathan Bossart wrote:
> On Sun, Feb 08, 2026 at 04:00:40PM -0600, Nathan Bossart wrote:
> > IIRC the issue is that getTableAttrs() won't pick up the OID column on
> > older versions. It might be easy to fix that by adjusting its query for
> > binary upgrades from <v12. That could be worth doing, if for no other
> > reason than to simplify some of the pg_dump code. I'll make a note of it.
>
> This was a little more painful than I expected, but this seems to be what
> is required to allow COPY-ing pg_largeobject_metadata during binary
> upgrades from < v12.
Nice!
> @@ -2406,11 +2404,14 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
> column_list = fmtCopyColumnList(tbinfo, clistBuf);
>
> /*
> - * Use COPY (SELECT ...) TO when dumping a foreign table's data, and when
> - * a filter condition was specified. For other cases a simple COPY
> - * suffices.
> + * Use COPY (SELECT ...) TO when dumping a foreign table's data, when a
> + * filter condition was specified, and when in binary upgrade mode and
> + * dumping an old pg_largeobject_metadata defined WITH OIDS. For other
> + * cases a simple COPY suffices.
> */
> - if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
> + if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
> + (fout->dopt->binary_upgrade && fout->remoteVersion < 120000 &&
> + tbinfo->dobj.catId.oid == LargeObjectMetadataRelationId))
> {
> /* Temporary allows to access to foreign tables to dump data */
> if (tbinfo->relkind == RELKIND_FOREIGN_TABLE)
Not really the fault of this patch, but it seems somewhat grotty that this has
binary upgrade specific code in this place. I was certainly confused when
first trying to use pg_dump in binary upgrade mode with large objects, because
no data was dumped when using plain text mode, which is what I had been using
for simplicity...
> @@ -2406,11 +2404,14 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
> column_list = fmtCopyColumnList(tbinfo, clistBuf);
>
> /*
> - * Use COPY (SELECT ...) TO when dumping a foreign table's data, and when
> - * a filter condition was specified. For other cases a simple COPY
> - * suffices.
> + * Use COPY (SELECT ...) TO when dumping a foreign table's data, when a
> + * filter condition was specified, and when in binary upgrade mode and
> + * dumping an old pg_largeobject_metadata defined WITH OIDS. For other
> + * cases a simple COPY suffices.
> */
> - if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
> + if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
> + (fout->dopt->binary_upgrade && fout->remoteVersion < 120000 &&
> + tbinfo->dobj.catId.oid == LargeObjectMetadataRelationId))
> {
> /* Temporary allows to access to foreign tables to dump data */
> if (tbinfo->relkind == RELKIND_FOREIGN_TABLE)
I guess you could instead generate a COPY using WITH OIDS. But it's probably
not worth having that path, given we already need to support COPY (SELECT ..).
OTOH, I think it'd perhaps avoid needing to deal with this:
> @@ -9442,7 +9428,18 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
> "(pt.classoid = co.tableoid AND pt.objoid = co.oid)\n");
>
> appendPQExpBufferStr(q,
> - "WHERE a.attnum > 0::pg_catalog.int2\n"
> + "WHERE a.attnum > 0::pg_catalog.int2\n");
> +
> + /*
> + * For binary upgrades from <v12, be sure to pick up
> + * pg_largeobject_metadata's oid column.
> + */
> + if (fout->dopt->binary_upgrade && fout->remoteVersion < 120000)
> + appendPQExpBufferStr(q,
> + "OR (a.attnum = -2::pg_catalog.int2 AND src.tbloid = "
> + CppAsString2(LargeObjectMetadataRelationId) ")\n");
> +
as we'd just include the oid column without needing to somehow include it in
the attribute list.
> @@ -9544,7 +9541,9 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
>
> for (int j = 0; j < numatts; j++, r++)
> {
> - if (j + 1 != atoi(PQgetvalue(res, r, i_attnum)))
> + if (j + 1 != atoi(PQgetvalue(res, r, i_attnum)) &&
> + !(fout->dopt->binary_upgrade && fout->remoteVersion < 120000 &&
> + tbinfo->dobj.catId.oid == LargeObjectMetadataRelationId))
> pg_fatal("invalid column numbering in table \"%s\"",
> tbinfo->dobj.name);
> tbinfo->attnames[j] = pg_strdup(PQgetvalue(res, r, i_attname));
> --
> 2.50.1 (Apple Git-155)
I guess WITH OIDs also would avoid the need for this.
Greetings,
Andres Freund
On Thu, Feb 12, 2026 at 04:46:30PM -0500, Andres Freund wrote:
> On 2026-02-11 15:00:51 -0600, Nathan Bossart wrote:
>> This was a little more painful than I expected, but this seems to be what
>> is required to allow COPY-ing pg_largeobject_metadata during binary
>> upgrades from < v12.
>
> Nice!

Thanks for looking.

> Not really the fault of this patch, but it seems somewhat grotty that this has
> binary upgrade specific code in this place. I was certainly confused when
> first trying to use pg_dump in binary upgrade mode with large objects, because
> no data was dumped when using plain text mode, which is what I had been using
> for simplicity...

Yeah, this has been an annoyance while hacking in this area. I haven't
looked into what it'd take to fix it, though.

> I guess you could instead generate a COPY using WITH OIDS. But it's probably
> not worth having that path, given we already need to support COPY (SELECT ..).

Ah, I forgot all about that COPY option...

> as we'd just include the oid column without needing to somehow include it in
> the attribute list.
>
> [...]
>
> I guess WITH OIDs also would avoid the need for this.

Will give it a try.

--
nathan
On Thu, Feb 12, 2026 at 03:59:24PM -0600, Nathan Bossart wrote:
> On Thu, Feb 12, 2026 at 04:46:30PM -0500, Andres Freund wrote:
>> I guess you could instead generate a COPY using WITH OIDS. But it's probably
>> not worth having that path, given we already need to support COPY (SELECT ..).
>
> Ah, I forgot all about that COPY option...
>
>> as we'd just include the oid column without needing to somehow include it in
>> the attribute list.
>>
>> [...]
>>
>> I guess WITH OIDs also would avoid the need for this.
>
> Will give it a try.

So, in addition to hacking in the OIDS option to the COPY command executed
on the old server, we still need to hack the OID column into the column
list for the dumped COPY command (or omit the column list entirely for
pg_largeobject_metadata). That seems to result in roughly the same level of
hackery as before.

--
nathan
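For readers following along, the two alternatives being weighed can be sketched roughly as below. This is a hedged illustration only, not the committed pg_dump output; the exact column list and quoting that pg_dump emits may differ:

```sql
-- Approach the patch takes: COPY (SELECT ...) TO, which can include the
-- hidden oid column of a pre-v12 WITH OIDS catalog explicitly.
COPY (SELECT oid, lomowner, lomacl
      FROM pg_catalog.pg_largeobject_metadata) TO stdout;

-- Pre-v12 alternative discussed above: let COPY itself emit the OID column.
-- (The WITH OIDS option was removed from COPY in v12, so this form only
-- runs against old servers.)
COPY pg_catalog.pg_largeobject_metadata TO stdout WITH OIDS;
```

Either way something version-specific has to happen: the first form needs the oid column injected into pg_dump's attribute handling, the second needs the OIDS option injected into the generated COPY, which matches the "roughly the same level of hackery" conclusion above.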
Committed. -- nathan
On 2026-02-16 15:15:42 -0600, Nathan Bossart wrote:
> Committed.

Thanks for working on this! I think the situation now is a lot better than
it was in 18...
On Mon, Feb 16, 2026 at 04:19:10PM -0500, Andres Freund wrote:
> Thanks for working on this! I think the situation now is a lot better than it
> was in 18...

I hope so... I appreciate the pointers/reviews.

--
nathan