Обсуждение: Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade
Hello Hackers,
We have identified a dependency issue—most notably observed with the PostGIS extension—where a table's column definition relies on data existing in another table's catalog at restore time. Because pg_dump typically separates schema and data into distinct sections, these implicit data-level dependencies are not captured, leading to failures during pg_upgrade or pg_restore.
Jakub Wartak previously reported a detailed example of this issue here: https://www.postgresql.org/message-id/CAKZiRmwWyh-yGM8Hrvuuo04JiYFy8S4TLM-3Mn-zi9Rfqc744Q%40mail.gmail.com
Following a discussion with Alvaro Herrera, we have developed a patch based on his suggestions.
The Problem
In certain extension-heavy schemas, an object's schema definition cannot be created unless another table's data is already populated. Current pg_dump logic handles schema-to-schema dependencies via pg_depend, but it lacks a mechanism to:
Enforce a specific order for dependencies not recorded in
pg_depend.Interleave data loading with schema creation for specific tables.
Proposed Solution
This patch introduces a new option, --extra-dependencies, for pg_dump and pg_upgrade.
1. Dependency Hinting: The option allows users to provide manual dependency hints in the format table#referenced_table. Internally, pg_dump treats these as "faked" dependencies, ensuring the referenced table is sorted before the dependent object in the dump graph, similar to standard pg_depend entries.
2. Immediate Data Dumping: To satisfy the data-level requirement, the patch ensures that any table referenced via this option has its data dumped immediately following its definition (utilizing the dumpTableData() infrastructure) rather than in the general DATA section of the dump.
Use Case
While this is a specialized tool intended for unrecorded dependencies (like those in PostGIS), it provides a necessary safety valve for migrations that currently require manual intervention or complex workarounds.
Attached are the patches (PoC) for review. We look forward to your thoughts and suggestions.
Regards,

Вложения
Re: Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade
On Wed, 24 Dec 2025 at 12:45, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote: > > Hello Hackers, > > We have identified a dependency issue—most notably observed with the PostGIS extension—where a table's column definitionrelies on data existing in another table's catalog at restore time. Because pg_dump typically separates schemaand data into distinct sections, these implicit data-level dependencies are not captured, leading to failures duringpg_upgrade or pg_restore. > > Jakub Wartak previously reported a detailed example of this issue here: https://www.postgresql.org/message-id/CAKZiRmwWyh-yGM8Hrvuuo04JiYFy8S4TLM-3Mn-zi9Rfqc744Q%40mail.gmail.com Ah, yes, that does sound like an issue. > Following a discussion with Alvaro Herrera, we have developed a patch based on his suggestions. > > The Problem > > In certain extension-heavy schemas, an object's schema definition cannot be created unless another table's data is alreadypopulated. Current pg_dump logic handles schema-to-schema dependencies via pg_depend, but it lacks a mechanism to: > > Enforce a specific order for dependencies not recorded in pg_depend. > Interleave data loading with schema creation for specific tables. Is there something that prevents PostGIS from recording this kind of dependency in pg_depend, and by doing so force the right order in pg_dump? It seems to me that pg_depend's model is generic enough to enable that kind of dependency; so is the issue that pg_dump doesn't currently track and resolve that type of dependency in a satisfactory manner? I'm personally not a big fan of new pg_dump and pg_upgrade options to solve this, as they require a user input to register a dependency that should've been stored in the catalog; it should've been handled natively. So, if we could make it work using pg_depend instead of expecting user input here, then that'd be very much appreciated. Kind regards, Matthias van de Meent
Re: Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade
On Wed, 24 Dec 2025 at 12:45, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
>
> Hello Hackers,
>
> We have identified a dependency issue—most notably observed with the PostGIS extension—where a table's column definition relies on data existing in another table's catalog at restore time. Because pg_dump typically separates schema and data into distinct sections, these implicit data-level dependencies are not captured, leading to failures during pg_upgrade or pg_restore.
>
> Jakub Wartak previously reported a detailed example of this issue here: https://www.postgresql.org/message-id/CAKZiRmwWyh-yGM8Hrvuuo04JiYFy8S4TLM-3Mn-zi9Rfqc744Q%40mail.gmail.com
Ah, yes, that does sound like an issue.
> Following a discussion with Alvaro Herrera, we have developed a patch based on his suggestions.
>
> The Problem
>
> In certain extension-heavy schemas, an object's schema definition cannot be created unless another table's data is already populated. Current pg_dump logic handles schema-to-schema dependencies via pg_depend, but it lacks a mechanism to:
>
> Enforce a specific order for dependencies not recorded in pg_depend.
> Interleave data loading with schema creation for specific tables.
Is there something that prevents PostGIS from recording this kind of
dependency in pg_depend, and by doing so force the right order in
pg_dump? It seems to me that pg_depend's model is generic enough to
enable that kind of dependency; so is the issue that pg_dump doesn't
currently track and resolve that type of dependency in a satisfactory
manner?
I'm personally not a big fan of new pg_dump and pg_upgrade options to
solve this, as they require a user input to register a dependency that
should've been stored in the catalog; it should've been handled
natively. So, if we could make it work using pg_depend instead of
expecting user input here, then that'd be very much appreciated.
Thanks for the feedback, Matthias; I agree with your assessment. Currently, Postgres lacks a native mechanism for tracking dependencies between a table and the specific rows of another table. While certain extensions like PostGIS introduce these patterns, they remain non-standard edge cases.
pg_dump and pg_upgrade is a significantly less invasive approach. Notably, this patch triggers an immediate dump of the referenced table data -- an unconventional behavior that is better handled in the client binaries than in the backend. Consequently, this approach would require new options for these binaries to explicitly inject those dependency details.
Kind regards,
Matthias van de Meent
