Обсуждение: Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade

Поиск
Список
Период
Сортировка

Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade

От
Jeevan Chalke
Дата:

Hello Hackers,

We have identified a dependency issue—most notably observed with the PostGIS extension—where a table's column definition relies on data existing in another table's catalog at restore time. Because pg_dump typically separates schema and data into distinct sections, these implicit data-level dependencies are not captured, leading to failures during pg_upgrade or pg_restore.

Jakub Wartak previously reported a detailed example of this issue here: https://www.postgresql.org/message-id/CAKZiRmwWyh-yGM8Hrvuuo04JiYFy8S4TLM-3Mn-zi9Rfqc744Q%40mail.gmail.com

Following a discussion with Alvaro Herrera, we have developed a patch based on his suggestions.

The Problem

In certain extension-heavy schemas, an object's schema definition cannot be created unless another table's data is already populated. Current pg_dump logic handles schema-to-schema dependencies via pg_depend, but it lacks a mechanism to:

  1. Enforce a specific order for dependencies not recorded in pg_depend.

  2. Interleave data loading with schema creation for specific tables.

Proposed Solution

This patch introduces a new option, --extra-dependencies, for pg_dump and pg_upgrade.

1. Dependency Hinting: The option allows users to provide manual dependency hints in the format table#referenced_table. Internally, pg_dump treats these as "faked" dependencies, ensuring the referenced table is sorted before the dependent object in the dump graph, similar to standard pg_depend entries.

2. Immediate Data Dumping: To satisfy the data-level requirement, the patch ensures that any table referenced via this option has its data dumped immediately following its definition (utilizing the dumpTableData() infrastructure) rather than in the general DATA section of the dump.

Use Case

While this is a specialized tool intended for unrecorded dependencies (like those in PostGIS), it provides a necessary safety valve for migrations that currently require manual intervention or complex workarounds.

Attached are the patches (PoC) for review. We look forward to your thoughts and suggestions.

Regards,


--
Jeevan Chalke
Principal Engineer, Engineering Manager
Product Development


enterprisedb.com
Вложения

Re: Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade

От
Matthias van de Meent
Дата:
On Wed, 24 Dec 2025 at 12:45, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
>
> Hello Hackers,
>
> We have identified a dependency issue—most notably observed with the PostGIS extension—where a table's column
definitionrelies on data existing in another table's catalog at restore time. Because pg_dump typically separates
schemaand data into distinct sections, these implicit data-level dependencies are not captured, leading to failures
duringpg_upgrade or pg_restore. 
>
> Jakub Wartak previously reported a detailed example of this issue here:
https://www.postgresql.org/message-id/CAKZiRmwWyh-yGM8Hrvuuo04JiYFy8S4TLM-3Mn-zi9Rfqc744Q%40mail.gmail.com

Ah, yes, that does sound like an issue.

> Following a discussion with Alvaro Herrera, we have developed a patch based on his suggestions.
>
> The Problem
>
> In certain extension-heavy schemas, an object's schema definition cannot be created unless another table's data is
alreadypopulated. Current pg_dump logic handles schema-to-schema dependencies via pg_depend, but it lacks a mechanism
to:
>
> Enforce a specific order for dependencies not recorded in pg_depend.
> Interleave data loading with schema creation for specific tables.

Is there something that prevents PostGIS from recording this kind of
dependency in pg_depend, and by doing so force the right order in
pg_dump? It seems to me that pg_depend's model is generic enough to
enable that kind of dependency; so is the issue that pg_dump doesn't
currently track and resolve that type of dependency in a satisfactory
manner?

I'm personally not a big fan of new pg_dump and pg_upgrade options to
solve this, as they require a user input to register a dependency that
should've been stored in the catalog; it should've been handled
natively. So, if we could make it work using pg_depend instead of
expecting user input here, then that'd be very much appreciated.


Kind regards,

Matthias van de Meent



Re: Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade

От
Jeevan Chalke
Дата:


On Thu, Dec 25, 2025 at 2:22 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Wed, 24 Dec 2025 at 12:45, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
>
> Hello Hackers,
>
> We have identified a dependency issue—most notably observed with the PostGIS extension—where a table's column definition relies on data existing in another table's catalog at restore time. Because pg_dump typically separates schema and data into distinct sections, these implicit data-level dependencies are not captured, leading to failures during pg_upgrade or pg_restore.
>
> Jakub Wartak previously reported a detailed example of this issue here: https://www.postgresql.org/message-id/CAKZiRmwWyh-yGM8Hrvuuo04JiYFy8S4TLM-3Mn-zi9Rfqc744Q%40mail.gmail.com

Ah, yes, that does sound like an issue.

> Following a discussion with Alvaro Herrera, we have developed a patch based on his suggestions.
>
> The Problem
>
> In certain extension-heavy schemas, an object's schema definition cannot be created unless another table's data is already populated. Current pg_dump logic handles schema-to-schema dependencies via pg_depend, but it lacks a mechanism to:
>
> Enforce a specific order for dependencies not recorded in pg_depend.
> Interleave data loading with schema creation for specific tables.

Is there something that prevents PostGIS from recording this kind of
dependency in pg_depend, and by doing so force the right order in
pg_dump? It seems to me that pg_depend's model is generic enough to
enable that kind of dependency; so is the issue that pg_dump doesn't
currently track and resolve that type of dependency in a satisfactory
manner?

I'm personally not a big fan of new pg_dump and pg_upgrade options to
solve this, as they require a user input to register a dependency that
should've been stored in the catalog; it should've been handled
natively. So, if we could make it work using pg_depend instead of
expecting user input here, then that'd be very much appreciated.


Thanks for the feedback, Matthias; I agree with your assessment. Currently, Postgres lacks a native mechanism for tracking dependencies between a table and the specific rows of another table. While certain extensions like PostGIS introduce these patterns, they remain non-standard edge cases.

Implementing a fix in the core backend seems like overkill for this scenario. Since the failure is specific to the upgrade path, targeting pg_dump and pg_upgrade is a significantly less invasive approach. Notably, this patch triggers an immediate dump of the referenced table data -- an unconventional behavior that is better handled in the client binaries than in the backend. Consequently, this approach would require new options for these binaries to explicitly inject those dependency details.
 

Kind regards,

Matthias van de Meent


Regards,

--
Jeevan Chalke
Principal Engineer, Engineering Manager
Product Development


enterprisedb.com