Обсуждение: [PATCH] Fix replica identity mismatch for partitioned tables with publish_via_partition_root
[PATCH] Fix replica identity mismatch for partitioned tables with publish_via_partition_root
От
Mikhail Kharitonov
Дата:
Hi hackers, An inconsistency was observed when using logical replication on partitioned tables with the option `publish_via_partition_root = true`: if REPLICA IDENTITY FULL is set only on the parent table, but not on all partitions, logical decoding emits UPDATE and DELETE messages with tag 'O' (old tuple) even for partitions that do not have full replica identity. In those cases, only the primary key columns are included in the message, which contradicts the expected meaning of 'O' and violates the logical replication message protocol: https://www.postgresql.org/docs/current/protocol-logicalrep-message-formats.html This can cause issues in downstream consumers, which interpret the 'O' tag as implying that a full tuple is present. The attached patch resolves the inconsistency by selecting the correct tuple type ('O' vs 'K') based on the replica identity of the actual leaf relation being published, rather than using the setting of the root relation alone. As a result, the format of logical replication messages aligns with the semantics defined by the protocol. Steps to reproduce: 1. Create a partitioned table with REPLICA IDENTITY FULL on the parent and only one of the partitions. 2. Create a publication with `publish_via_partition_root = true`. 3. Perform INSERT, UPDATE, DELETE operations through the root table. 4. Observe via `pg_recvlogical` that for a partition without full replica identity, the logical replication stream contains 'O' records with only key fields. After applying the patch, 'O' is used only when the full row is available, and 'K' is used otherwise - as expected. This patch is based on the current `master` branch as of commit: b3754dcc9ff Best regards, Mikhail Kharitonov
Вложения
Hi!
This is probably not the most familiar part of Postgres to me, but does it break anything? Or is it just inconsistency in the replication protocol?
A test for the described scenario would be a great addition. And, if it is feasible, provide an example of what would be broken with the way partitioned tables are replicated now.
There is a chance that the replication protocol for partitioned tables needs to be rewritten, and I sincerely hope that I am wrong about this. It seems Alvaro Herrera tried this here [0].
[0] https://www.postgresql.org/message-id/201902041630.gpadougzab7v@alvherre.pgsql
--
Best regards,
Maxim Orlov.
Re: [PATCH] Fix replica identity mismatch for partitioned tables with publish_via_partition_root
От
Mikhail Kharitonov
Дата:
Hi, Thank you for the feedback. I would like to clarify that the current behavior does not break replication between PostgreSQL instances. The logical replication stream is still accepted by the subscriber, and the data is applied correctly. However, the protocol semantics are violated, which may cause issues for external systems that rely on interpreting this stream. When using publish_via_partition_root = true and setting REPLICA IDENTITY FULL only on the parent table (but not on all partitions), logical replication generates messages with the tag 'O' (old tuple) for updates and deletes even for partitions that do not have full identity configured. In those cases, only key columns are sent, and the rest of the tuple is omitted. This contradicts the meaning of tag 'O', which, according to the documentation [1], indicates that the full old tuple is included. This behavior is safe for the standard PostgreSQL subscriber, which does not rely on the tag when applying changes. However, third-party tools that consume the logical replication stream and follow the protocol strictly can be misled. For example, one of our clients uses a custom CDC mechanism that extracts changes and sends them to Oracle. Their handler interprets the 'O' tag as a signal that the full old row is available. When it is not - the data is processed incorrectly. The attached patch changes the behavior so that the 'O' or 'K' tag is chosen based on the REPLICA IDENTITY setting of the actual partition where the row ends up not only the parent. - If the partition has REPLICA IDENTITY FULL, the full tuple is sent and tagged 'O'. - Otherwise, only the key columns are sent, and the tag 'K' is used. This aligns the behavior with the protocol documentation. I have also included a TAP test: 036_partition_replica_identity.pl, located in src/test/subscription/t/ It demonstrates two cases: - An update/delete on a partition with REPLICA IDENTITY FULL correctly emits an 'O' tag with the full old row. - An update/delete on a partition without REPLICA IDENTITY FULL currently also emits an 'O' tag, but only with key fields - this is the problem. After applying the patch, the second case correctly uses the 'K' tag. This patch is a minimal change it does not alter protocol structure or introduce new behavior. It only ensures the implementation matches the documentation. In the future, we might consider a broader redesign of logical replication for partitioned tables (see [2]), but this is a narrow fix that solves a real inconsistency. Looking forward to your comments. Best regards, Mikhail Kharitonov [1] https://www.postgresql.org/docs/current/protocol-logicalrep-message-formats.html [2] https://www.postgresql.org/message-id/201902041630.gpadougzab7v@alvherre.pgsql On Mon, May 12, 2025 at 5:25 PM Maxim Orlov <orlovmg@gmail.com> wrote: > > Hi! > > This is probably not the most familiar part of Postgres to me, but does it break anything? Or is it just inconsistencyin the replication protocol? > > A test for the described scenario would be a great addition. And, if it is feasible, provide an example of what would bebroken with the way partitioned tables are replicated now. > > There is a chance that the replication protocol for partitioned tables needs to be rewritten, and I sincerely hope thatI am wrong about this. It seems Alvaro Herrera tried this here [0]. > > > [0] https://www.postgresql.org/message-id/201902041630.gpadougzab7v@alvherre.pgsql > > > -- > Best regards, > Maxim Orlov.
Вложения
Re: [PATCH] Fix replica identity mismatch for partitioned tables with publish_via_partition_root
От
Mikhail Kharitonov
Дата:
Hi all, I’m sending v2 of the patch. This is a clean rebase onto current master (commit a27893df45e) and a squash of the fix together with the TAP test into a single patch file. I would appreciate your thoughts and comments on the current problem. Thank you! -- Best regards, Mikhail Kharitonov On Thu, May 29, 2025 at 9:30 AM Mikhail Kharitonov <mikhail.kharitonov.dev@gmail.com> wrote: > > Hi, > > Thank you for the feedback. > > I would like to clarify that the current behavior does not break replication > between PostgreSQL instances. The logical replication stream is still accepted > by the subscriber, and the data is applied correctly. However, the protocol > semantics are violated, which may cause issues for external systems that rely > on interpreting this stream. > > When using publish_via_partition_root = true and setting REPLICA IDENTITY FULL > only on the parent table (but not on all partitions), logical replication > generates messages with the tag 'O' (old tuple) for updates and deletes even > for partitions that do not have full identity configured. > > In those cases, only key columns are sent, and the rest of the tuple is omitted. > This contradicts the meaning of tag 'O', which, according > to the documentation [1], indicates that the full old tuple is included. > > This behavior is safe for the standard PostgreSQL subscriber, which does not > rely on the tag when applying changes. However, third-party tools that consume > the logical replication stream and follow the protocol strictly can be misled. > For example, one of our clients uses a custom CDC mechanism that extracts > changes and sends them to Oracle. Their handler interprets the 'O' tag as a > signal that the full old row is available. When it is not - the data is > processed incorrectly. > > The attached patch changes the behavior so that the 'O' or 'K' tag is chosen > based on the REPLICA IDENTITY setting of the actual partition where the row > ends up not only the parent. > - If the partition has REPLICA IDENTITY FULL, the full tuple is > sent and tagged 'O'. > - Otherwise, only the key columns are sent, and the tag 'K' is used. > > This aligns the behavior with the protocol documentation. > I have also included a TAP test: 036_partition_replica_identity.pl, > located in src/test/subscription/t/ > > It demonstrates two cases: > - An update/delete on a partition with REPLICA IDENTITY FULL correctly > emits an 'O' tag with the full old row. > - An update/delete on a partition without REPLICA IDENTITY FULL currently > also emits an 'O' tag, but only with key fields - this is the problem. > > After applying the patch, the second case correctly uses the 'K' tag. > > This patch is a minimal change it does not alter protocol structure > or introduce new behavior. It only ensures the implementation matches > the documentation. In the future, we might consider a broader redesign > of logical replication for partitioned tables (see [2]), but this is > a narrow fix that solves a real inconsistency. > > Looking forward to your comments. > > Best regards, > Mikhail Kharitonov > > [1] https://www.postgresql.org/docs/current/protocol-logicalrep-message-formats.html > [2] https://www.postgresql.org/message-id/201902041630.gpadougzab7v@alvherre.pgsql > > On Mon, May 12, 2025 at 5:25 PM Maxim Orlov <orlovmg@gmail.com> wrote: > > > > Hi! > > > > This is probably not the most familiar part of Postgres to me, but does it break anything? Or is it just inconsistencyin the replication protocol? > > > > A test for the described scenario would be a great addition. And, if it is feasible, provide an example of what wouldbe broken with the way partitioned tables are replicated now. > > > > There is a chance that the replication protocol for partitioned tables needs to be rewritten, and I sincerely hope thatI am wrong about this. It seems Alvaro Herrera tried this here [0]. > > > > > > [0] https://www.postgresql.org/message-id/201902041630.gpadougzab7v@alvherre.pgsql > > > > > > -- > > Best regards, > > Maxim Orlov.
Re: [PATCH] Fix replica identity mismatch for partitioned tables with publish_via_partition_root
От
Mikhail Kharitonov
Дата:
Hi all, I've rebased this series onto the latest master. Changes in v3: Patch 1/2 adds two new functions: logicalrep_write_update_extended, logicalrep_write_delete_extended to logicalproto. These are now used in pgoutput and allow correct old-tuple flag handling when publish_via_partition_root = true. The old functions remain as wrappers to preserve compatibility. A short documentation note was added to explain the new behaviour. Patch 2/2 moves the TAP test into a separate commit, so the code change and test are isolated. -- Best regards, Mikhail Kharitonov