Обсуждение: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions
Hi,
While working with logical replication and partitioned tables, I noticed an inconsistency between how publications treat partitions and how "ALTER TABLE ... REPLICA IDENTITY" behaves.
When a publication is created on a partitioned table, e.g.:
PostgreSQL automatically includes all leaf partitions of the table in the publication. This matches the user’s expectation that a partitioned table behaves as a single logical entity.
This gap leads to several problems:
To address this, the attached patch makes:
```
This aligns ALTER TABLE behavior with how publications already expand partitioned tables, and makes replication identity configuration consistent with logical replication semantics.
The attached patch is not yet fully ready for detailed review, this is more of a proof-of-concept. At this stage, I mainly want to see whether people agree with the idea, or if there are objections to cascading replica identity changes for partitioned tables before I refine the patch further.
Comments and feedback are welcome.
While working with logical replication and partitioned tables, I noticed an inconsistency between how publications treat partitions and how "ALTER TABLE ... REPLICA IDENTITY" behaves.
When a publication is created on a partitioned table, e.g.:
```
CREATE PUBLICATION pub FOR TABLE parent;
```
CREATE PUBLICATION pub FOR TABLE parent;
```
PostgreSQL automatically includes all leaf partitions of the table in the publication. This matches the user’s expectation that a partitioned table behaves as a single logical entity.
However, if the user then runs:
```
```
ALTER TABLE parent REPLICA IDENTITY FULL;
```
```
only the parent table’s relreplident is updated. None of the leaf partitions inherit this change, even though the parent itself has no storage and its replication identity plays no role in logical replication. Logical decoding always operates on the leaf partitions, and their replication identities determine whether UPDATE/DELETE can be replicated safely.
This gap leads to several problems:
* The parent table’s replica identity is effectively irrelevant during logical replication, since it never stores tuples or produces WAL.
* Users expect that altering the replica identity on the partitioned table would apply to all partitions that are implicitly included in the publication.
* As a result, users currently need to run ALTER TABLE ... REPLICA IDENTITY separately for every leaf partition, which is tedious and error-prone on large partition hierarchies.
* Misconfiguration usually surfaces only when logical replication starts failing on UPDATE/DELETE for specific leaf partitions due to mismatched replica identities.
```
ALTER TABLE parent REPLICA IDENTITY <type>
```
```
cascade the new setting to all leaf partitions of the table. Partitioned tables (RELKIND_PARTITIONED_TABLE) are skipped since they have no storage and no effective replica identity.
This aligns ALTER TABLE behavior with how publications already expand partitioned tables, and makes replication identity configuration consistent with logical replication semantics.
The attached patch is not yet fully ready for detailed review, this is more of a proof-of-concept. At this stage, I mainly want to see whether people agree with the idea, or if there are objections to cascading replica identity changes for partitioned tables before I refine the patch further.
Comments and feedback are welcome.
Вложения
On Thu, Dec 11, 2025 at 2:46 PM Chao Li <li.evan.chao@gmail.com> wrote: > > Hi, > While working with logical replication and partitioned tables, I noticed an inconsistency between how publications treatpartitions and how "ALTER TABLE ... REPLICA IDENTITY" behaves. > > When a publication is created on a partitioned table, e.g.: > ``` > CREATE PUBLICATION pub FOR TABLE parent; > ``` > > PostgreSQL automatically includes all leaf partitions of the table in the publication. This matches the user’s expectationthat a partitioned table behaves as a single logical entity. > > However, if the user then runs: > ``` > ALTER TABLE parent REPLICA IDENTITY FULL; > ``` > only the parent table’s relreplident is updated. None of the leaf partitions inherit this change, even though the parentitself has no storage and its replication identity plays no role in logical replication. Logical decoding always operateson the leaf partitions, and their replication identities determine whether UPDATE/DELETE can be replicated safely. > > This gap leads to several problems: > > * The parent table’s replica identity is effectively irrelevant during logical replication, since it never stores tuplesor produces WAL. > When we use row filters, if publish_via_partition_root option of publication is true, the root partitioned table's row filter is used. I think this would then refer RI of partitioned table for validity of row filter. Please see docs [1] (There can be a case where a subscription combines multiple publications. If a partitioned table is published by any subscribed publications which set publish_via_partition_root = true, changes on this partitioned table (or on its partitions) will be published using the identity and schema of this partitioned table rather than that of the individual partitions. This parameter also affects how row filters and column lists are chosen for partitions; see below for details.) for more details. I have not tested it but you can once try to see how it behaves. The other point is what if one of the partition already has RI defined to a different value than what is defined for parent table? [1] - https://www.postgresql.org/docs/devel/sql-createpublication.html -- With Regards, Amit Kapila.
> On Dec 11, 2025, at 20:43, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Dec 11, 2025 at 2:46 PM Chao Li <li.evan.chao@gmail.com> wrote: >> >> Hi, >> While working with logical replication and partitioned tables, I noticed an inconsistency between how publications treatpartitions and how "ALTER TABLE ... REPLICA IDENTITY" behaves. >> >> When a publication is created on a partitioned table, e.g.: >> ``` >> CREATE PUBLICATION pub FOR TABLE parent; >> ``` >> >> PostgreSQL automatically includes all leaf partitions of the table in the publication. This matches the user’s expectationthat a partitioned table behaves as a single logical entity. >> >> However, if the user then runs: >> ``` >> ALTER TABLE parent REPLICA IDENTITY FULL; >> ``` >> only the parent table’s relreplident is updated. None of the leaf partitions inherit this change, even though the parentitself has no storage and its replication identity plays no role in logical replication. Logical decoding always operateson the leaf partitions, and their replication identities determine whether UPDATE/DELETE can be replicated safely. >> >> This gap leads to several problems: >> >> * The parent table’s replica identity is effectively irrelevant during logical replication, since it never stores tuplesor produces WAL. >> > > When we use row filters, if publish_via_partition_root option of > publication is true, the root partitioned table's row filter is used. > I think this would then refer RI of partitioned table for validity of > row filter. Please see docs [1] (There can be a case where a > subscription combines multiple publications. If a partitioned table is > published by any subscribed publications which set > publish_via_partition_root = true, changes on this partitioned table > (or on its partitions) will be published using the identity and schema > of this partitioned table rather than that of the individual > partitions. This parameter also affects how row filters and column > lists are chosen for partitions; see below for details.) for more > details. > > I have not tested it but you can once try to see how it behaves. > > The other point is what if one of the partition already has RI defined > to a different value than what is defined for parent table? > > [1] - https://www.postgresql.org/docs/devel/sql-createpublication.html > > -- > With Regards, > Amit Kapila. Hi Amit, Thanks for pointing out that my assumption of “RI of parent is not used” is not always true. I agree that automatic-cascade will introduce a lot of complexities. To ensure the backward-compatibility, how about to extendthe ALTER TABLE syntax like: ``` ALTER TABLE <root> REPLICA IDENTITY <type> [CASCADE | FORCE CASCADE] ``` So, that the current syntax will behave the same as usual, and With CASCADE ============ 1. Root's RI updated 2. All children (including middle partitioned tables and leaf tables) RI updated unless 3 3. If any child’s RI is different from the root's RI, fail out, no change happens With CASCADE FORCE =================== 1. Root's RI updated 2. All children (including middle partitioned tables and leaf tables) RI updated, prints a warning message when a child’sRI is different from root’s RI "ALTER TABLE parent REPLICA IDENTITY” is a PG specific syntax, so the change won’t break the SQL standard. And “CASCADE”is known keyword that has been used in many SQL commands. I can see the usefulness of “CASCADE” when a partitioned table has many partitions. A single command will be able to updateall partitions’ RI. What do you think? Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
On Fri, Dec 12, 2025 at 9:28 AM Chao Li <li.evan.chao@gmail.com> wrote: > > Hi Amit, > > Thanks for pointing out that my assumption of “RI of parent is not used” is not always true. > > I agree that automatic-cascade will introduce a lot of complexities. To ensure the backward-compatibility, how about toextend the ALTER TABLE syntax like: > > ``` > ALTER TABLE <root> REPLICA IDENTITY <type> [CASCADE | FORCE CASCADE] > ``` > CASCADE is used for dependent objects, so I don't think using it will be appropriate in this context. However, the INHERIT (NO INHERIT) could be used. We already use them for constraints, see ALTER TABLE ... ALTER CONSTRAINT syntax in docs. > So, that the current syntax will behave the same as usual, and > > With CASCADE > ============ > 1. Root's RI updated > 2. All children (including middle partitioned tables and leaf tables) RI updated unless 3 > 3. If any child’s RI is different from the root's RI, fail out, no change happens > > With CASCADE FORCE > =================== > 1. Root's RI updated > 2. All children (including middle partitioned tables and leaf tables) RI updated, prints a warning message when a child’sRI is different from root’s RI > I think you can try to experiment with CHECK or NOT NULL constraint behavior for similar cases in case of partition tables. BTW, did you get this use case in the field or just browsing docs, you thought it would be useful to have such a feature? -- With Regards, Amit Kapila.
Hi Amit, Thank you very much for the guidance. > On Dec 13, 2025, at 19:10, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Dec 12, 2025 at 9:28 AM Chao Li <li.evan.chao@gmail.com> wrote: >> >> Hi Amit, >> >> Thanks for pointing out that my assumption of “RI of parent is not used” is not always true. >> >> I agree that automatic-cascade will introduce a lot of complexities. To ensure the backward-compatibility, how about toextend the ALTER TABLE syntax like: >> >> ``` >> ALTER TABLE <root> REPLICA IDENTITY <type> [CASCADE | FORCE CASCADE] >> ``` >> > > CASCADE is used for dependent objects, so I don't think using it will > be appropriate in this context. However, the INHERIT (NO INHERIT) > could be used. We already use them for constraints, see ALTER TABLE > ... ALTER CONSTRAINT syntax in docs. > >> So, that the current syntax will behave the same as usual, and >> >> With CASCADE >> ============ >> 1. Root's RI updated >> 2. All children (including middle partitioned tables and leaf tables) RI updated unless 3 >> 3. If any child’s RI is different from the root's RI, fail out, no change happens >> >> With CASCADE FORCE >> =================== >> 1. Root's RI updated >> 2. All children (including middle partitioned tables and leaf tables) RI updated, prints a warning message when a child’sRI is different from root’s RI >> > > I think you can try to experiment with CHECK or NOT NULL constraint > behavior for similar cases in case of partition tables. > > BTW, did you get this use case in the field or just browsing docs, you > thought it would be useful to have such a feature? > The main problem I am trying to solve is [1], where you are already in the thread. This is a real pain point reported byour users and field teams. While working on [1], I noticed this additional issue during my own tests. I then discussedit with our field teams, and they confirmed that such a feature would be very helpful in practice. We have manydeployments where a single partitioned table has several thousands of partitions, and having a fast, single command toupdate the replica identity of all partitions would significantly simplify operations. I also confirmed one thing with the field teams: across our deployments (my company has 100K+ deployments in China), theyhave never seen a case where partitions under the same parent/root use different replica identities. In theory, thisis allowed, since RI can be set per partition, but in practice I am not sure whether such a use case really exists. Currently, replica identity is not inheritable for partitions. I verified this behavior: if I create a partitioned table,alter its RI to FULL, and then create a new partition, the new partition still uses DEFAULT. If we keep this behavior,we can easily run into a scenario like this: * create a partitioned table with 10 partitions * ALTER TABLE <parent> REPLICA IDENTITY FULL INHERIT -- assume this feature exists * create 5 new partitions * ALTER TABLE <parent> REPLICA IDENTITY FULL INHERIT -- conflict occurs In this case, a conflict occurs because the newly created partitions still have DEFAULT RI, but this is not the user’s intention. From this perspective, when a new partition is created, it should automatically inherit the parent’s RI. With that behavior,a “FORCE” option would rarely be needed, because having one partition use a different RI from its siblings shouldbe an uncommon case. Based on this, I imagine the feature roughly like this: * When a new partition is created, it inherits its parent’s RI * ALTER TABLE <table_name> REPLICA IDENTITY [ INHERIT | NO INHERIT ] -- error out on conflicts This leads to a couple of follow-up questions: * Should RI be switchable between “inheritable” and “non-inheritable”, similar to constraints? IMO, no. RI is much simplerthan constraints. For constraints, parent–child relationships exist between tables with potentially different structures,so it is natural that child tables might have different constraints. RI, however, only applies to partitionedtables, where partitions must share the same structure as the parent. In practice, it seems rare for partitionsto intentionally use a different RI than the parent. * Should publish_via_partition_root in publications affect this feature? IMO, no. A table can belong to multiple publications,and different publications may have different publish_via_partition_root settings. [1] https://postgr.es/m/CAEoWx2mMorbMwjKbT4YCsjDyL3r9Mp+z0bbK57VZ+OkJTgJQVQ@mail.gmail.com Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
On Mon, Dec 15, 2025 at 12:44 PM Chao Li <li.evan.chao@gmail.com> wrote: > > Thank you very much for the guidance. > > > On Dec 13, 2025, at 19:10, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Dec 12, 2025 at 9:28 AM Chao Li <li.evan.chao@gmail.com> wrote: > >> > >> Hi Amit, > >> > >> Thanks for pointing out that my assumption of “RI of parent is not used” is not always true. > >> > >> I agree that automatic-cascade will introduce a lot of complexities. To ensure the backward-compatibility, how aboutto extend the ALTER TABLE syntax like: > >> > >> ``` > >> ALTER TABLE <root> REPLICA IDENTITY <type> [CASCADE | FORCE CASCADE] > >> ``` > >> > > > > CASCADE is used for dependent objects, so I don't think using it will > > be appropriate in this context. However, the INHERIT (NO INHERIT) > > could be used. We already use them for constraints, see ALTER TABLE > > ... ALTER CONSTRAINT syntax in docs. > > > >> So, that the current syntax will behave the same as usual, and > >> > >> With CASCADE > >> ============ > >> 1. Root's RI updated > >> 2. All children (including middle partitioned tables and leaf tables) RI updated unless 3 > >> 3. If any child’s RI is different from the root's RI, fail out, no change happens > >> > >> With CASCADE FORCE > >> =================== > >> 1. Root's RI updated > >> 2. All children (including middle partitioned tables and leaf tables) RI updated, prints a warning message when a child’sRI is different from root’s RI > >> > > > > I think you can try to experiment with CHECK or NOT NULL constraint > > behavior for similar cases in case of partition tables. > > > > BTW, did you get this use case in the field or just browsing docs, you > > thought it would be useful to have such a feature? > > > > The main problem I am trying to solve is [1], where you are already in the thread. This is a real pain point reported byour users and field teams. While working on [1], I noticed this additional issue during my own tests. I then discussedit with our field teams, and they confirmed that such a feature would be very helpful in practice. We have manydeployments where a single partitioned table has several thousands of partitions, and having a fast, single command toupdate the replica identity of all partitions would significantly simplify operations. > > I also confirmed one thing with the field teams: across our deployments (my company has 100K+ deployments in China), theyhave never seen a case where partitions under the same parent/root use different replica identities. In theory, thisis allowed, since RI can be set per partition, but in practice I am not sure whether such a use case really exists. > > Currently, replica identity is not inheritable for partitions. I verified this behavior: if I create a partitioned table,alter its RI to FULL, and then create a new partition, the new partition still uses DEFAULT. If we keep this behavior,we can easily run into a scenario like this: > > * create a partitioned table with 10 partitions > * ALTER TABLE <parent> REPLICA IDENTITY FULL INHERIT -- assume this feature exists > * create 5 new partitions > * ALTER TABLE <parent> REPLICA IDENTITY FULL INHERIT -- conflict occurs > > In this case, a conflict occurs because the newly created partitions still have DEFAULT RI, but this is not the user’sintention. > > From this perspective, when a new partition is created, it should automatically inherit the parent’s RI. With that behavior,a “FORCE” option would rarely be needed, because having one partition use a different RI from its siblings shouldbe an uncommon case. > > Based on this, I imagine the feature roughly like this: > > * When a new partition is created, it inherits its parent’s RI > * ALTER TABLE <table_name> REPLICA IDENTITY [ INHERIT | NO INHERIT ] -- error out on conflicts > Sounds reasonable to me. > This leads to a couple of follow-up questions: > > * Should RI be switchable between “inheritable” and “non-inheritable”, similar to constraints? IMO, no. RI is much simplerthan constraints. For constraints, parent–child relationships exist between tables with potentially different structures,so it is natural that child tables might have different constraints. RI, however, only applies to partitionedtables, where partitions must share the same structure as the parent. In practice, it seems rare for partitionsto intentionally use a different RI than the parent. > I see your point but I think we should provide resetting the option unless it is too complex or not feasible. > * Should publish_via_partition_root in publications affect this feature? IMO, no. A table can belong to multiple publications,and different publications may have different publish_via_partition_root settings. > I also don't think so. -- With Regards, Amit Kapila.
On Tue, Dec 16, 2025, at 7:12 AM, Amit Kapila wrote:
> On Mon, Dec 15, 2025 at 12:44 PM Chao Li <li.evan.chao@gmail.com> wrote:
>>
>> From this perspective, when a new partition is created, it should automatically inherit the parent’s RI. With that
behavior,a “FORCE” option would rarely be needed, because having one partition use a different RI from its siblings
shouldbe an uncommon case.
>>
>> Based on this, I imagine the feature roughly like this:
>>
>> * When a new partition is created, it inherits its parent’s RI
>> * ALTER TABLE <table_name> REPLICA IDENTITY [ INHERIT | NO INHERIT ] -- error out on conflicts
>>
>
> Sounds reasonable to me.
>
+1.
>> This leads to a couple of follow-up questions:
>>
>> * Should RI be switchable between “inheritable” and “non-inheritable”, similar to constraints? IMO, no. RI is much
simplerthan constraints. For constraints, parent–child relationships exist between tables with potentially different
structures,so it is natural that child tables might have different constraints. RI, however, only applies to
partitionedtables, where partitions must share the same structure as the parent. In practice, it seems rare for
partitionsto intentionally use a different RI than the parent.
>>
>
> I see your point but I think we should provide resetting the option
> unless it is too complex or not feasible.
>
+1.
ALTER TABLE ...
REPLICA IDENTITY { DEFAULT [ INHERIT | NO INHERIT ] |
USING INDEX index_name |
FULL [ INHERIT | NO INHERIT ] |
NOTHING [ INHERIT | NO INHERIT ] }
It doesn't make sense to have the inheritance property for INDEX. Although the
natural default value is NO INHERIT to preserve backward compatibility, I
wondering if we use INHERIT as default. The main advantage is usability as Chao
Li already mentioned. Is there any cases that having a different replica
identity from parent/partitioned table makes sense?
--
Euler Taveira
EDB https://www.enterprisedb.com/
Вложения
Hi Amit and Euler,
On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira <euler@eulerto.com> wrote:
I wondering if we use INHERIT as default. The main advantage is usability as Chao
Li already mentioned. Is there any cases that having a different replica
identity from parent/partitioned table makes sense?
We can leave this topic open for discussion. In my current implementation, NO INHERIT is still the default. But if we decide to switch the default, I can add a new commit that should include only 1 line code change in gram.y and a tiny doc update.
0001 - when a new partition is created, use the parent's replication identity
0002 - add INHERIT | NO INHERIT
Best regards,
--Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Вложения
On Wednesday, December 17, 2025 3:56 PM Chao Li <li.evan.chao@gmail.com> wrote: > Thank you both for all your advice. Here comes my first implementation of > INHERIT in the attached v2 patch. > > On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira <mailto:euler@eulerto.com> wrote: > > > I wondering if we use INHERIT as default. The main advantage is usability as > > Chao Li already mentioned. Is there any cases that having a different > > replica identity from parent/partitioned table makes sense? > > We can leave this topic open for discussion. In my current implementation, NO > INHERIT is still the default. But if we decide to switch the default, I can add > a new commit that should include only 1 line code change in gram.y and a tiny > doc update. > > 0001 - when a new partition is created, use the parent's replication identity > 0002 - add INHERIT | NO INHERIT Thanks for updating the patches. I think there are several design considerations for this proposal: 1) Since the index names can vary across different partitions, what should be the expected behavior if a new partition cannot identify the same replica identity key as the root partitioned table? 2) Should we simply use the ONLY keyword to determine whether to propagate the replica identity to partitions instead of adding [NOT] INHERIT? For example, if a user specifies ONLY, it changes the identity of the parent table, and any newly created partitions will adopt this new identity. However, the identities of existing partitions remain unchanged. 3) There have been previous discussions on similar proposals[1][2]. It might be beneficial to review those debates to see whether any old issues or arguments are pertinent to this proposal. [1] https://www.postgresql.org/message-id/flat/201902041630.gpadougzab7v%40alvherre.pgsql [2] https://www.postgresql.org/message-id/flat/OSBPR01MB2982A2738F16722899A50082FECB0%40OSBPR01MB2982.jpnprd01.prod.outlook.com#2e5388a7cde3c10430f8668a6c381b06 Best Regards, Hou zj
> On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> On Wednesday, December 17, 2025 3:56 PM Chao Li <li.evan.chao@gmail.com> wrote:
>> Thank you both for all your advice. Here comes my first implementation of
>> INHERIT in the attached v2 patch.
>>
>> On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira <mailto:euler@eulerto.com> wrote:
>>
>>> I wondering if we use INHERIT as default. The main advantage is usability as
>>> Chao Li already mentioned. Is there any cases that having a different
>>> replica identity from parent/partitioned table makes sense?
>>
>> We can leave this topic open for discussion. In my current implementation, NO
>> INHERIT is still the default. But if we decide to switch the default, I can add
>> a new commit that should include only 1 line code change in gram.y and a tiny
>> doc update.
>>
>> 0001 - when a new partition is created, use the parent's replication identity
>> 0002 - add INHERIT | NO INHERIT
>
Hi Zhijie,
Thanks for your feedback and linked information. I think this patch is avoiding the hard problem of “index” RI.
>
> I think there are several design considerations for this proposal:
>
> 1) Since the index names can vary across different partitions, what should be the
> expected behavior if a new partition cannot identify the same replica identity
> key as the root partitioned table?
Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and NONE.
>
> 2) Should we simply use the ONLY keyword to determine whether to propagate the
> replica identity to partitions instead of adding [NOT] INHERIT? For example, if
> a user specifies ONLY, it changes the identity of the parent table, and any
> newly created partitions will adopt this new identity. However, the identities
> of existing partitions remain unchanged.
The current syntax “ALTER TABLE [ONLY] table_name action”, I think here “ONLY” works in terms of inherited tables but
partitions.(I surprisedly find the doc doesn’t explain “ONLY” at all.) The current behavior (without my patch) proves
myunderstanding, “ALTER TABLE table_name REPLICA IDENTITY” only updates the parent table itself.
Given that it is not allowed to inherit a table from either a partitioned table (root/parent) or partition (leaf),
reusingthe “ONLY" at “ALTER TABLE” level won’t generate a conflict. But the problem is, we will have to revisit all
ALTERTABLE actions to see if they can propagate to partitions when the target table is a partitioned table. In that
case,scope of this patch is extremely extended.
The current approach adding INHERIT to the action “REPLICA IDENTIFY” has no global impacts. If you look at the patch,
thesyntax this patch uses is:
```
ALTER TABLE ...
REPLICA IDENTITY { DEFAULT [ INHERIT | NO INHERIT ] |
USING INDEX index_name |
FULL [ INHERIT | NO INHERIT ] |
NOTHING [ INHERIT | NO INHERIT ] }
```
It explicitly avoids the complexity of “USING INDEX”, thus we don’t have to mention any exceptions in docs.
Amit, what do you think?
>
> 3) There have been previous discussions on similar proposals[1][2]. It might be
> beneficial to review those debates to see whether any old issues or arguments
> are pertinent to this proposal.
>
> [1] https://www.postgresql.org/message-id/flat/201902041630.gpadougzab7v%40alvherre.pgsql
> [2]
https://www.postgresql.org/message-id/flat/OSBPR01MB2982A2738F16722899A50082FECB0%40OSBPR01MB2982.jpnprd01.prod.outlook.com#2e5388a7cde3c10430f8668a6c381b06
>
I read through the both threads. I think the key difference between the patch and the previous one is that this patch
onlyadds “INHERIT” to DEFAULT/FULL/NONE, which is narrow scoped. Let’s see how the folks who involved in the previous
discussionwill feedback to this patch.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
> On Dec 18, 2025, at 12:21, Chao Li <li.evan.chao@gmail.com> wrote:
>
>>
>> 2) Should we simply use the ONLY keyword to determine whether to propagate the
>> replica identity to partitions instead of adding [NOT] INHERIT? For example, if
>> a user specifies ONLY, it changes the identity of the parent table, and any
>> newly created partitions will adopt this new identity. However, the identities
>> of existing partitions remain unchanged.
>
> The current syntax “ALTER TABLE [ONLY] table_name action”, I think here “ONLY” works in terms of inherited tables but
partitions.(I surprisedly find the doc doesn’t explain “ONLY” at all.) The current behavior (without my patch) proves
myunderstanding, “ALTER TABLE table_name REPLICA IDENTITY” only updates the parent table itself.
>
> Given that it is not allowed to inherit a table from either a partitioned table (root/parent) or partition (leaf),
reusingthe “ONLY" at “ALTER TABLE” level won’t generate a conflict. But the problem is, we will have to revisit all
ALTERTABLE actions to see if they can propagate to partitions when the target table is a partitioned table. In that
case,scope of this patch is extremely extended.
>
> The current approach adding INHERIT to the action “REPLICA IDENTIFY” has no global impacts. If you look at the patch,
thesyntax this patch uses is:
> ```
> ALTER TABLE ...
> REPLICA IDENTITY { DEFAULT [ INHERIT | NO INHERIT ] |
> USING INDEX index_name |
> FULL [ INHERIT | NO INHERIT ] |
> NOTHING [ INHERIT | NO INHERIT ] }
> ```
> It explicitly avoids the complexity of “USING INDEX”, thus we don’t have to mention any exceptions in docs.
>
> Amit, what do you think?
I read through the doc of “ALTER TABLE” again, and got some findings.
```
name - The name (optionally schema-qualified) of an existing table to alter. If ONLY is specified before the table
name,only that table is altered. If ONLY is not specified, the table and all its descendant tables (if any) are
altered.Optionally, * can be specified after the table name to explicitly indicate that descendant tables are included.
```
For table name, it says "If ONLY is specified before the table name, only that table is altered. If ONLY is not
specified,the table and all its descendant tables (if any) are altered.” Here, I think we can “descendant tables” as
bothinherited table or partition table, as they are mutually exclusive.
And for “DISABLE/ENABLE [ REPLICA | ALWAYS ] TRIGGER”, the doc says:
```
DISABLE/ENABLE [ REPLICA | ALWAYS ] TRIGGER
When this command is applied to a partitioned table, the states of corresponding clone triggers in the partitions are
updatedtoo, unless ONLY is specified.
```
Here, ONLY is used for partition table. From this perspective, “REPLICA IDENTITY” not propagating to children feels
likea “bug”.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thursday, December 18, 2025 12:21 PM Chao Li <li.evan.chao@gmail.com> wrote: > > > On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Wednesday, December 17, 2025 3:56 PM Chao Li > <li.evan.chao@gmail.com> wrote: > >> Thank you both for all your advice. Here comes my first > >> implementation of INHERIT in the attached v2 patch. > >> > >> On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira > <mailto:euler@eulerto.com> wrote: > >> > >>> I wondering if we use INHERIT as default. The main advantage is > >>> usability as Chao Li already mentioned. Is there any cases that > >>> having a different replica identity from parent/partitioned table makes > sense? > >> > >> We can leave this topic open for discussion. In my current > >> implementation, NO INHERIT is still the default. But if we decide to > >> switch the default, I can add a new commit that should include only 1 > >> line code change in gram.y and a tiny doc update. > >> > >> 0001 - when a new partition is created, use the parent's replication > >> identity > >> 0002 - add INHERIT | NO INHERIT > > > > Hi Zhijie, > > Thanks for your feedback and linked information. I think this patch is avoiding > the hard problem of “index” RI. > > > > > I think there are several design considerations for this proposal: > > > > 1) Since the index names can vary across different partitions, what > > should be the expected behavior if a new partition cannot identify the > > same replica identity key as the root partitioned table? > > Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and > NONE. > I personally find skipping this part to be inconsistent, particularly given the existing disparities among ALTER TABLE commands related to partitioned table handling. Omitting this part introduces further inconsistency within the ALTER TABLE REPLICA IDENTITY. That said, I understand that skipping this implementation might be necessary due to technical challenges. It's important, however, to discuss, analyze and document the specific difficulties associated with implementing this part, and we should include this information in the code comments and commit message. Best Regards, Hou zj
On Fri, Dec 19, 2025 at 11:14 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Thursday, December 18, 2025 12:21 PM Chao Li <li.evan.chao@gmail.com> wrote: > > > > > On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > > wrote: > > > > > > On Wednesday, December 17, 2025 3:56 PM Chao Li > > <li.evan.chao@gmail.com> wrote: > > >> Thank you both for all your advice. Here comes my first > > >> implementation of INHERIT in the attached v2 patch. > > >> > > >> On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira > > <mailto:euler@eulerto.com> wrote: > > >> > > >>> I wondering if we use INHERIT as default. The main advantage is > > >>> usability as Chao Li already mentioned. Is there any cases that > > >>> having a different replica identity from parent/partitioned table makes > > sense? > > >> > > >> We can leave this topic open for discussion. In my current > > >> implementation, NO INHERIT is still the default. But if we decide to > > >> switch the default, I can add a new commit that should include only 1 > > >> line code change in gram.y and a tiny doc update. > > >> > > >> 0001 - when a new partition is created, use the parent's replication > > >> identity > > >> 0002 - add INHERIT | NO INHERIT > > > > > > > Hi Zhijie, > > > > Thanks for your feedback and linked information. I think this patch is avoiding > > the hard problem of “index” RI. > > > > > > > > I think there are several design considerations for this proposal: > > > > > > 1) Since the index names can vary across different partitions, what > > > should be the expected behavior if a new partition cannot identify the > > > same replica identity key as the root partitioned table? > > > > Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and > > NONE. > > > > I personally find skipping this part to be inconsistent, particularly given the > existing disparities among ALTER TABLE commands related to partitioned table handling. > Omitting this part introduces further inconsistency within the ALTER TABLE > REPLICA IDENTITY. > Fair point. I think one should summarize the previous discussions with key differences and where the previous patch got stuck. Then, it would be good to get some feedback from the people involved previously. If there is an agreement that we can do INHERIT stuff for specific parts then fine, otherwise, I think we need to address the index part as well. -- With Regards, Amit Kapila.
> On Dec 22, 2025, at 14:26, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Dec 19, 2025 at 11:14 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: >> >> On Thursday, December 18, 2025 12:21 PM Chao Li <li.evan.chao@gmail.com> wrote: >>> >>>> On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> >>> wrote: >>>> >>>> On Wednesday, December 17, 2025 3:56 PM Chao Li >>> <li.evan.chao@gmail.com> wrote: >>>>> Thank you both for all your advice. Here comes my first >>>>> implementation of INHERIT in the attached v2 patch. >>>>> >>>>> On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira >>> <mailto:euler@eulerto.com> wrote: >>>>> >>>>>> I wondering if we use INHERIT as default. The main advantage is >>>>>> usability as Chao Li already mentioned. Is there any cases that >>>>>> having a different replica identity from parent/partitioned table makes >>> sense? >>>>> >>>>> We can leave this topic open for discussion. In my current >>>>> implementation, NO INHERIT is still the default. But if we decide to >>>>> switch the default, I can add a new commit that should include only 1 >>>>> line code change in gram.y and a tiny doc update. >>>>> >>>>> 0001 - when a new partition is created, use the parent's replication >>>>> identity >>>>> 0002 - add INHERIT | NO INHERIT >>>> >>> >>> Hi Zhijie, >>> >>> Thanks for your feedback and linked information. I think this patch is avoiding >>> the hard problem of “index” RI. >>> >>>> >>>> I think there are several design considerations for this proposal: >>>> >>>> 1) Since the index names can vary across different partitions, what >>>> should be the expected behavior if a new partition cannot identify the >>>> same replica identity key as the root partitioned table? >>> >>> Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and >>> NONE. >>> >> >> I personally find skipping this part to be inconsistent, particularly given the >> existing disparities among ALTER TABLE commands related to partitioned table handling. >> Omitting this part introduces further inconsistency within the ALTER TABLE >> REPLICA IDENTITY. >> > > Fair point. I think one should summarize the previous discussions with > key differences and where the previous patch got stuck. Then, it would > be good to get some feedback from the people involved previously. If > there is an agreement that we can do INHERIT stuff for specific parts > then fine, otherwise, I think we need to address the index part as > well. > [1] https://www.postgresql.org/message-id/flat/201902041630.gpadougzab7v%40alvherre.pgsql > [2] https://www.postgresql.org/message-id/flat/OSBPR01MB2982A2738F16722899A50082FECB0%40OSBPR01MB2982.jpnprd01.prod.outlook.com#2e5388a7cde3c10430f8668a6c381b06 Sure, let me try to summarize the two discussions. For [1], key participants included Álvaro Herrera (patch author), Dmitry Dolgov (reviewer/tester), Robert Haas, Simon Riggs,Michael Paquier, and Peter Eisentraut. ---- Brief summary with rough timeline: * In 2019, Álvaro proposed a patch that essentially did the same thing as this patch. That patch also attempted to handle“index-based” replica identity, which my patch intentionally avoids. * Dmitry tested the patch and reported issues related to attaching partitions, including index-related errors. * Robert then pointed out that having REPLICA IDENTITY recurse was inconsistent with the behavior of other ALTER TABLE actionssuch as TABLESPACE, and emphasized that we should really try to think through *all* actions that might recurse tochild partitions. * Álvaro noted the use of `ALTER TABLE ONLY`, suggesting that WITHOUT ONLY could recurse to children, while ONLY would affectjust the parent. * Simon commented that recursing TABLESPACE changes could be problematic because they imply physical data rewrites. * Robert listed several ALTER TABLE actions that lacked consistent recurse behavior (identity columns, triggers, CLUSTER,OWNER, TABLESPACE, CHECK constraints). * This led to broader discussion about whether TABLESPACE/OWNER/etc. should recurse. * Michael echoed support for having REPLICA IDENTITY recurse, controlled via ONLY. * Peter pointed out that recursing ADD GENERATED AS IDENTITY / DROP IDENTITY may not be feasible. * Álvaro wanted to proceed with the patch. * Robert maintained that defining consistent semantics for all relevant ALTER TABLE actions should come first. Overall, the blocker was an unresolved semantic disagreement, rather than a concrete technical objection to the patch itself.There appeared to be broad agreement that: * New partitions should use the parent’s replica identity. * ONLY could be used to control whether replica identity changes recurse. However, the discussion about “semantic consistency” significantly broadened the scope, and there was no clear agreementon whether TABLESPACE/OWNER/etc. should recurse, which ultimately stalled the effort. For [2], key participants included: Takayuki Tsunakawa (patch author), Bharath Rupireddy (reviewer), Álvaro Herrera, MichaelPaquier, and Kyotaro Horiguchi. ---- Brief summary with rough timeline: * In 2020, Takayuki proposed a patch intended to propagate ALTER TABLE SET LOGGED / UNLOGGED to partitions. * Bharath reviewed the patch and raised a number of questions and edge cases. * There were initial discussions about the patch mechanics and expected behavior. * Álvaro then pointed out the strong relation to the earlier discussion in [1]. * The focus of the discussion shifted to the more fundamental question of “what is the parent?”: * Takayuki viewed ALTER TABLE on a partitioned table as a convenient batch operation on existing partitions, with futurepartitions remaining independent. * Álvaro, Michael, and Kyotaro argued that changing a property on the parent should define the default for future partitionsas well. * No clear agreement was reached on this semantic question, and the discussion expanded into broader concerns about consistencyacross ALTER TABLE actions. * Takayuki withdrew the patch Overall, [2] also fell under the umbrella of “semantic consistency”, the main discussion was not about replica identity itself. Current situation: ---- * Resolving “semantic consistency” across ALTER TABLE actions in a single release appears to be the biggest challenge. However,addressing everything in one patch set does not seem realistic. * The core question of “what is the parent?” from [1] remains central. That said, the discussion appeared to lean towardthe view that future partitions should inherit properties from the parent. * Different properties behave very differently. For example, propagating REPLICA IDENTITY is a metadata-only change and relativelysafe, while propagating TABLESPACE implies physical data movement and is much riskier. As a result, each ALTERTABLE action may deserve its own discussion and patch set. * The inconsistency is not limited to ALTER TABLE but also exists in CREATE TABLE behavior. For example, a new partitioninherits TABLESPACE from the parent, but not REPLICA IDENTITY. * “ALTER TABLE ONLY table_name” is commonly suggested as the mechanism to control whether changes should recurse to partitions. How to proceed? ---- If we stop here, these inconsistencies will remain indefinitely, which I believe nobody really wants. With that in mind,I’d like to suggest a two-phase approach. Phase 1: Document current behavior and set expectations * Identify all ALTER TABLE actions involved in these inconsistencies. * Update the ALTER TABLE and CREATE TABLE documentation to clearly describe the current behavior for partitioned tables,and (where appropriate) the intended or ideal behavior. * Explicitly document the meaning of ONLY for partitioned tables, and note that some actions may behave differently, withdetails described in each action’s section. Phase 2: Address actions incrementally * Work on each ALTER TABLE action individually, recognizing that some may be straightforward while others require more designdiscussion, coding, and testing. * With Phase 1 in place to set expectations, it may not be necessary to complete all actions in a single release. * That said, it would still be desirable to keep the work bounded, for example within one or two major releases, to avoidlong-term fragmentation. I’ve included the participants from the previous discussions on CC, in case they want to comment further. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
> On Dec 29, 2025, at 16:24, Chao Li <li.evan.chao@gmail.com> wrote: > > > >> On Dec 22, 2025, at 14:26, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Fri, Dec 19, 2025 at 11:14 AM Zhijie Hou (Fujitsu) >> <houzj.fnst@fujitsu.com> wrote: >>> >>> On Thursday, December 18, 2025 12:21 PM Chao Li <li.evan.chao@gmail.com> wrote: >>>> >>>>> On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> >>>> wrote: >>>>> >>>>> On Wednesday, December 17, 2025 3:56 PM Chao Li >>>> <li.evan.chao@gmail.com> wrote: >>>>>> Thank you both for all your advice. Here comes my first >>>>>> implementation of INHERIT in the attached v2 patch. >>>>>> >>>>>> On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira >>>> <mailto:euler@eulerto.com> wrote: >>>>>> >>>>>>> I wondering if we use INHERIT as default. The main advantage is >>>>>>> usability as Chao Li already mentioned. Is there any cases that >>>>>>> having a different replica identity from parent/partitioned table makes >>>> sense? >>>>>> >>>>>> We can leave this topic open for discussion. In my current >>>>>> implementation, NO INHERIT is still the default. But if we decide to >>>>>> switch the default, I can add a new commit that should include only 1 >>>>>> line code change in gram.y and a tiny doc update. >>>>>> >>>>>> 0001 - when a new partition is created, use the parent's replication >>>>>> identity >>>>>> 0002 - add INHERIT | NO INHERIT >>>>> >>>> >>>> Hi Zhijie, >>>> >>>> Thanks for your feedback and linked information. I think this patch is avoiding >>>> the hard problem of “index” RI. >>>> >>>>> >>>>> I think there are several design considerations for this proposal: >>>>> >>>>> 1) Since the index names can vary across different partitions, what >>>>> should be the expected behavior if a new partition cannot identify the >>>>> same replica identity key as the root partitioned table? >>>> >>>> Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and >>>> NONE. >>>> >>> >>> I personally find skipping this part to be inconsistent, particularly given the >>> existing disparities among ALTER TABLE commands related to partitioned table handling. >>> Omitting this part introduces further inconsistency within the ALTER TABLE >>> REPLICA IDENTITY. >>> >> >> Fair point. I think one should summarize the previous discussions with >> key differences and where the previous patch got stuck. Then, it would >> be good to get some feedback from the people involved previously. If >> there is an agreement that we can do INHERIT stuff for specific parts >> then fine, otherwise, I think we need to address the index part as >> well. > >> [1] https://www.postgresql.org/message-id/flat/201902041630.gpadougzab7v%40alvherre.pgsql >> [2] https://www.postgresql.org/message-id/flat/OSBPR01MB2982A2738F16722899A50082FECB0%40OSBPR01MB2982.jpnprd01.prod.outlook.com#2e5388a7cde3c10430f8668a6c381b06 > > Sure, let me try to summarize the two discussions. > > For [1], key participants included Álvaro Herrera (patch author), Dmitry Dolgov (reviewer/tester), Robert Haas, Simon Riggs,Michael Paquier, and Peter Eisentraut. > ---- > > Brief summary with rough timeline: > > * In 2019, Álvaro proposed a patch that essentially did the same thing as this patch. That patch also attempted to handle“index-based” replica identity, which my patch intentionally avoids. > * Dmitry tested the patch and reported issues related to attaching partitions, including index-related errors. > * Robert then pointed out that having REPLICA IDENTITY recurse was inconsistent with the behavior of other ALTER TABLEactions such as TABLESPACE, and emphasized that we should really try to think through *all* actions that might recurseto child partitions. > * Álvaro noted the use of `ALTER TABLE ONLY`, suggesting that WITHOUT ONLY could recurse to children, while ONLY wouldaffect just the parent. > * Simon commented that recursing TABLESPACE changes could be problematic because they imply physical data rewrites. > * Robert listed several ALTER TABLE actions that lacked consistent recurse behavior (identity columns, triggers, CLUSTER,OWNER, TABLESPACE, CHECK constraints). > * This led to broader discussion about whether TABLESPACE/OWNER/etc. should recurse. > * Michael echoed support for having REPLICA IDENTITY recurse, controlled via ONLY. > * Peter pointed out that recursing ADD GENERATED AS IDENTITY / DROP IDENTITY may not be feasible. > * Álvaro wanted to proceed with the patch. > * Robert maintained that defining consistent semantics for all relevant ALTER TABLE actions should come first. > > Overall, the blocker was an unresolved semantic disagreement, rather than a concrete technical objection to the patch itself.There appeared to be broad agreement that: > * New partitions should use the parent’s replica identity. > * ONLY could be used to control whether replica identity changes recurse. > > However, the discussion about “semantic consistency” significantly broadened the scope, and there was no clear agreementon whether TABLESPACE/OWNER/etc. should recurse, which ultimately stalled the effort. > > For [2], key participants included: Takayuki Tsunakawa (patch author), Bharath Rupireddy (reviewer), Álvaro Herrera, MichaelPaquier, and Kyotaro Horiguchi. > ---- > > Brief summary with rough timeline: > > * In 2020, Takayuki proposed a patch intended to propagate ALTER TABLE SET LOGGED / UNLOGGED to partitions. > * Bharath reviewed the patch and raised a number of questions and edge cases. > * There were initial discussions about the patch mechanics and expected behavior. > * Álvaro then pointed out the strong relation to the earlier discussion in [1]. > * The focus of the discussion shifted to the more fundamental question of “what is the parent?”: > * Takayuki viewed ALTER TABLE on a partitioned table as a convenient batch operation on existing partitions, with futurepartitions remaining independent. > * Álvaro, Michael, and Kyotaro argued that changing a property on the parent should define the default for future partitionsas well. > * No clear agreement was reached on this semantic question, and the discussion expanded into broader concerns about consistencyacross ALTER TABLE actions. > * Takayuki withdrew the patch > > Overall, [2] also fell under the umbrella of “semantic consistency”, the main discussion was not about replica identityitself. > > Current situation: > ---- > > * Resolving “semantic consistency” across ALTER TABLE actions in a single release appears to be the biggest challenge.However, addressing everything in one patch set does not seem realistic. > * The core question of “what is the parent?” from [1] remains central. That said, the discussion appeared to lean towardthe view that future partitions should inherit properties from the parent. > * Different properties behave very differently. For example, propagating REPLICA IDENTITY is a metadata-only change andrelatively safe, while propagating TABLESPACE implies physical data movement and is much riskier. As a result, each ALTERTABLE action may deserve its own discussion and patch set. > * The inconsistency is not limited to ALTER TABLE but also exists in CREATE TABLE behavior. For example, a new partitioninherits TABLESPACE from the parent, but not REPLICA IDENTITY. > * “ALTER TABLE ONLY table_name” is commonly suggested as the mechanism to control whether changes should recurse to partitions. > > How to proceed? > ---- > > If we stop here, these inconsistencies will remain indefinitely, which I believe nobody really wants. With that in mind,I’d like to suggest a two-phase approach. > > Phase 1: Document current behavior and set expectations > > * Identify all ALTER TABLE actions involved in these inconsistencies. > * Update the ALTER TABLE and CREATE TABLE documentation to clearly describe the current behavior for partitioned tables,and (where appropriate) the intended or ideal behavior. > * Explicitly document the meaning of ONLY for partitioned tables, and note that some actions may behave differently, withdetails described in each action’s section. > > Phase 2: Address actions incrementally > > * Work on each ALTER TABLE action individually, recognizing that some may be straightforward while others require moredesign discussion, coding, and testing. > * With Phase 1 in place to set expectations, it may not be necessary to complete all actions in a single release. > * That said, it would still be desirable to keep the work bounded, for example within one or two major releases, to avoidlong-term fragmentation. > > I’ve included the participants from the previous discussions on CC, in case they want to comment further. > To move this patch forward, I’ve been working on a few related patches in parallel: • [1] Enhances the “ALTER TABLE” documentation to clarify how subcommands behave on partitioned tables. In that discussion,I summarized the current behaviors of all subcommands into 15 categories. • [2] Adds a NOTICE when ONLY is not specified but the action does not recurse to child partitions. • [3] Improves the header comments in tablecmds.c to better explain the meaning of “recurse” and “recursing”. While working on [1], [2] and [3], I gained a deeper understanding of ALTER TABLE behavior. Based on that, v3 is a completerework compared to v2. The behavior implemented by v3 is: • Recursion is performed during the preparation phase. • When ONLY is specified, no recursion occurs; without ONLY, the command recurses to all child partitions. • All replica identity types are supported, including USING INDEX (which was not supported in v2). • For non-index replica identities, the existing ATSimpleRecursion() infrastructure is reused; for index-based replicaidentity, the corresponding index name must be determined for each child partition. In a previous discussion [4], Dmitry Dolgov pointed out a test case that resulted in a DEADLOCK. I ran that test againstv3. The test still fails, but I no longer observe a deadlock; instead, the server now crashes during partition attachment.I will investigate this further. While working on v3, I also noticed another issue that may be a bug. When creating an index on a partitioned table, the indexis automatically created on all partitions. However, if a column type change causes an index rebuild, it appears thatthe rebuilt index may be missing from child partitions. I will verify this behavior and, if confirmed, discuss it ina separate thread. At this point, I think v3 is in a reviewable state. [1] https://postgr.es/m/CAEoWx2%3DmYhCfsnHaN96Qqwq5b0GVS2YgO3zpVqPPRd_iO52wRw%40mail.gmail.com [2] https://postgr.es/m/CAEoWx2=SLga-xH09Cq_PAvsHhQHrBK+V0vF821JKgzS=Bm0haA@mail.gmail.com [3] https://postgr.es/m/CAEoWx2n9E6_zxPbqwMpaPuC1C_p4b3y635SjiDuCPSVm8GBjsA@mail.gmail.com [4] https://www.postgresql.org/message-id/flat/201902041630.gpadougzab7v%40alvherre.pgsql Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Вложения
> On Jan 26, 2026, at 10:51, Chao Li <li.evan.chao@gmail.com> wrote: > > > While working on v3, I also noticed another issue that may be a bug. When creating an index on a partitioned table, theindex is automatically created on all partitions. However, if a column type change causes an index rebuild, it appearsthat the rebuilt index may be missing from child partitions. I will verify this behavior and, if confirmed, discussit in a separate thread. > I have verified the problem. The real problem is, after index rebuild, indisreplident flag of index on child partition willbe lost, which should be a bug. I will start a new thread to give more details of the problem and propose a fix. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
On Mon, Jan 26, 2026 at 04:56:36PM +0800, Chao Li wrote: > I have verified the problem. The real problem is, after index > rebuild, indisreplident flag of index on child partition will be > lost, which should be a bug. I will start a new thread to give more > details of the problem and propose a fix. That sounds like a real problem, yes. Thanks for starting a new thread about that. -- Michael
Вложения
> On Jan 27, 2026, at 14:56, Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Jan 26, 2026 at 04:56:36PM +0800, Chao Li wrote: >> I have verified the problem. The real problem is, after index >> rebuild, indisreplident flag of index on child partition will be >> lost, which should be a bug. I will start a new thread to give more >> details of the problem and propose a fix. > > That sounds like a real problem, yes. Thanks for starting a new > thread about that. > -- > Michael This is the new discussion for the bug: https://www.postgresql.org/message-id/DB533C25-C6BA-4C0F-8046-96168E9CDD72%40gmail.com Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
> On Jan 26, 2026, at 10:51, Chao Li <li.evan.chao@gmail.com> wrote:
>
> In a previous discussion [4], Dmitry Dolgov pointed out a test case that resulted in a DEADLOCK. I ran that test
againstv3. The test still fails, but I no longer observe a deadlock; instead, the server now crashes during partition
attachment.I will investigate this further.
I tried to investigate the server crash yesterday, but I’m no longer able to reproduce it. From the record of the first
crashI encountered, the call stack looked
like this:
```
TRAP: failed Assert("entry->data.lockmode == BUFFER_LOCK_UNLOCK"), File: "bufmgr.c", Line: 5908, PID: 47991
0 postgres 0x00000001013d9bb0 ExceptionalCondition + 216
1 postgres 0x0000000101129a80 BufferLockConditional + 88
2 postgres 0x0000000101129a04 ConditionalLockBuffer + 224
3 postgres 0x0000000100b8966c _bt_conditionallockbuf + 28
4 postgres 0x0000000100b88714 _bt_allocbuf + 128
5 postgres 0x0000000100b858d4 _bt_split + 1496
6 postgres 0x0000000100b82cec _bt_insertonpg + 1520
7 postgres 0x0000000100b81220 _bt_doinsert + 608
8 postgres 0x0000000100b9a008 btinsert + 120
9 postgres 0x0000000100b7a224 index_insert + 552
10 postgres 0x0000000100c5dd50 CatalogIndexInsert + 764
11 postgres 0x0000000100c5df60 CatalogTupleUpdate + 100
12 postgres 0x0000000100c7e608 ConstraintSetParentConstraint + 580
13 postgres 0x0000000100de6598 AttachPartitionEnsureIndexes + 1596
14 postgres 0x0000000100de5cac attachPartitionTable + 80
15 postgres 0x0000000100dd864c ATExecAttachPartition + 2520
16 postgres 0x0000000100dcb8e8 ATExecCmd + 4464
17 postgres 0x0000000100dc6054 ATRewriteCatalogs + 408
18 postgres 0x0000000100dbfa18 ATController + 256
19 postgres 0x0000000100dbf84c AlterTable + 96
20 postgres 0x00000001011a3508 ProcessUtilitySlow + 1704
21 postgres 0x00000001011a111c standard_ProcessUtility + 3504
22 postgres 0x00000001011a035c ProcessUtility + 360
23 postgres 0x000000010119fa10 PortalRunUtility + 216
24 postgres 0x000000010119eae0 PortalRunMulti + 688
25 postgres 0x000000010119e018 PortalRun + 788
26 postgres 0x0000000101198dcc exec_simple_query + 1380
27 postgres 0x0000000101197ee8 PostgresMain + 3244
28 postgres 0x000000010118f8d0 BackendInitialize + 0
29 postgres 0x0000000101061f3c postmaster_child_launch + 456
30 postgres 0x00000001010696c8 BackendStartup + 304
31 postgres 0x0000000101067564 ServerLoop + 372
32 postgres 0x0000000101066044 PostmasterMain + 6440
33 postgres 0x0000000100ee40a4 main + 924
34 dyld 0x000000019a36dd54 start + 7184
2026-01-26 09:52:41.240 CST [46845] LOG: client backend (PID 47991) was terminated by signal 6: Abort trap: 6
```
I noticed that the Assert in bufmgr.c was removed earlier today by commit 333f58637.
However, with the server crash no longer occurring, the DEADLOCK issue reappeared. After some investigation, I
confirmedthat the deadlock is not specific to this patch, I can consistently reproduce it with ATTACH PARTITION on the
masterbranch. That suggests this is a more general problem.
I’ll start a new thread to follow up on the deadlock separately.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
> On Jan 26, 2026, at 10:51, Chao Li <li.evan.chao@gmail.com> wrote: > > > >> On Dec 29, 2025, at 16:24, Chao Li <li.evan.chao@gmail.com> wrote: >> >> >> >>> On Dec 22, 2025, at 14:26, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> >>> On Fri, Dec 19, 2025 at 11:14 AM Zhijie Hou (Fujitsu) >>> <houzj.fnst@fujitsu.com> wrote: >>>> >>>> On Thursday, December 18, 2025 12:21 PM Chao Li <li.evan.chao@gmail.com> wrote: >>>>> >>>>>> On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> >>>>> wrote: >>>>>> >>>>>> On Wednesday, December 17, 2025 3:56 PM Chao Li >>>>> <li.evan.chao@gmail.com> wrote: >>>>>>> Thank you both for all your advice. Here comes my first >>>>>>> implementation of INHERIT in the attached v2 patch. >>>>>>> >>>>>>> On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira >>>>> <mailto:euler@eulerto.com> wrote: >>>>>>> >>>>>>>> I wondering if we use INHERIT as default. The main advantage is >>>>>>>> usability as Chao Li already mentioned. Is there any cases that >>>>>>>> having a different replica identity from parent/partitioned table makes >>>>> sense? >>>>>>> >>>>>>> We can leave this topic open for discussion. In my current >>>>>>> implementation, NO INHERIT is still the default. But if we decide to >>>>>>> switch the default, I can add a new commit that should include only 1 >>>>>>> line code change in gram.y and a tiny doc update. >>>>>>> >>>>>>> 0001 - when a new partition is created, use the parent's replication >>>>>>> identity >>>>>>> 0002 - add INHERIT | NO INHERIT >>>>>> >>>>> >>>>> Hi Zhijie, >>>>> >>>>> Thanks for your feedback and linked information. I think this patch is avoiding >>>>> the hard problem of “index” RI. >>>>> >>>>>> >>>>>> I think there are several design considerations for this proposal: >>>>>> >>>>>> 1) Since the index names can vary across different partitions, what >>>>>> should be the expected behavior if a new partition cannot identify the >>>>>> same replica identity key as the root partitioned table? >>>>> >>>>> Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and >>>>> NONE. >>>>> >>>> >>>> I personally find skipping this part to be inconsistent, particularly given the >>>> existing disparities among ALTER TABLE commands related to partitioned table handling. >>>> Omitting this part introduces further inconsistency within the ALTER TABLE >>>> REPLICA IDENTITY. >>>> >>> >>> Fair point. I think one should summarize the previous discussions with >>> key differences and where the previous patch got stuck. Then, it would >>> be good to get some feedback from the people involved previously. If >>> there is an agreement that we can do INHERIT stuff for specific parts >>> then fine, otherwise, I think we need to address the index part as >>> well. >> >>> [1] https://www.postgresql.org/message-id/flat/201902041630.gpadougzab7v%40alvherre.pgsql >>> [2] https://www.postgresql.org/message-id/flat/OSBPR01MB2982A2738F16722899A50082FECB0%40OSBPR01MB2982.jpnprd01.prod.outlook.com#2e5388a7cde3c10430f8668a6c381b06 >> >> Sure, let me try to summarize the two discussions. >> >> For [1], key participants included Álvaro Herrera (patch author), Dmitry Dolgov (reviewer/tester), Robert Haas, SimonRiggs, Michael Paquier, and Peter Eisentraut. >> ---- >> >> Brief summary with rough timeline: >> >> * In 2019, Álvaro proposed a patch that essentially did the same thing as this patch. That patch also attempted to handle“index-based” replica identity, which my patch intentionally avoids. >> * Dmitry tested the patch and reported issues related to attaching partitions, including index-related errors. >> * Robert then pointed out that having REPLICA IDENTITY recurse was inconsistent with the behavior of other ALTER TABLEactions such as TABLESPACE, and emphasized that we should really try to think through *all* actions that might recurseto child partitions. >> * Álvaro noted the use of `ALTER TABLE ONLY`, suggesting that WITHOUT ONLY could recurse to children, while ONLY wouldaffect just the parent. >> * Simon commented that recursing TABLESPACE changes could be problematic because they imply physical data rewrites. >> * Robert listed several ALTER TABLE actions that lacked consistent recurse behavior (identity columns, triggers, CLUSTER,OWNER, TABLESPACE, CHECK constraints). >> * This led to broader discussion about whether TABLESPACE/OWNER/etc. should recurse. >> * Michael echoed support for having REPLICA IDENTITY recurse, controlled via ONLY. >> * Peter pointed out that recursing ADD GENERATED AS IDENTITY / DROP IDENTITY may not be feasible. >> * Álvaro wanted to proceed with the patch. >> * Robert maintained that defining consistent semantics for all relevant ALTER TABLE actions should come first. >> >> Overall, the blocker was an unresolved semantic disagreement, rather than a concrete technical objection to the patchitself. There appeared to be broad agreement that: >> * New partitions should use the parent’s replica identity. >> * ONLY could be used to control whether replica identity changes recurse. >> >> However, the discussion about “semantic consistency” significantly broadened the scope, and there was no clear agreementon whether TABLESPACE/OWNER/etc. should recurse, which ultimately stalled the effort. >> >> For [2], key participants included: Takayuki Tsunakawa (patch author), Bharath Rupireddy (reviewer), Álvaro Herrera, MichaelPaquier, and Kyotaro Horiguchi. >> ---- >> >> Brief summary with rough timeline: >> >> * In 2020, Takayuki proposed a patch intended to propagate ALTER TABLE SET LOGGED / UNLOGGED to partitions. >> * Bharath reviewed the patch and raised a number of questions and edge cases. >> * There were initial discussions about the patch mechanics and expected behavior. >> * Álvaro then pointed out the strong relation to the earlier discussion in [1]. >> * The focus of the discussion shifted to the more fundamental question of “what is the parent?”: >> * Takayuki viewed ALTER TABLE on a partitioned table as a convenient batch operation on existing partitions, with futurepartitions remaining independent. >> * Álvaro, Michael, and Kyotaro argued that changing a property on the parent should define the default for future partitionsas well. >> * No clear agreement was reached on this semantic question, and the discussion expanded into broader concerns about consistencyacross ALTER TABLE actions. >> * Takayuki withdrew the patch >> >> Overall, [2] also fell under the umbrella of “semantic consistency”, the main discussion was not about replica identityitself. >> >> Current situation: >> ---- >> >> * Resolving “semantic consistency” across ALTER TABLE actions in a single release appears to be the biggest challenge.However, addressing everything in one patch set does not seem realistic. >> * The core question of “what is the parent?” from [1] remains central. That said, the discussion appeared to lean towardthe view that future partitions should inherit properties from the parent. >> * Different properties behave very differently. For example, propagating REPLICA IDENTITY is a metadata-only change andrelatively safe, while propagating TABLESPACE implies physical data movement and is much riskier. As a result, each ALTERTABLE action may deserve its own discussion and patch set. >> * The inconsistency is not limited to ALTER TABLE but also exists in CREATE TABLE behavior. For example, a new partitioninherits TABLESPACE from the parent, but not REPLICA IDENTITY. >> * “ALTER TABLE ONLY table_name” is commonly suggested as the mechanism to control whether changes should recurse to partitions. >> >> How to proceed? >> ---- >> >> If we stop here, these inconsistencies will remain indefinitely, which I believe nobody really wants. With that in mind,I’d like to suggest a two-phase approach. >> >> Phase 1: Document current behavior and set expectations >> >> * Identify all ALTER TABLE actions involved in these inconsistencies. >> * Update the ALTER TABLE and CREATE TABLE documentation to clearly describe the current behavior for partitioned tables,and (where appropriate) the intended or ideal behavior. >> * Explicitly document the meaning of ONLY for partitioned tables, and note that some actions may behave differently, withdetails described in each action’s section. >> >> Phase 2: Address actions incrementally >> >> * Work on each ALTER TABLE action individually, recognizing that some may be straightforward while others require moredesign discussion, coding, and testing. >> * With Phase 1 in place to set expectations, it may not be necessary to complete all actions in a single release. >> * That said, it would still be desirable to keep the work bounded, for example within one or two major releases, to avoidlong-term fragmentation. >> >> I’ve included the participants from the previous discussions on CC, in case they want to comment further. >> > > To move this patch forward, I’ve been working on a few related patches in parallel: > > • [1] Enhances the “ALTER TABLE” documentation to clarify how subcommands behave on partitioned tables. In that discussion,I summarized the current behaviors of all subcommands into 15 categories. > • [2] Adds a NOTICE when ONLY is not specified but the action does not recurse to child partitions. > • [3] Improves the header comments in tablecmds.c to better explain the meaning of “recurse” and “recursing”. > > While working on [1], [2] and [3], I gained a deeper understanding of ALTER TABLE behavior. Based on that, v3 is a completerework compared to v2. The behavior implemented by v3 is: > > • Recursion is performed during the preparation phase. > • When ONLY is specified, no recursion occurs; without ONLY, the command recurses to all child partitions. > • All replica identity types are supported, including USING INDEX (which was not supported in v2). > • For non-index replica identities, the existing ATSimpleRecursion() infrastructure is reused; for index-based replicaidentity, the corresponding index name must be determined for each child partition. > > In a previous discussion [4], Dmitry Dolgov pointed out a test case that resulted in a DEADLOCK. I ran that test againstv3. The test still fails, but I no longer observe a deadlock; instead, the server now crashes during partition attachment.I will investigate this further. > > While working on v3, I also noticed another issue that may be a bug. When creating an index on a partitioned table, theindex is automatically created on all partitions. However, if a column type change causes an index rebuild, it appearsthat the rebuilt index may be missing from child partitions. I will verify this behavior and, if confirmed, discussit in a separate thread. > > At this point, I think v3 is in a reviewable state. > > [1] https://postgr.es/m/CAEoWx2%3DmYhCfsnHaN96Qqwq5b0GVS2YgO3zpVqPPRd_iO52wRw%40mail.gmail.com > [2] https://postgr.es/m/CAEoWx2=SLga-xH09Cq_PAvsHhQHrBK+V0vF821JKgzS=Bm0haA@mail.gmail.com > [3] https://postgr.es/m/CAEoWx2n9E6_zxPbqwMpaPuC1C_p4b3y635SjiDuCPSVm8GBjsA@mail.gmail.com > [4] https://www.postgresql.org/message-id/flat/201902041630.gpadougzab7v%40alvherre.pgsql > > Best regards, > -- > Chao Li (Evan) > HighGo Software Co., Ltd. > https://www.highgo.com/ > > > > > <v3-0001-Inherit-replica-identity-for-new-and-attached-par.patch><v3-0002-ALTER-TABLE-recurse-REPLICA-IDENTITY-to-partition.patch> Trying to move forward this patch, as the previous blockers have been addressed: * ALTER TABLE “inconsistency” is clarified by the other patch, see [1]. * DEADLOCK has been addressed by the other patch, see [2]. * “index” replica identity is handled. PFA v4: rebased and deleted an unneeded test case. [1] https://postgr.es/m/CAEoWx2mUiCYJEBuo5D74gi7pHfNz82b54oDHjZNtrXRFDnBPOg@mail.gmail.com [2] https://postgr.es/m/CFACA0EB-7E6F-4FAA-9ACE-1FC2226D7482@gmail.com Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/