Обсуждение: Avoid retaining conflict-related data when no tables are subscribed
Hi, My colleague Nisha reported an issue to me off-list: dead tuples can't be removed when retain_dead_tuples is enabled for a subscription with no tables. This appears to stem from the inability to advance the non-removable transaction ID when AllTablesyncsReady() returns false. Since this function returns false when no tables are present, which leads to unnecessary data retention until a table is added to the subscription. Since dead tuples don't need to be retained when no tables are subscribed, here is a patch to fix it, modifying AllTablesyncsReady() to allows no tables to be treated as a ready state when explicitly requested. Best Regards, Hou zj
Вложения
On Thu, Aug 28, 2025 at 7:54 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Hi, > > My colleague Nisha reported an issue to me off-list: dead tuples can't > be removed when retain_dead_tuples is enabled for a subscription with no tables. > > This appears to stem from the inability to advance the non-removable transaction > ID when AllTablesyncsReady() returns false. Since this function returns false > when no tables are present, which leads to unnecessary data retention until a > table is added to the subscription. > > Since dead tuples don't need to be retained when no tables are subscribed, here > is a patch to fix it, modifying AllTablesyncsReady() to allows no tables to be > treated as a ready state when explicitly requested. > The patch LGTM. Verified it, fixes the issue. thanks Shveta
On Thu, Aug 28, 2025 at 7:54 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Hi, > > My colleague Nisha reported an issue to me off-list: dead tuples can't > be removed when retain_dead_tuples is enabled for a subscription with no tables. > > This appears to stem from the inability to advance the non-removable transaction > ID when AllTablesyncsReady() returns false. Since this function returns false > when no tables are present, which leads to unnecessary data retention until a > table is added to the subscription. > > Since dead tuples don't need to be retained when no tables are subscribed, here > is a patch to fix it, modifying AllTablesyncsReady() to allows no tables to be > treated as a ready state when explicitly requested. + /* If there are no tables, decide readiness based on the parameter */ + if (!has_subrels) + return ready_if_no_tables; + /* * Return false when there are no tables in subscription or not all tables * are in ready state; true otherwise. */ - return has_subrels && (table_states_not_ready == NIL); + return table_states_not_ready == NIL; The first part of the comment "Return false when there are no tables in subscription" is outdated now with your fix. Otherwise it looks fine. -- Regards, Dilip Kumar Google
Just a typo: + * it seem feasible to skip all phases and directly assign Should be "it seems". Regards, Steven 在 2025/8/28 10:23, Zhijie Hou (Fujitsu) 写道: > Hi, > > My colleague Nisha reported an issue to me off-list: dead tuples can't > be removed when retain_dead_tuples is enabled for a subscription with no tables. > > This appears to stem from the inability to advance the non-removable transaction > ID when AllTablesyncsReady() returns false. Since this function returns false > when no tables are present, which leads to unnecessary data retention until a > table is added to the subscription. > > Since dead tuples don't need to be retained when no tables are subscribed, here > is a patch to fix it, modifying AllTablesyncsReady() to allows no tables to be > treated as a ready state when explicitly requested. > > Best Regards, > Hou zj
On Thu, Aug 28, 2025 at 7:54 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Hi, > > My colleague Nisha reported an issue to me off-list: dead tuples can't > be removed when retain_dead_tuples is enabled for a subscription with no tables. > > This appears to stem from the inability to advance the non-removable transaction > ID when AllTablesyncsReady() returns false. Since this function returns false > when no tables are present, which leads to unnecessary data retention until a > table is added to the subscription. > > Since dead tuples don't need to be retained when no tables are subscribed, here > is a patch to fix it, modifying AllTablesyncsReady() to allows no tables to be > treated as a ready state when explicitly requested. > Thank you Hou-san for the patch. I've verified the fix and tested the changes. The patch LGTM. -- Thanks, Nisha
On Thu, Aug 28, 2025 at 7:54 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Hi, > > My colleague Nisha reported an issue to me off-list: dead tuples can't > be removed when retain_dead_tuples is enabled for a subscription with no tables. > > This appears to stem from the inability to advance the non-removable transaction > ID when AllTablesyncsReady() returns false. Since this function returns false > when no tables are present, which leads to unnecessary data retention until a > table is added to the subscription. > > Since dead tuples don't need to be retained when no tables are subscribed, here > is a patch to fix it, modifying AllTablesyncsReady() to allows no tables to be > treated as a ready state when explicitly requested. > Few comments: ============ Aren't following two paragraphs in comments contradict each other: * It is safe to add new tables with initial states to the subscription * after this check because any changes applied to these tables should * have a WAL position greater than the rdt_data->remote_lsn. + * + * Advancing the transaction ID is also necessary when no tables are + * subscribed, as it prevents unnecessary retention of dead tuples. Although + * it seem feasible to skip all phases and directly assign candidate_xid to + * oldest_nonremovable_xid in the RDT_GET_CANDIDATE_XID phase when no tables + * are currently subscribed, this approach is unsafe. This is because new + * tables may be added to the subscription after the initial table check, + * requiring tuples deleted before candidate_xid for conflict detection in + * upcoming transactions. Therefore, it remains necessary to wait for all + * concurrent transactions to be fully applied. */ In the first para, the comments say that it is okay to add tables after this check and in the second para, it says that is not okay? 2. + * If the subscription has no tables, return the value determined by + * 'ready_if_no_tables'. + * + * Otherwise, return whether all the tables for the subscription are in the + * READY state. * * Note: This function is not suitable to be called from outside of apply or * tablesync workers because MySubscription needs to be already initialized. */ bool -AllTablesyncsReady(void) +AllTablesyncsReady(bool ready_if_no_tables) This change serves the purpose but I find it makes the API complex to understand because now it needs to make decisions based on different states depending on the boolean parameter passed. Can we introduce a new API for the empty subscription case? -- With Regards, Amit Kapila.
RE: Avoid retaining conflict-related data when no tables are subscribed
От
"Zhijie Hou (Fujitsu)"
Дата:
On Friday, August 29, 2025 12:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Aug 28, 2025 at 7:54 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > Hi, > > > > My colleague Nisha reported an issue to me off-list: dead tuples can't > > be removed when retain_dead_tuples is enabled for a subscription with no > tables. > > > > This appears to stem from the inability to advance the non-removable > > transaction ID when AllTablesyncsReady() returns false. Since this > > function returns false when no tables are present, which leads to > > unnecessary data retention until a table is added to the subscription. > > > > Since dead tuples don't need to be retained when no tables are > > subscribed, here is a patch to fix it, modifying AllTablesyncsReady() > > to allows no tables to be treated as a ready state when explicitly requested. > > > > Few comments: > ============ > Aren't following two paragraphs in comments contradict each other: > > * It is safe to add new tables with initial states to the subscription > * after this check because any changes applied to these tables should > * have a WAL position greater than the rdt_data->remote_lsn. > + * > + * Advancing the transaction ID is also necessary when no tables are > + * subscribed, as it prevents unnecessary retention of dead tuples. > +Although > + * it seem feasible to skip all phases and directly assign > +candidate_xid to > + * oldest_nonremovable_xid in the RDT_GET_CANDIDATE_XID phase > when no > +tables > + * are currently subscribed, this approach is unsafe. This is because > +new > + * tables may be added to the subscription after the initial table > +check, > + * requiring tuples deleted before candidate_xid for conflict > +detection in > + * upcoming transactions. Therefore, it remains necessary to wait for > +all > + * concurrent transactions to be fully applied. > */ > > In the first para, the comments say that it is okay to add tables after this check > and in the second para, it says that is not okay? I have removed the 1st para because it's inaccurate. > > 2. > + * If the subscription has no tables, return the value determined by > + * 'ready_if_no_tables'. > + * > + * Otherwise, return whether all the tables for the subscription are in > + the > + * READY state. > * > * Note: This function is not suitable to be called from outside of apply or > * tablesync workers because MySubscription needs to be already initialized. > */ > bool > -AllTablesyncsReady(void) > +AllTablesyncsReady(bool ready_if_no_tables) > > This change serves the purpose but I find it makes the API complex to > understand because now it needs to make decisions based on different states > depending on the boolean parameter passed. Can we introduce a new API for > the empty subscription case? Added a new function HasSubscriptionRelationsCached() as suggested. Attached is the V2 patch which addresses the above comments and fixes a typo. Apart from the original reported issue, we noticed another point that can be improved during the implementation of update_deleted patches: When a disabled subscription is created with retain_dead_tuples set to true, the launcher is not woken up immediately, which may lead to delays in creating the conflict detection slot and cause user confusion. So, I prepared the 0002 patch to fix this issue. Best Regards, Hou zj
Вложения
On Tue, Sep 2, 2025 at 10:51 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, August 29, 2025 12:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > bool > > -AllTablesyncsReady(void) > > +AllTablesyncsReady(bool ready_if_no_tables) > > > > This change serves the purpose but I find it makes the API complex to > > understand because now it needs to make decisions based on different states > > depending on the boolean parameter passed. Can we introduce a new API for > > the empty subscription case? > > Added a new function HasSubscriptionRelationsCached() as suggested. > Thanks, the new API looks better. I have a question related to one of the comments: * Although + * it seems feasible to skip all phases and directly assign candidate_xid to + * oldest_nonremovable_xid in the RDT_GET_CANDIDATE_XID phase when no tables + * are currently subscribed, this approach is unsafe. This is because new + * tables may be added to the subscription after the initial table check, + * requiring tuples deleted before candidate_xid for conflict detection in + * upcoming transactions. -- How is it possible that new tables added will have any data (deleted data in this case) from a transaction prior to the candidate_xid transaction? * Can we add a test to show that dead tuples are not retained even with retain_dead_tuples=true when subscription has no relations? -- With Regards, Amit Kapila.
RE: Avoid retaining conflict-related data when no tables are subscribed
От
"Zhijie Hou (Fujitsu)"
Дата:
On Thursday, September 4, 2025 7:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Sep 2, 2025 at 10:51 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Friday, August 29, 2025 12:05 PM Amit Kapila > <amit.kapila16@gmail.com> wrote: > > > > > > bool > > > -AllTablesyncsReady(void) > > > +AllTablesyncsReady(bool ready_if_no_tables) > > > > > > This change serves the purpose but I find it makes the API complex > > > to understand because now it needs to make decisions based on > > > different states depending on the boolean parameter passed. Can we > > > introduce a new API for the empty subscription case? > > > > Added a new function HasSubscriptionRelationsCached() as suggested. > > > > Thanks, the new API looks better. I have a question related to one of the > comments: Thanks for the comments. > > * > Although > + * it seems feasible to skip all phases and directly assign > +candidate_xid to > + * oldest_nonremovable_xid in the RDT_GET_CANDIDATE_XID phase > when no > +tables > + * are currently subscribed, this approach is unsafe. This is because > +new > + * tables may be added to the subscription after the initial table > +check, > + * requiring tuples deleted before candidate_xid for conflict > +detection in > + * upcoming transactions. > > -- > > How is it possible that new tables added will have any data (deleted data in > this case) from a transaction prior to the candidate_xid transaction? I initially considered the scenario where a table is created on the subscriber, and a user deletes a row before adding the table to the subscription. Upon reconsideration, it appears that retention is not guaranteed prior to table addition. However, after analyzing it further, there are some other reasons for checking the subscription table in the final phase. The comments have been updated accordingly to reflect the reasons. Please refer to the comments in the new patch version for details. > * Can we add a test to show that dead tuples are not retained even with > retain_dead_tuples=true when subscription has no relations? Yes, I have added a test for this. Here is the V3 patch. I merged two patches because they are both related to conflict data retention. Best Regards, Hou zj