Обсуждение: Correction of RowMark Removal During Sel-Join Elimination
Hi, I've noticed that the code for sel-join elimination incorrectly removes RowMarks. Currently, this doesn't lead to any issues because the optimiser retains pointers to the simple_rte_array. A RowMark refers to an RTE entry that is the same for both the relations being kept and those being removed. I believe it would be beneficial to address this issue now to prevent potential problems in the future. See the patch attached. -- regards, Andrei Lepikhov
Вложения
Basic concept looks good. However:
I'm not convinced this is an improvement from someone just coming in to this part of the code, especially given (for example) the comment right above it:
* Determine if the inner table can duplicate outer rows. We must
* bypass the unique rel cache here since we're possibly using a
and fixes the incorrect behaviour. Additionally, it renames variables to make
similar errors more apparent in the future.
- if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
- outer, JOIN_INNER, selfjoinquals,
+ if (!innerrel_is_unique_ext(root, joinrelids, rrel->relids,
+ krel, JOIN_INNER, selfjoinquals,
I'm not convinced this is an improvement from someone just coming in to this part of the code, especially given (for example) the comment right above it:
* Determine if the inner table can duplicate outer rows. We must
* bypass the unique rel cache here since we're possibly using a
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
On 11/8/2025 20:15, Greg Sabino Mullane wrote: > I'm not convinced this is an improvement from someone just coming in to > this part of the code, especially given (for example) the comment right > above it: > > * Determine if the inner table can duplicate outer rows. We must > * bypass the unique rel cache here since we're possibly using aThanks for your feedback. I made some minor adjustments to the comments to make the code more consistent. Sure, rrel and krel don't seem like the best solution - I guess natives could find less wordy synonyms than just dumb 'keeping_rel' and 'removing_rel'. But it is simpler to track which relation and RowMark should be removed. -- regards, Andrei Lepikhov
Вложения
Hi, Andrei! On Tue, Aug 12, 2025 at 12:40 PM Andrei Lepikhov <lepihov@gmail.com> wrote: > On 11/8/2025 20:15, Greg Sabino Mullane wrote: > > I'm not convinced this is an improvement from someone just coming in to > > this part of the code, especially given (for example) the comment right > > above it: > > > > * Determine if the inner table can duplicate outer rows. We must > > * bypass the unique rel cache here since we're possibly using aThanks for your feedback. > I made some minor adjustments to the comments to make the code more > consistent. Sure, rrel and krel don't seem like the best solution - I > guess natives could find less wordy synonyms than just dumb > 'keeping_rel' and 'removing_rel'. But it is simpler to track which > relation and RowMark should be removed. Thank you for catching this. And thank you for the fix. I think it worth separating fix and refactoring. This helps to understand what exactly the fix is by looking at the patch. I also edited commit message. I'm going to push this if no objections. ------ Regards, Alexander Korotkov Supabase
Вложения
Hello Alexander,
24.08.2025 03:44, Alexander Korotkov wrote:
24.08.2025 03:44, Alexander Korotkov wrote:
Thank you for catching this. And thank you for the fix. I think it worth separating fix and refactoring. This helps to understand what exactly the fix is by looking at the patch. I also edited commit message. I'm going to push this if no objections.
Please try the following script:
create table t1(i1 int primary key);
create table t2 (i2 int, check (i2 is not null));
set constraint_exclusion = 'on';
select 1 from t1 right join t2 on i1 = i2 where (select true);
which triggers (starting from 5f6f951f8):
Core was generated by `postgres: law regression [local] SELECT '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000058dd6a7b6014 in var_is_nonnullable (root=0x58dd78d505f8, var=0x58dd78d75610, use_rel_info=false) at clauses.c:4242
4242 if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE)
(gdb) bt
#0 0x000058dd6a7b6014 in var_is_nonnullable (root=0x58dd78d505f8, var=0x58dd78d75610, use_rel_info=false) at clauses.c:4242
#1 0x000058dd6a7b4f74 in eval_const_expressions_mutator (node=0x58dd78d75590, context=0x7fff875a9f00) at clauses.c:3556
#2 0x000058dd6a7b2935 in eval_const_expressions (root=0x58dd78d505f8, node=0x58dd78d75590) at clauses.c:2272
#3 0x000058dd6a7c857d in get_relation_constraints (root=0x58dd78d505f8, relationObjectId=16385, rel=0x58dd78d738e0, include_noinherit=true, include_notnull=true, include_partition=true) at plancat.c:1419
#4 0x000058dd6a7c8fdb in relation_excluded_by_constraints (root=0x58dd78d505f8, rel=0x58dd78d738e0, rte=0x58dd78c6b968) at plancat.c:1808
#5 0x000058dd6a73675e in set_rel_size (root=0x58dd78d505f8, rel=0x58dd78d738e0, rti=2, rte=0x58dd78c6b968) at allpaths.c:364
#6 0x000058dd6a73664c in set_base_rel_sizes (root=0x58dd78d505f8) at allpaths.c:322
#7 0x000058dd6a73631e in make_one_rel (root=0x58dd78d505f8, joinlist=0x58dd78d74a90) at allpaths.c:183
...
Discovered with SQLsmith.
Best regards,
Alexander
On Sat, Aug 30, 2025 at 4:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > > Hello Alexander, > > 24.08.2025 03:44, Alexander Korotkov wrote: > > Thank you for catching this. And thank you for the fix. I think it > worth separating fix and refactoring. This helps to understand what > exactly the fix is by looking at the patch. I also edited commit > message. I'm going to push this if no objections. > > > Please try the following script: > create table t1(i1 int primary key); > create table t2 (i2 int, check (i2 is not null)); > > set constraint_exclusion = 'on'; > > select 1 from t1 right join t2 on i1 = i2 where (select true); > > which triggers (starting from 5f6f951f8): > Core was generated by `postgres: law regression [local] SELECT '. > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x000058dd6a7b6014 in var_is_nonnullable (root=0x58dd78d505f8, var=0x58dd78d75610, use_rel_info=false) at clauses.c:4242 > 4242 if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) > (gdb) bt > #0 0x000058dd6a7b6014 in var_is_nonnullable (root=0x58dd78d505f8, var=0x58dd78d75610, use_rel_info=false) at clauses.c:4242 > #1 0x000058dd6a7b4f74 in eval_const_expressions_mutator (node=0x58dd78d75590, context=0x7fff875a9f00) at clauses.c:3556 > #2 0x000058dd6a7b2935 in eval_const_expressions (root=0x58dd78d505f8, node=0x58dd78d75590) at clauses.c:2272 > #3 0x000058dd6a7c857d in get_relation_constraints (root=0x58dd78d505f8, relationObjectId=16385, rel=0x58dd78d738e0, include_noinherit=true,include_notnull=true, include_partition=true) at plancat.c:1419 > #4 0x000058dd6a7c8fdb in relation_excluded_by_constraints (root=0x58dd78d505f8, rel=0x58dd78d738e0, rte=0x58dd78c6b968)at plancat.c:1808 > #5 0x000058dd6a73675e in set_rel_size (root=0x58dd78d505f8, rel=0x58dd78d738e0, rti=2, rte=0x58dd78c6b968) at allpaths.c:364 > #6 0x000058dd6a73664c in set_base_rel_sizes (root=0x58dd78d505f8) at allpaths.c:322 > #7 0x000058dd6a73631e in make_one_rel (root=0x58dd78d505f8, joinlist=0x58dd78d74a90) at allpaths.c:183 Thank you for pointing. I'm investigating this now. ------ Regards, Alexander Korotkov Supabase
On Sat, Aug 30, 2025 at 9:54 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Sat, Aug 30, 2025 at 4:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > > > > Hello Alexander, > > > > 24.08.2025 03:44, Alexander Korotkov wrote: > > > > Thank you for catching this. And thank you for the fix. I think it > > worth separating fix and refactoring. This helps to understand what > > exactly the fix is by looking at the patch. I also edited commit > > message. I'm going to push this if no objections. > > > > > > Please try the following script: > > create table t1(i1 int primary key); > > create table t2 (i2 int, check (i2 is not null)); > > > > set constraint_exclusion = 'on'; > > > > select 1 from t1 right join t2 on i1 = i2 where (select true); > > > > which triggers (starting from 5f6f951f8): > > Core was generated by `postgres: law regression [local] SELECT '. > > Program terminated with signal SIGSEGV, Segmentation fault. > > #0 0x000058dd6a7b6014 in var_is_nonnullable (root=0x58dd78d505f8, var=0x58dd78d75610, use_rel_info=false) at clauses.c:4242 > > 4242 if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) > > (gdb) bt > > #0 0x000058dd6a7b6014 in var_is_nonnullable (root=0x58dd78d505f8, var=0x58dd78d75610, use_rel_info=false) at clauses.c:4242 > > #1 0x000058dd6a7b4f74 in eval_const_expressions_mutator (node=0x58dd78d75590, context=0x7fff875a9f00) at clauses.c:3556 > > #2 0x000058dd6a7b2935 in eval_const_expressions (root=0x58dd78d505f8, node=0x58dd78d75590) at clauses.c:2272 > > #3 0x000058dd6a7c857d in get_relation_constraints (root=0x58dd78d505f8, relationObjectId=16385, rel=0x58dd78d738e0,include_noinherit=true, include_notnull=true, include_partition=true) at plancat.c:1419 > > #4 0x000058dd6a7c8fdb in relation_excluded_by_constraints (root=0x58dd78d505f8, rel=0x58dd78d738e0, rte=0x58dd78c6b968)at plancat.c:1808 > > #5 0x000058dd6a73675e in set_rel_size (root=0x58dd78d505f8, rel=0x58dd78d738e0, rti=2, rte=0x58dd78c6b968) at allpaths.c:364 > > #6 0x000058dd6a73664c in set_base_rel_sizes (root=0x58dd78d505f8) at allpaths.c:322 > > #7 0x000058dd6a73631e in make_one_rel (root=0x58dd78d505f8, joinlist=0x58dd78d74a90) at allpaths.c:183 > > Thank you for pointing. I'm investigating this now. Hmm... It seems that commit 5f6f951f88 spotted some existing bug. get_relation_constraints() deserializes the constraint node, but it initially refers varno == 1. get_relation_constraints() calls ChangeVarNodes() to change varno from 1 to 2, but only after calling eval_const_expressions() which access root->simple_rte_array[] with wrong varno... The draft patch fixing this is attached. I will continue the investigation. ------ Regards, Alexander Korotkov Supabase
Вложения
On Sat, Aug 30, 2025 at 10:29 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Sat, Aug 30, 2025 at 9:54 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Sat, Aug 30, 2025 at 4:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > > > > > > Hello Alexander, > > > > > > 24.08.2025 03:44, Alexander Korotkov wrote: > > > > > > Thank you for catching this. And thank you for the fix. I think it > > > worth separating fix and refactoring. This helps to understand what > > > exactly the fix is by looking at the patch. I also edited commit > > > message. I'm going to push this if no objections. > > > > > > > > > Please try the following script: > > > create table t1(i1 int primary key); > > > create table t2 (i2 int, check (i2 is not null)); > > > > > > set constraint_exclusion = 'on'; > > > > > > select 1 from t1 right join t2 on i1 = i2 where (select true); > > > > > > which triggers (starting from 5f6f951f8): > > > Core was generated by `postgres: law regression [local] SELECT '. > > > Program terminated with signal SIGSEGV, Segmentation fault. > > > #0 0x000058dd6a7b6014 in var_is_nonnullable (root=0x58dd78d505f8, var=0x58dd78d75610, use_rel_info=false) at clauses.c:4242 > > > 4242 if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) > > > (gdb) bt > > > #0 0x000058dd6a7b6014 in var_is_nonnullable (root=0x58dd78d505f8, var=0x58dd78d75610, use_rel_info=false) at clauses.c:4242 > > > #1 0x000058dd6a7b4f74 in eval_const_expressions_mutator (node=0x58dd78d75590, context=0x7fff875a9f00) at clauses.c:3556 > > > #2 0x000058dd6a7b2935 in eval_const_expressions (root=0x58dd78d505f8, node=0x58dd78d75590) at clauses.c:2272 > > > #3 0x000058dd6a7c857d in get_relation_constraints (root=0x58dd78d505f8, relationObjectId=16385, rel=0x58dd78d738e0,include_noinherit=true, include_notnull=true, include_partition=true) at plancat.c:1419 > > > #4 0x000058dd6a7c8fdb in relation_excluded_by_constraints (root=0x58dd78d505f8, rel=0x58dd78d738e0, rte=0x58dd78c6b968)at plancat.c:1808 > > > #5 0x000058dd6a73675e in set_rel_size (root=0x58dd78d505f8, rel=0x58dd78d738e0, rti=2, rte=0x58dd78c6b968) at allpaths.c:364 > > > #6 0x000058dd6a73664c in set_base_rel_sizes (root=0x58dd78d505f8) at allpaths.c:322 > > > #7 0x000058dd6a73631e in make_one_rel (root=0x58dd78d505f8, joinlist=0x58dd78d74a90) at allpaths.c:183 > > > > Thank you for pointing. I'm investigating this now. > > Hmm... It seems that commit 5f6f951f88 spotted some existing bug. > get_relation_constraints() deserializes the constraint node, but it > initially refers varno == 1. get_relation_constraints() calls > ChangeVarNodes() to change varno from 1 to 2, but only after calling > eval_const_expressions() which access root->simple_rte_array[] with > wrong varno... > > The draft patch fixing this is attached. I will continue the investigation. The same patch with a bit revised comment and commit message. ------ Regards, Alexander Korotkov Supabase
Вложения
On Sun, Aug 31, 2025 at 8:12 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > The draft patch fixing this is attached. I will continue the investigation. > > The same patch with a bit revised comment and commit message. FWIW, I reported this same issue and proposed the patch last week in Discussion: https://postgr.es/m/CAMbWs48x=GsGan_bvQ+j0rLqYFNi9W725EmJpWUZv1RrmaicZQ@mail.gmail.com Thanks Richard
On Sun, Aug 31, 2025 at 8:41 AM Richard Guo <guofenglinux@gmail.com> wrote: > On Sun, Aug 31, 2025 at 8:12 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > The same patch with a bit revised comment and commit message. > FWIW, I reported this same issue and proposed the patch last week in > Discussion: https://postgr.es/m/CAMbWs48x=GsGan_bvQ+j0rLqYFNi9W725EmJpWUZv1RrmaicZQ@mail.gmail.com I think it's better to push this patch sooner rather than later, as multiple people have encountered the issue in different ways. I'll go ahead and push 0001 from my patch set shortly. I don't think we need to backpatch it, since the issue only occurs after commit e2debb643. Thanks Richard
On Sun, Aug 31, 2025 at 2:51 AM Richard Guo <guofenglinux@gmail.com> wrote: > On Sun, Aug 31, 2025 at 8:41 AM Richard Guo <guofenglinux@gmail.com> wrote: > > On Sun, Aug 31, 2025 at 8:12 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > > The same patch with a bit revised comment and commit message. > > > FWIW, I reported this same issue and proposed the patch last week in > > Discussion: https://postgr.es/m/CAMbWs48x=GsGan_bvQ+j0rLqYFNi9W725EmJpWUZv1RrmaicZQ@mail.gmail.com > > I think it's better to push this patch sooner rather than later, as > multiple people have encountered the issue in different ways. I'll go > ahead and push 0001 from my patch set shortly. I see. You're much farther than I'm. Please, go ahead pushing this. > I don't think we need > to backpatch it, since the issue only occurs after commit e2debb643. I think you're right. ------ Regards, Alexander Korotkov Supabase
On Sun, Aug 31, 2025 at 9:50 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Sun, Aug 31, 2025 at 2:51 AM Richard Guo <guofenglinux@gmail.com> wrote: > > I think it's better to push this patch sooner rather than later, as > > multiple people have encountered the issue in different ways. I'll go > > ahead and push 0001 from my patch set shortly. > I see. You're much farther than I'm. Please, go ahead pushing this. Done. Thanks Richard
On 31/8/2025 03:03, Richard Guo wrote: > On Sun, Aug 31, 2025 at 9:50 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: >> On Sun, Aug 31, 2025 at 2:51 AM Richard Guo <guofenglinux@gmail.com> wrote: >>> I think it's better to push this patch sooner rather than later, as >>> multiple people have encountered the issue in different ways. I'll go >>> ahead and push 0001 from my patch set shortly. > >> I see. You're much farther than I'm. Please, go ahead pushing this. > > Done. Hmm, is this enough? As I see, it is a typical pattern: after deserialisation by the stringToNode function, Postgres attempts to call eval_const_expressions. See, for example, RelationGetIndexExpressions. That seems strange, because who knows what the entry No.1 is and how constant evaluation over this relation influences the expression tree. Perhaps it would be beneficial to refactor all this logic? Maybe go further and, before storing any relation constraint to the system catalogue, replace varno with something like 'UNDEF_VAR (-5)'? I thought about why it was designed that way. Commit eabc714 shows that at that time, the eval_const_expressions didn't touch PlannerInfo data at all. Currently, it requires query structures in some cases, particularly in prosupport functions, which necessitates having correct relation references. -- regards, Andrei Lepikhov