Обсуждение: BUG #19037: Planner fails on estimating array length with "no relation entry" error

Поиск
Список
Период
Сортировка

BUG #19037: Planner fails on estimating array length with "no relation entry" error

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      19037
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 18beta3
Operating system:   Ubuntu 24.04
Description:

The following script:
create table t(ia int[]);
select exists (select 1 from (select 1) where case when b then 1 else 0 end
= 1)
  from (select 1 = any(ia) as b from t);

triggers:
ERROR:  XX000: no relation entry for relid 2
LOCATION:  find_base_rel, relnode.c:426

2025-08-30 11:28:26.036 UTC [2334676] LOG:  statement: select exists (select
1 from (select 1) where case when b then 1 else 0 end = 1)
          from (select 1 = any(ia) as b from t);
2025-08-30 11:28:26.038 UTC [2334676] ERROR:  no relation entry for relid 2
2025-08-30 11:28:26.038 UTC [2334676] BACKTRACE:
find_base_rel at relnode.c:426:2
examine_variable at selfuncs.c:5319:16
estimate_array_length at selfuncs.c:2173:7
cost_qual_eval_walker at costsize.c:4874:25
expression_tree_walker_impl at nodeFuncs.c:2304:9
cost_qual_eval_walker at costsize.c:5043:1
expression_tree_walker_impl at nodeFuncs.c:2534:8
expression_tree_walker_impl at nodeFuncs.c:2215:8
cost_qual_eval_walker at costsize.c:5043:1
...

(Discovered with SQLsmith.)

Reproduced starting from 9391f7152.


PG Bug reporting form <noreply@postgresql.org> writes:
> The following script:
> create table t(ia int[]);
> select exists (select 1 from (select 1) where case when b then 1 else 0 end
> = 1)
>   from (select 1 = any(ia) as b from t);
> triggers:
> ERROR:  XX000: no relation entry for relid 2
> LOCATION:  find_base_rel, relnode.c:426

Thanks for the report.  After a bit of experimentation, I can shorten
the reproducer to

select exists (select 1 where (1 = any(ia))::int = 1) from t;
ERROR:  no relation entry for relid 1

but it doesn't happen any more if you simplify further to

select exists (select 1 where 1 = any(ia)) from t;

The reason for the difference seems to be that make_subplan
checks to see if the EXISTS can be converted to a hashable ANY
subplan (cf. convert_EXISTS_to_ANY), and the form where there's
a top-level "=" operator in the sub-select's WHERE clause can
be so converted.  Then we hit the failure while trying to do
cost_qual_eval on the converted ANY expression.  So you also
get this failure if you manually write out the form that
convert_EXISTS_to_ANY is generating:

select ((1 = any(ia))::int) = any (select 1) from t;
ERROR:  no relation entry for relid 1

Anyway that's sort of a sideshow.  The real issue here is that
we're applying cost_qual_eval before the planner has created
any RelOptInfos, which means that examine_variable won't work.
I find it surprising that this is the first report of such trouble,
because it certainly isn't obvious that cost_qual_eval shouldn't
be allowed to consult statistics.

The most expedient solution is probably to hack examine_variable
so that it doesn't fail if root->simple_rel_array isn't there yet.
That seems mighty ugly though.

Another low-risk response could be to revert 9391f7152.  But I don't
care for that because it's not really addressing the underlying
problem.  We might have the same issue elsewhere in cost estimation
already, and even if we don't, it would be quite likely we'd introduce
it again in future.

In some sense the "right" fix would be to do SubPlan generation
later, when we have statistics available for the Vars of the
parent query.  But that seems like a rather large task, and
we'd surely not wish to back-patch the results.

So I'm not really seeing another workable answer besides hacking
examine_variable, more or less as attached.

            regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 1c480cfaaf7..1a6db244c94 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5309,6 +5309,20 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
     /* Save the exposed type of the expression */
     vardata->vartype = exprType(node);

+    /*
+     * It's possible to get here during SubLink planning, before we have
+     * constructed any RelOptInfos for the parent query level.  In that case
+     * we have to punt by pretending the node is variable-free, since we
+     * cannot satisfy the API expectation of passing back a RelOptInfo.
+     */
+    if (root->simple_rel_array == NULL)
+    {
+        vardata->var = node;
+        vardata->atttype = vardata->vartype;
+        vardata->atttypmod = exprTypmod(node);
+        return;
+    }
+
     /* Look inside any binary-compatible relabeling */

     if (IsA(node, RelabelType))

Re: BUG #19037: Planner fails on estimating array length with "no relation entry" error

От
Richard Guo
Дата:
On Sun, Aug 31, 2025 at 6:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The most expedient solution is probably to hack examine_variable
> so that it doesn't fail if root->simple_rel_array isn't there yet.
> That seems mighty ugly though.

As an alternative, I wonder if we could pass root as NULL to
cost_subplan() when it's called from build_subplan(), and possibly
also from SS_process_ctes().  At those points, the root does not yet
contain enough information to safely consult statistics.  Meanwhile,
cost_qual_eval() and the functions it calls are already well-equipped
to handle a NULL root.

- Richard



Richard Guo <guofenglinux@gmail.com> writes:
> On Sun, Aug 31, 2025 at 6:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The most expedient solution is probably to hack examine_variable
>> so that it doesn't fail if root->simple_rel_array isn't there yet.
>> That seems mighty ugly though.

> As an alternative, I wonder if we could pass root as NULL to
> cost_subplan() when it's called from build_subplan(), and possibly
> also from SS_process_ctes().  At those points, the root does not yet
> contain enough information to safely consult statistics.  Meanwhile,
> cost_qual_eval() and the functions it calls are already well-equipped
> to handle a NULL root.

Good suggestion, that does seem more in keeping with established
hacks^H^H^Hpractice.  It'll need some commentary about why.

I'm slightly tempted to do the dirty work in cost_subplan() itself,
ie pass a NULL down to cost_qual_eval from there.  Right now it
doesn't matter, but maybe in future there would be some other
safer use for the parent root in cost_subplan()?  This approach
would force the same behavior for the third caller of cost_subplan,
SS_make_initplan_from_plan.  But I don't think it matters there,
since an initplan will by definition not be consulting anything from
the parent plan.

            regards, tom lane



Re: BUG #19037: Planner fails on estimating array length with "no relation entry" error

От
Richard Guo
Дата:
On Mon, Sep 1, 2025 at 11:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm slightly tempted to do the dirty work in cost_subplan() itself,
> ie pass a NULL down to cost_qual_eval from there.  Right now it
> doesn't matter, but maybe in future there would be some other
> safer use for the parent root in cost_subplan()?

Fair point.

> This approach
> would force the same behavior for the third caller of cost_subplan,
> SS_make_initplan_from_plan.  But I don't think it matters there,
> since an initplan will by definition not be consulting anything from
> the parent plan.

Yeah, it should be safe to use a NULL root for initplans.  Besides, in
the case of SS_make_initplan_from_plan, testexpr will be NULL, and
cost_qual_eval() becomes a no-op.

Here is the patch that passes root as NULL to cost_qual_eval() in
cost_subplan(), along with some commentary about why.

- Richard

Вложения
Richard Guo <guofenglinux@gmail.com> writes:
> Here is the patch that passes root as NULL to cost_qual_eval() in
> cost_subplan(), along with some commentary about why.

v2 LGTM.  I did think of a way to make the test case a little
less busy-looking:

select (1 = any(array_agg(f1))) = any (select false) from int4_tbl;

but that's just cosmetic.

            regards, tom lane



Re: BUG #19037: Planner fails on estimating array length with "no relation entry" error

От
Richard Guo
Дата:
On Wed, Sep 3, 2025 at 12:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
> > Here is the patch that passes root as NULL to cost_qual_eval() in
> > cost_subplan(), along with some commentary about why.

> v2 LGTM.  I did think of a way to make the test case a little
> less busy-looking:
>
> select (1 = any(array_agg(f1))) = any (select false) from int4_tbl;
>
> but that's just cosmetic.

Thanks, I'll take this test case.

One thing I'm not quite sure about is whether we should backpatch this
fix to pre-v17 branches.  Prior to v17, estimate_array_length() wasn't
taught to use statistics, so this error isn't reproducible there.
OTOH, passing a root without a valid simple_rel_array to
cost_qual_eval() still seems potentially unsafe.  What do you think?

- Richard



Richard Guo <guofenglinux@gmail.com> writes:
> One thing I'm not quite sure about is whether we should backpatch this
> fix to pre-v17 branches.  Prior to v17, estimate_array_length() wasn't
> taught to use statistics, so this error isn't reproducible there.
> OTOH, passing a root without a valid simple_rel_array to
> cost_qual_eval() still seems potentially unsafe.  What do you think?

Yeah, "is there any other instance of this problem?" is the $64
question here.  I was initially thinking v17 is sufficient, but
the possibility that some extension might be vulnerable makes
me lean to back-patching further.  Your call ...

            regards, tom lane



Re: BUG #19037: Planner fails on estimating array length with "no relation entry" error

От
Richard Guo
Дата:
On Wed, Sep 3, 2025 at 10:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
> > One thing I'm not quite sure about is whether we should backpatch this
> > fix to pre-v17 branches.  Prior to v17, estimate_array_length() wasn't
> > taught to use statistics, so this error isn't reproducible there.
> > OTOH, passing a root without a valid simple_rel_array to
> > cost_qual_eval() still seems potentially unsafe.  What do you think?

> Yeah, "is there any other instance of this problem?" is the $64
> question here.  I was initially thinking v17 is sufficient, but
> the possibility that some extension might be vulnerable makes
> me lean to back-patching further.  Your call ...

I've decided to backpatch this fix to pre-v17 branches: using a root
that lacks a valid simple_rel_array in a cost estimation function
seems like a pitfall waiting to happen.  I've included an explanation
of this backpatch decision in the commit message and have pushed the
fix (the test case is not included in pre-v17 branches though).

- Richard



Re: BUG #19037: Planner fails on estimating array length with "no relation entry" error

От
Andrei Lepikhov
Дата:
On 30/8/2025 23:51, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> The following script:
> In some sense the "right" fix would be to do SubPlan generation
> later, when we have statistics available for the Vars of the
> parent query.  But that seems like a rather large task, and
> we'd surely not wish to back-patch the results.
I'd vote to go this direction in the next version.
Implementing memoisation of subplans to reduce the number of subplan 
evaluations, I encountered the same problem: the number of groups in the 
parameter set can't be estimated during subplan planning.
I resolved this issue by employing upper_paths_hook to decide the 
addition of Memoise nodes into subplans at the end of upper query planning.
But it is not an in-core solution and has a flaw with multi-level 
references.
So, it would be better to prepare as much PlannerInfo's info, as 
possible (base rels, maybe query tree?) before filling RelOptInfos with 
potential paths.

-- 
regards, Andrei Lepikhov



Re: BUG #19037: Planner fails on estimating array length with "no relation entry" error

От
Alexander Lakhin
Дата:
Hello Tom and Richard,

03.09.2025 11:08, Richard Guo wrote:
Yeah, "is there any other instance of this problem?" is the $64
question here.  I was initially thinking v17 is sufficient, but
the possibility that some extension might be vulnerable makes
me lean to back-patching further.  Your call ...
I've decided to backpatch this fix to pre-v17 branches: using a root
that lacks a valid simple_rel_array in a cost estimation function
seems like a pitfall waiting to happen.  I've included an explanation
of this backpatch decision in the commit message and have pushed the
fix (the test case is not included in pre-v17 branches though).

Thank you for fixing that anomaly!

Best regards,
Alexander