Обсуждение: BUG #19037: Planner fails on estimating array length with "no relation entry" error
BUG #19037: Planner fails on estimating array length with "no relation entry" error
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 19037 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 18beta3 Operating system: Ubuntu 24.04 Description: The following script: create table t(ia int[]); select exists (select 1 from (select 1) where case when b then 1 else 0 end = 1) from (select 1 = any(ia) as b from t); triggers: ERROR: XX000: no relation entry for relid 2 LOCATION: find_base_rel, relnode.c:426 2025-08-30 11:28:26.036 UTC [2334676] LOG: statement: select exists (select 1 from (select 1) where case when b then 1 else 0 end = 1) from (select 1 = any(ia) as b from t); 2025-08-30 11:28:26.038 UTC [2334676] ERROR: no relation entry for relid 2 2025-08-30 11:28:26.038 UTC [2334676] BACKTRACE: find_base_rel at relnode.c:426:2 examine_variable at selfuncs.c:5319:16 estimate_array_length at selfuncs.c:2173:7 cost_qual_eval_walker at costsize.c:4874:25 expression_tree_walker_impl at nodeFuncs.c:2304:9 cost_qual_eval_walker at costsize.c:5043:1 expression_tree_walker_impl at nodeFuncs.c:2534:8 expression_tree_walker_impl at nodeFuncs.c:2215:8 cost_qual_eval_walker at costsize.c:5043:1 ... (Discovered with SQLsmith.) Reproduced starting from 9391f7152.
PG Bug reporting form <noreply@postgresql.org> writes: > The following script: > create table t(ia int[]); > select exists (select 1 from (select 1) where case when b then 1 else 0 end > = 1) > from (select 1 = any(ia) as b from t); > triggers: > ERROR: XX000: no relation entry for relid 2 > LOCATION: find_base_rel, relnode.c:426 Thanks for the report. After a bit of experimentation, I can shorten the reproducer to select exists (select 1 where (1 = any(ia))::int = 1) from t; ERROR: no relation entry for relid 1 but it doesn't happen any more if you simplify further to select exists (select 1 where 1 = any(ia)) from t; The reason for the difference seems to be that make_subplan checks to see if the EXISTS can be converted to a hashable ANY subplan (cf. convert_EXISTS_to_ANY), and the form where there's a top-level "=" operator in the sub-select's WHERE clause can be so converted. Then we hit the failure while trying to do cost_qual_eval on the converted ANY expression. So you also get this failure if you manually write out the form that convert_EXISTS_to_ANY is generating: select ((1 = any(ia))::int) = any (select 1) from t; ERROR: no relation entry for relid 1 Anyway that's sort of a sideshow. The real issue here is that we're applying cost_qual_eval before the planner has created any RelOptInfos, which means that examine_variable won't work. I find it surprising that this is the first report of such trouble, because it certainly isn't obvious that cost_qual_eval shouldn't be allowed to consult statistics. The most expedient solution is probably to hack examine_variable so that it doesn't fail if root->simple_rel_array isn't there yet. That seems mighty ugly though. Another low-risk response could be to revert 9391f7152. But I don't care for that because it's not really addressing the underlying problem. We might have the same issue elsewhere in cost estimation already, and even if we don't, it would be quite likely we'd introduce it again in future. In some sense the "right" fix would be to do SubPlan generation later, when we have statistics available for the Vars of the parent query. But that seems like a rather large task, and we'd surely not wish to back-patch the results. So I'm not really seeing another workable answer besides hacking examine_variable, more or less as attached. regards, tom lane diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index 1c480cfaaf7..1a6db244c94 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -5309,6 +5309,20 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid, /* Save the exposed type of the expression */ vardata->vartype = exprType(node); + /* + * It's possible to get here during SubLink planning, before we have + * constructed any RelOptInfos for the parent query level. In that case + * we have to punt by pretending the node is variable-free, since we + * cannot satisfy the API expectation of passing back a RelOptInfo. + */ + if (root->simple_rel_array == NULL) + { + vardata->var = node; + vardata->atttype = vardata->vartype; + vardata->atttypmod = exprTypmod(node); + return; + } + /* Look inside any binary-compatible relabeling */ if (IsA(node, RelabelType))
On Sun, Aug 31, 2025 at 6:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > The most expedient solution is probably to hack examine_variable > so that it doesn't fail if root->simple_rel_array isn't there yet. > That seems mighty ugly though. As an alternative, I wonder if we could pass root as NULL to cost_subplan() when it's called from build_subplan(), and possibly also from SS_process_ctes(). At those points, the root does not yet contain enough information to safely consult statistics. Meanwhile, cost_qual_eval() and the functions it calls are already well-equipped to handle a NULL root. - Richard
Richard Guo <guofenglinux@gmail.com> writes: > On Sun, Aug 31, 2025 at 6:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The most expedient solution is probably to hack examine_variable >> so that it doesn't fail if root->simple_rel_array isn't there yet. >> That seems mighty ugly though. > As an alternative, I wonder if we could pass root as NULL to > cost_subplan() when it's called from build_subplan(), and possibly > also from SS_process_ctes(). At those points, the root does not yet > contain enough information to safely consult statistics. Meanwhile, > cost_qual_eval() and the functions it calls are already well-equipped > to handle a NULL root. Good suggestion, that does seem more in keeping with established hacks^H^H^Hpractice. It'll need some commentary about why. I'm slightly tempted to do the dirty work in cost_subplan() itself, ie pass a NULL down to cost_qual_eval from there. Right now it doesn't matter, but maybe in future there would be some other safer use for the parent root in cost_subplan()? This approach would force the same behavior for the third caller of cost_subplan, SS_make_initplan_from_plan. But I don't think it matters there, since an initplan will by definition not be consulting anything from the parent plan. regards, tom lane
On Mon, Sep 1, 2025 at 11:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm slightly tempted to do the dirty work in cost_subplan() itself, > ie pass a NULL down to cost_qual_eval from there. Right now it > doesn't matter, but maybe in future there would be some other > safer use for the parent root in cost_subplan()? Fair point. > This approach > would force the same behavior for the third caller of cost_subplan, > SS_make_initplan_from_plan. But I don't think it matters there, > since an initplan will by definition not be consulting anything from > the parent plan. Yeah, it should be safe to use a NULL root for initplans. Besides, in the case of SS_make_initplan_from_plan, testexpr will be NULL, and cost_qual_eval() becomes a no-op. Here is the patch that passes root as NULL to cost_qual_eval() in cost_subplan(), along with some commentary about why. - Richard
Вложения
Richard Guo <guofenglinux@gmail.com> writes: > Here is the patch that passes root as NULL to cost_qual_eval() in > cost_subplan(), along with some commentary about why. v2 LGTM. I did think of a way to make the test case a little less busy-looking: select (1 = any(array_agg(f1))) = any (select false) from int4_tbl; but that's just cosmetic. regards, tom lane
On Wed, Sep 3, 2025 at 12:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > Here is the patch that passes root as NULL to cost_qual_eval() in > > cost_subplan(), along with some commentary about why. > v2 LGTM. I did think of a way to make the test case a little > less busy-looking: > > select (1 = any(array_agg(f1))) = any (select false) from int4_tbl; > > but that's just cosmetic. Thanks, I'll take this test case. One thing I'm not quite sure about is whether we should backpatch this fix to pre-v17 branches. Prior to v17, estimate_array_length() wasn't taught to use statistics, so this error isn't reproducible there. OTOH, passing a root without a valid simple_rel_array to cost_qual_eval() still seems potentially unsafe. What do you think? - Richard
Richard Guo <guofenglinux@gmail.com> writes: > One thing I'm not quite sure about is whether we should backpatch this > fix to pre-v17 branches. Prior to v17, estimate_array_length() wasn't > taught to use statistics, so this error isn't reproducible there. > OTOH, passing a root without a valid simple_rel_array to > cost_qual_eval() still seems potentially unsafe. What do you think? Yeah, "is there any other instance of this problem?" is the $64 question here. I was initially thinking v17 is sufficient, but the possibility that some extension might be vulnerable makes me lean to back-patching further. Your call ... regards, tom lane
On Wed, Sep 3, 2025 at 10:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > One thing I'm not quite sure about is whether we should backpatch this > > fix to pre-v17 branches. Prior to v17, estimate_array_length() wasn't > > taught to use statistics, so this error isn't reproducible there. > > OTOH, passing a root without a valid simple_rel_array to > > cost_qual_eval() still seems potentially unsafe. What do you think? > Yeah, "is there any other instance of this problem?" is the $64 > question here. I was initially thinking v17 is sufficient, but > the possibility that some extension might be vulnerable makes > me lean to back-patching further. Your call ... I've decided to backpatch this fix to pre-v17 branches: using a root that lacks a valid simple_rel_array in a cost estimation function seems like a pitfall waiting to happen. I've included an explanation of this backpatch decision in the commit message and have pushed the fix (the test case is not included in pre-v17 branches though). - Richard
Re: BUG #19037: Planner fails on estimating array length with "no relation entry" error
От
Andrei Lepikhov
Дата:
On 30/8/2025 23:51, Tom Lane wrote: > PG Bug reporting form <noreply@postgresql.org> writes: >> The following script: > In some sense the "right" fix would be to do SubPlan generation > later, when we have statistics available for the Vars of the > parent query. But that seems like a rather large task, and > we'd surely not wish to back-patch the results. I'd vote to go this direction in the next version. Implementing memoisation of subplans to reduce the number of subplan evaluations, I encountered the same problem: the number of groups in the parameter set can't be estimated during subplan planning. I resolved this issue by employing upper_paths_hook to decide the addition of Memoise nodes into subplans at the end of upper query planning. But it is not an in-core solution and has a flaw with multi-level references. So, it would be better to prepare as much PlannerInfo's info, as possible (base rels, maybe query tree?) before filling RelOptInfos with potential paths. -- regards, Andrei Lepikhov
Re: BUG #19037: Planner fails on estimating array length with "no relation entry" error
От
Alexander Lakhin
Дата:
Hello Tom and Richard,
03.09.2025 11:08, Richard Guo wrote:
03.09.2025 11:08, Richard Guo wrote:
Yeah, "is there any other instance of this problem?" is the $64 question here. I was initially thinking v17 is sufficient, but the possibility that some extension might be vulnerable makes me lean to back-patching further. Your call ...I've decided to backpatch this fix to pre-v17 branches: using a root that lacks a valid simple_rel_array in a cost estimation function seems like a pitfall waiting to happen. I've included an explanation of this backpatch decision in the commit message and have pushed the fix (the test case is not included in pre-v17 branches though).
Thank you for fixing that anomaly!
Best regards,
Alexander