Обсуждение: BUG #19353: Error XX000 if referencing expanded array in grouping set: variable not found in subplan target list

Поиск
Список
Период
Сортировка
The following bug has been logged on the website:

Bug reference:      19353
Logged by:          Marian MULLER REBEYROL
Email address:      marian.muller@serli.com
PostgreSQL version: 18.1
Operating system:   Linux
Description:

Hi,

Let me start by thanking you for providing such a great tool!

After upgrading to Postgres 18 I've come across an error I wasn't getting
beforehand. Here's a minimal way to reproduce the issue, that used to work
well in Postgres 12 and 17 at least.

Table and data:
```
create table items (
  id      varchar(255) primary key,
  brands  varchar array,
  markets varchar array
);
insert into items values ('Item1', '{Brand A,Brand B}', '{Market A}');
insert into items values ('Item2', '{Brand B}', '{Market B,Market A}');
insert into items values ('Item3', '{Brand A,Brand C}', '{Market B}');
```

When querying this table using grouping sets, I get an internal error as
soon as the grouping sets reference an expanded array, for instance this
query works:
```
SELECT brands, markets, count(distinct id) count
FROM items
GROUP BY GROUPING SETS (brands, markets, ())
ORDER BY brands asc, markets asc;
```

But this one triggers an error:
```
SELECT brands, unnest(markets) as market, count(distinct id) count
FROM items
GROUP BY GROUPING SETS (brands, market, ())
ORDER BY brands asc, market asc;
```

This gives me the following error on the latest release (PostgreSQL 18.1 on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 15.2.1 20251112, 64-bit):
```
ERROR:  XX000: variable not found in subplan target list
LOCATION:  fix_upper_expr_mutator, setrefs.c:3314
```

I've confirmed the error still exists with a vanilla freshly-compiled
Postgres (PostgreSQL 19devel on x86_64-pc-linux-gnu, compiled by gcc (GCC)
15.2.1 20251112, 64-bit ; latest commit at the time: b65f1ad):
```
2025-12-12 10:39:51.151 CET [347004] ERROR:  variable not found in subplan
target list
2025-12-12 10:39:51.151 CET [347004] STATEMENT:  SELECT brands,
unnest(markets) as market, count(distinct id) count
        FROM items
        GROUP BY GROUPING SETS (brands, market, ())
        ORDER BY brands asc, market asc;
ERROR:  XX000: variable not found in subplan target list
LOCATION:  fix_upper_expr_mutator, setrefs.c:3335
```

While this query used to work in previous versions, I'm unsure from the
documentation whether it is expected to work or not.

My platform is GNU/Linux (up-to-date Manjaro), kernel 6.16.8, on x86_64.
I've also experienced the error on a cloud-hosted Postgres instance.

I've tried to be exhaustive but will gladly provide more information if
necessary.

Thanks.


PG Bug reporting form <noreply@postgresql.org> writes:
> After upgrading to Postgres 18 I've come across an error I wasn't getting
> beforehand. Here's a minimal way to reproduce the issue, that used to work
> well in Postgres 12 and 17 at least.

Thank you for this well-crafted bug report!  Bisecting shows that
it broke at

f5050f795aea67dfc40bbc429c8934e9439e22e7 is the first bad commit
commit f5050f795aea67dfc40bbc429c8934e9439e22e7 (HEAD)
Author: Richard Guo <rguo@postgresql.org>
Date:   Tue Sep 10 12:36:48 2024 +0900

    Mark expressions nullable by grouping sets

I have not dug any deeper than that.

            regards, tom lane





Tom Lane <tgl@sss.pgh.pa.us> 于2025年12月13日周六 00:54写道:
PG Bug reporting form <noreply@postgresql.org> writes:
> After upgrading to Postgres 18 I've come across an error I wasn't getting
> beforehand. Here's a minimal way to reproduce the issue, that used to work
> well in Postgres 12 and 17 at least.

Thank you for this well-crafted bug report!  Bisecting shows that
it broke at

f5050f795aea67dfc40bbc429c8934e9439e22e7 is the first bad commit
commit f5050f795aea67dfc40bbc429c8934e9439e22e7 (HEAD)
Author: Richard Guo <rguo@postgresql.org>
Date:   Tue Sep 10 12:36:48 2024 +0900

    Mark expressions nullable by grouping sets

I have not dug any deeper than that.

                        regards, tom lane



When there is SRF in the query, the grouping_target changes  in grouping_planner when entering the following function:
split_pathtarget_at_srfs(root, grouping_target, scanjoin_target,
&grouping_targets,
&grouping_targets_contain_srfs);

(gdb) pgprint grouping_target
PathTarget [sortgrouprefs=0x556daae88358 cost={startup = 0, per_tuple = 0} width=72 has_volatile_expr=VOLATILITY_UNKNOWN]
[exprs]
Var [varno=1 varattno=3 vartype=1015 varcollid=100 varreturningtype=VAR_RETURNING_DEFAULT varnosyn=1 varattnosyn=3][varnullingrels] Bitmapset [2]
Var [varno=1 varattno=2 vartype=1015 varcollid=100 varreturningtype=VAR_RETURNING_DEFAULT varnosyn=1 varattnosyn=2][varnullingrels] Bitmapset [2]
Aggref [aggfnoid=2147 aggtype=20 inputcollid=100 aggtranstype=20 aggstar=false aggvariadic=false aggkind=110 'n' aggpresorted=false agglevelsup=0 aggsplit=AGGSPLIT_SIMPLE
       aggno=0 aggtransno=0][aggargtypes] OidList: [1043]
[args]
TargetEntry [resno=1 ressortgroupref=1]
Var [varno=1 varattno=1 vartype=1043 vartypmod=259 varcollid=100 varreturningtype=VAR_RETURNING_DEFAULT varnosyn=1 varattnosyn=1]
[aggdistinct]
SortGroupClause [tleSortGroupRef=1 eqop=98 sortop=664 reverse_sort=false nulls_first=false hashable=true]

So the targetlist of Agg is :
gdb) pgprint plan->targetlist
TargetEntry [resno=1]
Var [varno=1 varattno=3 vartype=1015 varcollid=100 varreturningtype=VAR_RETURNING_DEFAULT varnosyn=1 varattnosyn=3][varnullingrels] Bitmapset [2]
TargetEntry [resno=2 ressortgroupref=1]
Var [varno=1 varattno=2 vartype=1015 varcollid=100 varreturningtype=VAR_RETURNING_DEFAULT varnosyn=1 varattnosyn=2][varnullingrels] Bitmapset [2]
TargetEntry [resno=3]
Aggref [aggfnoid=2147 aggtype=20 inputcollid=100 aggtranstype=20 aggstar=false aggvariadic=false aggkind=110 'n' aggpresorted=false agglevelsup=0 aggsplit=AGGSPLIT_SIMPLE
       aggno=0 aggtransno=0]
[aggargtypes] OidList: [1043]
[args]
TargetEntry [resno=1 ressortgroupref=1]
Var [varno=1 varattno=1 vartype=1043 vartypmod=259 varcollid=100 varreturningtype=VAR_RETURNING_DEFAULT varnosyn=1 varattnosyn=1]
[aggdistinct]
SortGroupClause [tleSortGroupRef=1 eqop=98 sortop=664 reverse_sort=false nulls_first=false hashable=true]

But the subplan->targetlist is :
Var [varno=1 varattno=2 vartype=1015 varcollid=100 varreturningtype=VAR_RETURNING_DEFAULT varnosyn=1 varattnosyn=2]
FuncExpr [funcid=2331 funcresulttype=1043 funcretset=true funcvariadic=false funcformat=COERCE_EXPLICIT_CALL funccollid=100 inputcollid=100]
         Var [varno=1 varattno=3 vartype=1015 varcollid=100 varreturningtype=VAR_RETURNING_DEFAULT varnosyn=1 varattnosyn=3]
Var [varno=1 varattno=1 vartype=1043 vartypmod=259 varcollid=100 varreturningtype=VAR_RETURNING_DEFAULT varnosyn=1 varattnosyn=1]

in build_tlist_index(subplan->targetlist),  varno = 1 varattno =3 will not be added into tlist_vinfo array, so "ERROR:  variable not found in subplan target list"
will report because search_indexed_tlist_for_var() returning null in fix_upper_expr_mutator().

--
Thanks,
Tender Wang


Richard Guo <guofenglinux@gmail.com> 于2025年12月15日周一 22:29写道:
On Sat, Dec 13, 2025 at 7:02 PM Tender Wang <tndrwang@gmail.com> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> 于2025年12月13日周六 00:54写道:
>> PG Bug reporting form <noreply@postgresql.org> writes:
>> > After upgrading to Postgres 18 I've come across an error I wasn't getting
>> > beforehand. Here's a minimal way to reproduce the issue, that used to work
>> > well in Postgres 12 and 17 at least.

>> Thank you for this well-crafted bug report!  Bisecting shows that
>> it broke at
>>
>> f5050f795aea67dfc40bbc429c8934e9439e22e7 is the first bad commit
>> commit f5050f795aea67dfc40bbc429c8934e9439e22e7 (HEAD)
>> Author: Richard Guo <rguo@postgresql.org>
>> Date:   Tue Sep 10 12:36:48 2024 +0900
>>
>>     Mark expressions nullable by grouping sets

> When there is SRF in the query, the grouping_target changes  in grouping_planner when entering the following function:
> split_pathtarget_at_srfs(root, grouping_target, scanjoin_target,
> &grouping_targets,
> &grouping_targets_contain_srfs);

Yeah, the issue happens here.  In split_pathtarget_at_srfs(), if we
find a subexpression that matches an expression already computed in
the previous plan level, we should treat it as a Var and should not
split it further.  setrefs.c will later replace the expression with a
Var referencing the subplan output.

However, when processing the grouping target for grouping sets, this
function can fail to recognize that an expression is already computed
in the scan/join phase.  This happens because the comparison crosses
the grouping boundary: expressions in the grouping target may carry
the grouping nulling bit, while the corresponding expressions in the
scan/join target do not.

Yes, grouping target containing nulling bits makes equal(var_grouping_target, var_in_scan_join_target) return false.
The nulling bits in the grouping target should be ignored when processing across the grouping boundary. 


This mismatch leads this function to incorrectly assume that the
expression (e.g., unnest(markets)) needs to be re-evaluated from its
arguments, which are often unavailable in the subplan.

To fix, I think we should ignore the grouping nulling bit when
checking if an expression from the grouping target is available in the
pre-grouping input target.  This is actually what we do in setrefs.c.

Hence, attached patch.

The patch works for me. All regression tests pass with this patch. 

(I'm currently on vacation and will take a closer look when I return.
Anyone who wants to move this forward before then is welcome.)

Enjoy your vacation!


--
Thanks,
Tender Wang