Обсуждение: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

Поиск
Список
Период
Сортировка

Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
邱宇航
Дата:
I've noticed that two GROUP BY ROLLUP queries behave differently in v17
compared to master and REL_18_STABLE. The issue can be reproduced by
following SQL:

``` SQL
CREATE TABLE t(id int);

INSERT INTO t SELECT generate_series(1, 3);

-- Query 1
SELECT DISTINCT 'XXX'
FROM t
GROUP BY ROLLUP (id, 1);

-- Query 2
SELECT 'XXX'
FROM t
GROUP BY ROLLUP(id)
HAVING NOT (NULL IS NULL);
```

After some git bisect work, I traced the root cause:
- The first issue was introduced by commit f5050f79 (Mark expressions
nullable by grouping sets).
- The second issue stems from commit 67a54b9e (Allow pushdown of HAVING
clauses with grouping sets).

Best regards,
Yuhang Qiu




Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
David Rowley
Дата:
On Tue, 23 Sept 2025 at 15:49, 邱宇航 <iamqyh@gmail.com> wrote:
> I've noticed that two GROUP BY ROLLUP queries behave differently in v17
> compared to master and REL_18_STABLE. The issue can be reproduced by
> following SQL:
>
> After some git bisect work, I traced the root cause:
> - The first issue was introduced by commit f5050f79 (Mark expressions
> nullable by grouping sets).
> - The second issue stems from commit 67a54b9e (Allow pushdown of HAVING
> clauses with grouping sets).

If you check the release notes and the commit message for f5050f795
you'll see that it does mention that wrong results could be returned.

What wasn't mentioned was that this wasn't fixed in prior versions.
The reason being is that the fix required changing the query tree
representation, which we can't change in the back branches due to
incompatibility with stored rules in existing databases. So, a change
in query results for certain queries here is expected.

David



Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
Bruce Momjian
Дата:
On Tue, Sep 23, 2025 at 07:38:14PM +1200, David Rowley wrote:
> On Tue, 23 Sept 2025 at 15:49, 邱宇航 <iamqyh@gmail.com> wrote:
> > I've noticed that two GROUP BY ROLLUP queries behave differently in v17
> > compared to master and REL_18_STABLE. The issue can be reproduced by
> > following SQL:
> >
> > After some git bisect work, I traced the root cause:
> > - The first issue was introduced by commit f5050f79 (Mark expressions
> > nullable by grouping sets).
> > - The second issue stems from commit 67a54b9e (Allow pushdown of HAVING
> > clauses with grouping sets).
> 
> If you check the release notes and the commit message for f5050f795
> you'll see that it does mention that wrong results could be returned.
> 
> What wasn't mentioned was that this wasn't fixed in prior versions.
> The reason being is that the fix required changing the query tree
> representation, which we can't change in the back branches due to
> incompatibility with stored rules in existing databases. So, a change
> in query results for certain queries here is expected.

Uh, by design, items mentioned in the major release notes have _not_
been fixed in previous minor versions.  Not sure if we can make that
clearer to users.  I did write a blog about this:

    https://momjian.us/main/blogs/pgblog/2022.html#June_13_2022

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.



Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
邱宇航
Дата:
> 2025年9月23日 15:38,David Rowley <dgrowleyml@gmail.com> 写道:
>
> If you check the release notes and the commit message for f5050f795
> you'll see that it does mention that wrong results could be returned.
>
> What wasn't mentioned was that this wasn't fixed in prior versions.
> The reason being is that the fix required changing the query tree
> representation, which we can't change in the back branches due to
> incompatibility with stored rules in existing databases. So, a change
> in query results for certain queries here is expected.

Yeah, I got it. Thanks David and Bruce.

And what about the query 2. This is caused by another commit, and
it's not mentioned in the commit message or the mailing discussion.

Best regards,
Yuhang Qiu




Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
Tom Lane
Дата:
=?utf-8?B?6YKx5a6H6Iiq?= <iamqyh@gmail.com> writes:
> And what about the query 2. This is caused by another commit, and
> it's not mentioned in the commit message or the mailing discussion.

That one indeed seems quite broken.  EXPLAIN confirms that it's
pushing the HAVING below the aggregation, which is simply wrong
because it fails to filter the all-null row(s) that the aggregation
node will create out of thin air.

Is there anything we can salvage from 67a54b9e, or should
we just revert it?

            regards, tom lane



Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
David Rowley
Дата:
On Wed, 24 Sept 2025 at 02:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Is there anything we can salvage from 67a54b9e, or should
> we just revert it?

It doesn't seem great that we need to reconsider the safety of that
optimisation post-release. It's not as if 67a54b9e added several cases
to test for and got one of them wrong. It's a case of the 1 case it
did add wasn't fully thought through.

As for fixing it; testing for a Const-false havingClause can't be done
as that only works for this case which const-folding happens to figure
out during planning. You could equally have something Var-less like:

create or replace function one() returns int as $$ begin return 1;
end; $$ stable language plpgsql;
SELECT 'XXX',count(*) FROM t GROUP BY ROLLUP(id) HAVING NOT one()=1;

which isn't known at plan-time. For me, I can't see how to make this
safe and I suspect due to your above question that you're in a similar
situation. Reverting and reconsidering for v19 seems like the safest
option.

Let's see what Richard thinks.

David



Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> As for fixing it; testing for a Const-false havingClause can't be done
> as that only works for this case which const-folding happens to figure
> out during planning. You could equally have something Var-less like:

Yeah, I don't think the havingClause being constant-false is the
key point here.  I've not looked closely at 67a54b9e, but I suspect
that anything Var-free is potentially an issue.  Or it might even
fail for something that has Vars but is non-strict.

The core of the problem though is that the aggregation node will
issue an all-nulls row that did not come from its input, and we
have to be sure that the HAVING clause is properly evaluated
for that row.

            regards, tom lane



Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
Richard Guo
Дата:
On Wed, Sep 24, 2025 at 10:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, I don't think the havingClause being constant-false is the
> key point here.  I've not looked closely at 67a54b9e, but I suspect
> that anything Var-free is potentially an issue.  Or it might even
> fail for something that has Vars but is non-strict.
>
> The core of the problem though is that the aggregation node will
> issue an all-nulls row that did not come from its input, and we
> have to be sure that the HAVING clause is properly evaluated
> for that row.

I think the issue is that an empty grouping set can produce a row out
of nowhere, and pushing down HAVING clauses in this case may cause us
to fail to filter out that row.

Currently, non-variable-free HAVING clauses aren't pushed down when
there is an empty grouping set, because the empty grouping set would
nullify the vars referenced in those clauses, and we have logic in
place to prevent their pushdown.

However, we (I) overlooked variable-free HAVING clauses.  If there are
only empty grouping sets, this is not a problem since a copy of the
HAVING clauses is retained in HAVING.  The issue arises when there are
both nonempty and empty grouping sets.

In summary, the problem occurs when both of the following conditions
are met: 1) there are both nonempty and empty grouping sets, 2) there
are variable-free HAVING clauses.

I think the following change fixes this problem.

    foreach(l, (List *) parse->havingQual)
    {
        Node       *havingclause = (Node *) lfirst(l);
+       Relids     having_relids;

        if (contain_agg_clause(havingclause) ||
            contain_volatile_functions(havingclause) ||
            contain_subplans(havingclause) ||
            (parse->groupClause && parse->groupingSets &&
-            bms_is_member(root->group_rtindex, pull_varnos(root,
havingclause))))
+            ((having_relids = pull_varnos(root, havingclause)) == NULL ||
+            bms_is_member(root->group_rtindex, having_relids))))
        {
            /* keep it in HAVING */
            newHaving = lappend(newHaving, havingclause);

This change essentially prevents variable-free HAVING clauses from
being pushed down when there are any nonempty grouping sets.  Of
course, this could be done more precisely -- for example, we could
restrict the prevention only to cases where empty grouping sets are
also present, but I'm not sure it's worth the effort.

- Richard



Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
Richard Guo
Дата:
On Wed, Sep 24, 2025 at 12:10 PM Richard Guo <guofenglinux@gmail.com> wrote:
> In summary, the problem occurs when both of the following conditions
> are met: 1) there are both nonempty and empty grouping sets, 2) there
> are variable-free HAVING clauses.
>
> I think the following change fixes this problem.
>
>     foreach(l, (List *) parse->havingQual)
>     {
>         Node       *havingclause = (Node *) lfirst(l);
> +       Relids     having_relids;
>
>         if (contain_agg_clause(havingclause) ||
>             contain_volatile_functions(havingclause) ||
>             contain_subplans(havingclause) ||
>             (parse->groupClause && parse->groupingSets &&
> -            bms_is_member(root->group_rtindex, pull_varnos(root,
> havingclause))))
> +            ((having_relids = pull_varnos(root, havingclause)) == NULL ||
> +            bms_is_member(root->group_rtindex, having_relids))))
>         {
>             /* keep it in HAVING */
>             newHaving = lappend(newHaving, havingclause);
>
> This change essentially prevents variable-free HAVING clauses from
> being pushed down when there are any nonempty grouping sets.  Of
> course, this could be done more precisely -- for example, we could
> restrict the prevention only to cases where empty grouping sets are
> also present, but I'm not sure it's worth the effort.

Here is the patch.

- Richard

Вложения

Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
wenhui qiu
Дата:
Hi Richard
 > I think the following change fixes this problem.
>
>     foreach(l, (List *) parse->havingQual)
>     {
>         Node       *havingclause = (Node *) lfirst(l);
> +       Relids     having_relids;
>
>         if (contain_agg_clause(havingclause) ||
>             contain_volatile_functions(havingclause) ||
>             contain_subplans(havingclause) ||
>             (parse->groupClause && parse->groupingSets &&
> -            bms_is_member(root->group_rtindex, pull_varnos(root,
> havingclause))))
> +            ((having_relids = pull_varnos(root, havingclause)) == NULL ||
> +            bms_is_member(root->group_rtindex, having_relids))))
>         {
>             /* keep it in HAVING */
>             newHaving = lappend(newHaving, havingclause);
>
> This change essentially prevents variable-free HAVING clauses from
> being pushed down when there are any nonempty grouping sets.  Of
> course, this could be done more precisely -- for example, we could
> restrict the prevention only to cases where empty grouping sets are
> also present, but I'm not sure it's worth the effort.
I think it makes sense. However, it is now too close to the release date of v18.Awaiting Tom's feedback?


Thanks 

On Wed, Sep 24, 2025 at 4:18 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Sep 24, 2025 at 12:10 PM Richard Guo <guofenglinux@gmail.com> wrote:
> In summary, the problem occurs when both of the following conditions
> are met: 1) there are both nonempty and empty grouping sets, 2) there
> are variable-free HAVING clauses.
>
> I think the following change fixes this problem.
>
>     foreach(l, (List *) parse->havingQual)
>     {
>         Node       *havingclause = (Node *) lfirst(l);
> +       Relids     having_relids;
>
>         if (contain_agg_clause(havingclause) ||
>             contain_volatile_functions(havingclause) ||
>             contain_subplans(havingclause) ||
>             (parse->groupClause && parse->groupingSets &&
> -            bms_is_member(root->group_rtindex, pull_varnos(root,
> havingclause))))
> +            ((having_relids = pull_varnos(root, havingclause)) == NULL ||
> +            bms_is_member(root->group_rtindex, having_relids))))
>         {
>             /* keep it in HAVING */
>             newHaving = lappend(newHaving, havingclause);
>
> This change essentially prevents variable-free HAVING clauses from
> being pushed down when there are any nonempty grouping sets.  Of
> course, this could be done more precisely -- for example, we could
> restrict the prevention only to cases where empty grouping sets are
> also present, but I'm not sure it's worth the effort.

Here is the patch.

- Richard

Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
Richard Guo
Дата:
On Wed, Sep 24, 2025 at 5:18 PM Richard Guo <guofenglinux@gmail.com> wrote:
> Here is the patch.

I plan to push this patch soon, unless there are any objections.

- Richard



Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
David Rowley
Дата:
On Thu, 25 Sept 2025 at 13:01, Richard Guo <guofenglinux@gmail.com> wrote:
> I plan to push this patch soon, unless there are any objections.

What's your confidence levels on the logic now being correct? 100%?
90%? Hopeful?

I've not personally had the time to process the patch and figure out
that this is now safe. It sounds like you have, but (with respect) I
expect you also thought that before the initial commit too. I realise
that you now have more of the picture, but how do we know the picture
is now complete? I think we all know this stuff is hard and we also
know that even the most seasoned of planner hackers don't always get
it right first time.

What I'm now wondering is the risk to reward ratio of fixing this in
18.1. If it happens that it's still not right for 18.1, then we need
to wait until 18.2, which is currently due Feb-26. I don't quite have
the same confidence levels as you do, but maybe I would if I took the
time to more carefully think about this.  For now, my thoughts are
that it's safer to revert and try again for v19.  Probably it would
make more sense to try harder for an 18.1 fix if this optimisation was
a more critical and people had been waiting on it and there was no
other way to obtain the benefits of it. But that's not quite the case
here as, in theory, someone could rewrite their query if it's safe for
their use case and end up with the same performance benefits.

Just so it's clear, I'm not objecting to fixing for 18.1. I just want
to ensure your judgment is for the project and not for
self-preservation.  I think most people will respect the decision if
you decide that you need more time to consider this and opt to revert
in v18 and fix only in master. Based on my current understanding of
the optimisation, I'd certainly be more at ease with that.

On the other hand, if you're 100% confident, I'll step aside.

David



Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 25 Sept 2025 at 13:01, Richard Guo <guofenglinux@gmail.com> wrote:
>> I plan to push this patch soon, unless there are any objections.

> What's your confidence levels on the logic now being correct? 100%?
> 90%? Hopeful?

FWIW, my confidence in it is rather low.  I've not had time to think
this through carefully, but it seems to me that the test ought to
involve whether there is an empty grouping set, yet the proposed
patch does no such thing --- or at least, if it manages to achieve
that effect, it's not obvious how.

18.1 will not be coming out till November, so I feel no need to
rush to judgment on what to do here.

            regards, tom lane



Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
Richard Guo
Дата:
On Thu, Sep 25, 2025 at 11:05 AM David Rowley <dgrowleyml@gmail.com> wrote:
> What's your confidence levels on the logic now being correct? 100%?
> 90%? Hopeful?
>
> I've not personally had the time to process the patch and figure out
> that this is now safe. It sounds like you have, but (with respect) I
> expect you also thought that before the initial commit too. I realise
> that you now have more of the picture, but how do we know the picture
> is now complete? I think we all know this stuff is hard and we also
> know that even the most seasoned of planner hackers don't always get
> it right first time.
>
> What I'm now wondering is the risk to reward ratio of fixing this in
> 18.1. If it happens that it's still not right for 18.1, then we need
> to wait until 18.2, which is currently due Feb-26. I don't quite have
> the same confidence levels as you do, but maybe I would if I took the
> time to more carefully think about this.  For now, my thoughts are
> that it's safer to revert and try again for v19.  Probably it would
> make more sense to try harder for an 18.1 fix if this optimisation was
> a more critical and people had been waiting on it and there was no
> other way to obtain the benefits of it. But that's not quite the case
> here as, in theory, someone could rewrite their query if it's safe for
> their use case and end up with the same performance benefits.
>
> Just so it's clear, I'm not objecting to fixing for 18.1. I just want
> to ensure your judgment is for the project and not for
> self-preservation.  I think most people will respect the decision if
> you decide that you need more time to consider this and opt to revert
> in v18 and fix only in master. Based on my current understanding of
> the optimisation, I'd certainly be more at ease with that.
>
> On the other hand, if you're 100% confident, I'll step aside.

Thanks for the input; that's a reasonable concern.  Although I've
reviewed my analysis again and didn't find any new issues, I don't
think I would claim to be 100% confident that the current logic is
bug-free.  Realistically, I doubt anyone would make such a claim.

I'm completely open to reverting this and revisiting it for v19.
However, I'd really appreciate any reviews that point out specific
issues in the current logic.  I've shared my analysis in my initial
email as well as in the commit message of the proposed patch.  If
anyone can highlight parts that don't make sense, we can discuss them
and update the patch accordingly.  Without that kind of feedback, I'm
concerned that we may just end up repeating the same code in v19.

- Richard



Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master

От
Richard Guo
Дата:
On Thu, Sep 25, 2025 at 12:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, my confidence in it is rather low.  I've not had time to think
> this through carefully, but it seems to me that the test ought to
> involve whether there is an empty grouping set, yet the proposed
> patch does no such thing --- or at least, if it manages to achieve
> that effect, it's not obvious how.

I explained the test for empty grouping sets in my initial email as
well as in the commit message of the proposed patch.  Apparently, I
failed to make myself clear, so here's another attempt.

To be clear, we are only discussing HAVING clauses that do not contain
aggregates, since clauses with aggregates wouldn't be pushed down
anyway.

First, let's consider the presence of empty grouping sets.  There are
two cases: either the query contains only empty grouping sets, or it
contains both non-empty and empty grouping sets.

In the former case (only empty grouping sets), the HAVING clauses must
be degenerate (variable-free).  These are placed in the WHERE clause
and also retained in HAVING.  This follows the same logic as prior to
commit 67a54b9e8.

In the latter case (both non-empty and empty grouping sets), for
non-degenerate HAVING clauses, the empty grouping sets would nullify
the vars referenced in the clauses, so they would not be pushed
down.  This is ensured by the check bms_is_member(root->group_rtindex,
having_relids).  For degenerate HAVING clauses, they are also
not pushed down, guaranteed by the new check added in the proposed
patch: having_relids == NULL.

Next, suppose there are no empty grouping sets in the query.  For
non-degenerate HAVING clauses, only those clauses that do not
reference columns nullable by the grouping sets are pushed down, which
is guaranteed by the same check: bms_is_member(root->group_rtindex,
having_relids).  For degenerate HAVING clauses, the check
having_relids == NULL prevents pushing them down, though it could be
argued that in this case, they could be pushed down.  This is what I
meant by (quoting from the commit message):

"
This could be done more precisely, for example by restricting the
prevention only to cases where empty grouping sets are also present,
but the added complexity does not seem justified.
"

In summary, under the current logic, HAVING clauses in queries with
grouping sets are pushed down only in the following two cases:

1) There are only empty grouping sets.

In this case, the HAVING clauses are pushed down, but they are also
retained in HAVING.

2) There are non-empty grouping sets and the HAVING clauses are
non-degenerate and they do not reference any columns nullable by the
grouping sets.

I believe both of the above cases are safe for pushing down HAVING
clauses.  It could be argued that there may be more cases where we can
push down HAVING clauses with grouping sets, but for now, I'd prefer
to focusing on ensuring the current logic is safe.

Does this analysis make sense to you?  Am I missing anything?

> 18.1 will not be coming out till November, so I feel no need to
> rush to judgment on what to do here.

Thanks.  I'll wait for any feedback on the patch itself before
deciding how to proceed.  I'd appreciate it if a review could be done
before November.

- Richard