Обсуждение: Wrong results with grouping sets

Поиск
Список
Период
Сортировка

Wrong results with grouping sets

От
Richard Guo
Дата:
I think I've come across a wrong result issue with grouping sets, as
shown by the query below.

-- result is correct with only grouping sets
select a, b
from (values (1, 1), (2, 2)) as t (a, b) where a = b
group by grouping sets((a, b), (a));
 a | b
---+---
 1 | 1
 1 |
 2 | 2
 2 |
(4 rows)

-- result is NOT correct with grouping sets and distinct on
select distinct on (a, b) a, b
from (values (1, 1), (2, 2)) as t (a, b) where a = b
group by grouping sets((a, b), (a));
 a | b
---+---
 1 | 1
 2 | 2
(2 rows)

The distinct on expressions include both 'a' and 'b', so rows (1, 1) and
(1, NULL) should not be considered equal.  (The same for rows (2, 2) and
(2, NULL).)

I think the root cause is that when we generate distinct_pathkeys, we
failed to realize that Var 'b' might be nullable by the grouping sets,
so it's no longer always equal to Var 'a'.  It's not correct to deem
that the PathKey for 'b' is redundant and thus remove it from the
pathkeys list.

We have the same issue when generating sort_pathkeys.  As a result, we
may have the final output in the wrong order.  There were several
reports about this issue before, such as [1][2].

To fix this issue, I'm thinking that we mark the grouping expressions
nullable by grouping sets with a dummy RTE for grouping sets, something
like attached.  In practice we do not need to create a real RTE for
that, what we need is just a RT index.  In the patch I use 0, because
it's not a valid outer join relid, so it would not conflict with the
outer-join-aware-Var infrastructure.

If the grouping expression is a Var or PHV, we can just set its
nullingrels, very straightforward.   For an expression that is neither a
Var nor a PHV, I'm not quite sure how to set the nullingrels.  I tried
the idea of wrapping it in a new PHV to carry the nullingrels, but that
may cause some unnecessary plan diffs.  In the patch for such an
expression I just set the nullingrels of Vars or PHVs that are contained
in it.  This is not really 'correct' in theory, because it is the whole
expression that can be nullable by grouping sets, not its individual
vars.  But it works in practice, because what we need is that the
expression can be somehow distinguished from the same expression in ECs,
and marking its vars is sufficient for this purpose.  But what if the
expression is variable-free?  This is the point that needs more work.
Fow now the patch just handles variable-free expressions of type Const,
by wrapping it in a new PHV.

There are some places where we need to artificially remove the RT index
of grouping sets from the nullingrels, such as when we generate
PathTarget for initial input to grouping nodes, or when we generate
PathKeys for the grouping clauses, because the expressions there are
logically below the grouping sets.  We also need to do that in
set_upper_references when we update the targetlist of an Agg node, so
that we can perform exact match for nullingrels, rather than superset
match.

Since the fix depends on the nullingrels stuff, it seems not easy for
back-patching.  I'm not sure what we should do in back branches.

Any thoughts?

[1] https://www.postgresql.org/message-id/CAMbWs48AtQTQGk37MSyDk_EAgDO3Y0iA_LzvuvGQ2uO_Wh2muw@mail.gmail.com
[2] https://www.postgresql.org/message-id/17071-24dc13fbfa29672d@postgresql.org

Thanks
Richard
Вложения

Re: Wrong results with grouping sets

От
Richard Guo
Дата:

On Mon, Sep 25, 2023 at 3:11 PM Richard Guo <guofenglinux@gmail.com> wrote:
I think the root cause is that when we generate distinct_pathkeys, we
failed to realize that Var 'b' might be nullable by the grouping sets,
so it's no longer always equal to Var 'a'.  It's not correct to deem
that the PathKey for 'b' is redundant and thus remove it from the
pathkeys list.

We have the same issue when generating sort_pathkeys.  As a result, we
may have the final output in the wrong order.  There were several
reports about this issue before, such as [1][2].

To fix this issue, I'm thinking that we mark the grouping expressions
nullable by grouping sets with a dummy RTE for grouping sets, something
like attached. 

Hi Tom, I'm wondering if you've got a chance to look into this issue.
What do you think about the fix?

Thanks
Richard

Re: Wrong results with grouping sets

От
Alena Rybakina
Дата:

Hi! Thank you for your work on the subject.

On 25.09.2023 10:11, Richard Guo wrote:
I think I've come across a wrong result issue with grouping sets, as
shown by the query below.

-- result is correct with only grouping sets
select a, b
from (values (1, 1), (2, 2)) as t (a, b) where a = b
group by grouping sets((a, b), (a));
 a | b
---+---
 1 | 1
 1 |
 2 | 2
 2 |
(4 rows)

-- result is NOT correct with grouping sets and distinct on
select distinct on (a, b) a, b
from (values (1, 1), (2, 2)) as t (a, b) where a = b
group by grouping sets((a, b), (a));
 a | b
---+---
 1 | 1
 2 | 2
(2 rows)

The distinct on expressions include both 'a' and 'b', so rows (1, 1) and
(1, NULL) should not be considered equal.  (The same for rows (2, 2) and
(2, NULL).)

I noticed that this query worked correctly in the main branch with the inequality operator:

postgres=# select distinct on (a, b) a, b from (values (3, 1), (2, 2)) as t (a, b) where a > b group by grouping sets((a, b), (a)); a | b ---+--- 3 | 1 3 | (2 rows)

So, I think you are right)


I think the root cause is that when we generate distinct_pathkeys, we
failed to realize that Var 'b' might be nullable by the grouping sets,
so it's no longer always equal to Var 'a'.  It's not correct to deem
that the PathKey for 'b' is redundant and thus remove it from the
pathkeys list.

We have the same issue when generating sort_pathkeys.  As a result, we
may have the final output in the wrong order.  There were several
reports about this issue before, such as [1][2].

To fix this issue, I'm thinking that we mark the grouping expressions
nullable by grouping sets with a dummy RTE for grouping sets, something
like attached.  In practice we do not need to create a real RTE for
that, what we need is just a RT index.  In the patch I use 0, because
it's not a valid outer join relid, so it would not conflict with the
outer-join-aware-Var infrastructure.

If the grouping expression is a Var or PHV, we can just set its
nullingrels, very straightforward.   For an expression that is neither a
Var nor a PHV, I'm not quite sure how to set the nullingrels.  I tried
the idea of wrapping it in a new PHV to carry the nullingrels, but that
may cause some unnecessary plan diffs.  In the patch for such an
expression I just set the nullingrels of Vars or PHVs that are contained
in it.  This is not really 'correct' in theory, because it is the whole
expression that can be nullable by grouping sets, not its individual
vars.  But it works in practice, because what we need is that the
expression can be somehow distinguished from the same expression in ECs,
and marking its vars is sufficient for this purpose.  But what if the
expression is variable-free?  This is the point that needs more work.
Fow now the patch just handles variable-free expressions of type Const,
by wrapping it in a new PHV.

There are some places where we need to artificially remove the RT index
of grouping sets from the nullingrels, such as when we generate
PathTarget for initial input to grouping nodes, or when we generate
PathKeys for the grouping clauses, because the expressions there are
logically below the grouping sets.  We also need to do that in
set_upper_references when we update the targetlist of an Agg node, so
that we can perform exact match for nullingrels, rather than superset
match.

Since the fix depends on the nullingrels stuff, it seems not easy for
back-patching.  I'm not sure what we should do in back branches.

Any thoughts?

[1] https://www.postgresql.org/message-id/CAMbWs48AtQTQGk37MSyDk_EAgDO3Y0iA_LzvuvGQ2uO_Wh2muw@mail.gmail.com
[2] https://www.postgresql.org/message-id/17071-24dc13fbfa29672d@postgresql.org

I looked at your patch and noticed a few things:

1. I think you should add a test with the cube operator, because I noticed that the order of the query in the result has also changed:

master:
postgres=# select a,b from (values(1,1),(2,2)) as t (a,b) where a=b group by cube (a, (a,b)) order by b,a; a | b ---+--- 1 | 1 1 | 1 1 | 2 | 2 2 | 2 2 | | (7 rows)

with patch:

postgres=# select a, b from (values (1, 1), (2, 2)) as t (a, b) where a = b group by cube(a, (a, b)) order by b,a; a | b ---+--- 1 | 1 1 | 1 2 | 2 2 | 2 1 | 2 | | (7 rows)

-- 
Regards,
Alena Rybakina

Re: Wrong results with grouping sets

От
Richard Guo
Дата:

On Thu, Nov 16, 2023 at 11:25 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:

I noticed that this query worked correctly in the main branch with the inequality operator:

postgres=# select distinct on (a, b) a, b from (values (3, 1), (2, 2)) as t (a, b) where a > b group by grouping sets((a, b), (a)); a | b ---+--- 3 | 1 3 | (2 rows)

So, I think you are right)


Thanks for taking an interest in this patch and verifying it.
 

I looked at your patch and noticed a few things:

1. I think you should add a test with the cube operator, because I noticed that the order of the query in the result has also changed:

 
Hmm, I'm not sure if that's necessary.  The wrong result order you saw
here is caused by the same reason explained above: the planner fails to
realize that Var 'a' and 'b' are nullable by the grouping sets, making
them no longer always equal to each other.  This issue should have been
covered in the tests added by v1 patch.

Thanks
Richard

Re: Wrong results with grouping sets

От
Richard Guo
Дата:

On Mon, Sep 25, 2023 at 3:11 PM Richard Guo <guofenglinux@gmail.com> wrote:
If the grouping expression is a Var or PHV, we can just set its
nullingrels, very straightforward.   For an expression that is neither a
Var nor a PHV, I'm not quite sure how to set the nullingrels.  I tried
the idea of wrapping it in a new PHV to carry the nullingrels, but that
may cause some unnecessary plan diffs.  In the patch for such an
expression I just set the nullingrels of Vars or PHVs that are contained
in it.  This is not really 'correct' in theory, because it is the whole
expression that can be nullable by grouping sets, not its individual
vars.  But it works in practice, because what we need is that the
expression can be somehow distinguished from the same expression in ECs,
and marking its vars is sufficient for this purpose.  But what if the
expression is variable-free?  This is the point that needs more work.
Fow now the patch just handles variable-free expressions of type Const,
by wrapping it in a new PHV.

For a variable-free expression, if it contains volatile functions, SRFs,
aggregates, or window functions, it would not be treated as a member of
EC that is redundant (see get_eclass_for_sort_expr()).  That means it
would not be removed from the pathkeys list, so we do not need to set
the nullingrels for it.  Otherwise we can just wrap the expression in a
PlaceHolderVar.  Attached is an updated patch to do that.

BTW, this wrong results issue has existed ever since grouping sets was
introduced in v9.5, and there were field reports complaining about this
issue.  I think it would be great if we can get rid of it.  I'm still
not sure what we should do in back branches.  But let's fix it at least
on v16 and later.

Thanks
Richard
Вложения

Re: Wrong results with grouping sets

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> For a variable-free expression, if it contains volatile functions, SRFs,
> aggregates, or window functions, it would not be treated as a member of
> EC that is redundant (see get_eclass_for_sort_expr()).  That means it
> would not be removed from the pathkeys list, so we do not need to set
> the nullingrels for it.  Otherwise we can just wrap the expression in a
> PlaceHolderVar.  Attached is an updated patch to do that.

I don't think this is going in quite the right direction.  We have
many serious problems with grouping sets (latest one today at [1]),
and I don't believe that hacking around EquivalenceClasses is going
to fix them all.

I think that what we really need to do is invent a new kind of RTE
representing the output of the grouping step, with columns that
are the Vars or expressions being grouped on.  Then we would make
the parser actually replace subexpressions in the tlist with Vars
referencing this new RTE (that is, change check_ungrouped_columns
into something that modifies the expression tree into something that
contains no Vars that aren't grouping-RTE Vars).  In this way the
output of the parser directly expresses the semantic requirement that
certain subexpressions be gotten from the grouping output rather than
computed some other way.

The trick is to do this without losing optimization capability.
We could have the planner replace these Vars with the underlying
Vars in cases where it's safe to do so (perhaps after adding a
nullingrel bit that references the grouping RTE).  If a grouping
column is an expression, we might be able to replace the reference
Vars with PHVs as you've done here ... but I think we need the
parser infrastructure fixed first.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAEzk6fcgXWabEG%2BRFDaG6tDmFX6g1h7LPGUdrX85Pb0XB3B76g%40mail.gmail.com



Re: Wrong results with grouping sets

От
vignesh C
Дата:
On Thu, 7 Dec 2023 at 13:52, Richard Guo <guofenglinux@gmail.com> wrote:
>
>
> On Mon, Sep 25, 2023 at 3:11 PM Richard Guo <guofenglinux@gmail.com> wrote:
>>
>> If the grouping expression is a Var or PHV, we can just set its
>> nullingrels, very straightforward.   For an expression that is neither a
>> Var nor a PHV, I'm not quite sure how to set the nullingrels.  I tried
>> the idea of wrapping it in a new PHV to carry the nullingrels, but that
>> may cause some unnecessary plan diffs.  In the patch for such an
>> expression I just set the nullingrels of Vars or PHVs that are contained
>> in it.  This is not really 'correct' in theory, because it is the whole
>> expression that can be nullable by grouping sets, not its individual
>> vars.  But it works in practice, because what we need is that the
>> expression can be somehow distinguished from the same expression in ECs,
>> and marking its vars is sufficient for this purpose.  But what if the
>> expression is variable-free?  This is the point that needs more work.
>> Fow now the patch just handles variable-free expressions of type Const,
>> by wrapping it in a new PHV.
>
>
> For a variable-free expression, if it contains volatile functions, SRFs,
> aggregates, or window functions, it would not be treated as a member of
> EC that is redundant (see get_eclass_for_sort_expr()).  That means it
> would not be removed from the pathkeys list, so we do not need to set
> the nullingrels for it.  Otherwise we can just wrap the expression in a
> PlaceHolderVar.  Attached is an updated patch to do that.
>
> BTW, this wrong results issue has existed ever since grouping sets was
> introduced in v9.5, and there were field reports complaining about this
> issue.  I think it would be great if we can get rid of it.  I'm still
> not sure what we should do in back branches.  But let's fix it at least
> on v16 and later.

I have changed the status of the patch to "Waiting on Author" as Tom
Lane's comments have not yet been addressed, feel free to address them
and update the commitfest entry accordingly.

Regards,
Vignesh



Re: Wrong results with grouping sets

От
"Andrey M. Borodin"
Дата:

> On 11 Jan 2024, at 20:10, vignesh C <vignesh21@gmail.com> wrote:
>
> I have changed the status of the patch to "Waiting on Author" as Tom
> Lane's comments have not yet been addressed, feel free to address them
> and update the commitfest entry accordingly.

This CF entry seems to be a fix for actually unexpected behaviour. But seems like we need another fix.
Richard, Alena, what do you think? Should we mark CF entry [0] "RwF" or leave it to wait for better fix?

Best regards, Andrey Borodin.


[0] https://commitfest.postgresql.org/47/4583/


Re: Wrong results with grouping sets

От
Richard Guo
Дата:

On Sun, Jan 7, 2024 at 4:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't think this is going in quite the right direction.  We have
many serious problems with grouping sets (latest one today at [1]),
and I don't believe that hacking around EquivalenceClasses is going
to fix them all.

I think that what we really need to do is invent a new kind of RTE
representing the output of the grouping step, with columns that
are the Vars or expressions being grouped on.  Then we would make
the parser actually replace subexpressions in the tlist with Vars
referencing this new RTE (that is, change check_ungrouped_columns
into something that modifies the expression tree into something that
contains no Vars that aren't grouping-RTE Vars).  In this way the
output of the parser directly expresses the semantic requirement that
certain subexpressions be gotten from the grouping output rather than
computed some other way.

The trick is to do this without losing optimization capability.
We could have the planner replace these Vars with the underlying
Vars in cases where it's safe to do so (perhaps after adding a
nullingrel bit that references the grouping RTE).  If a grouping
column is an expression, we might be able to replace the reference
Vars with PHVs as you've done here ... but I think we need the
parser infrastructure fixed first.

Sorry it takes me some time to get back to this thread.

I think you're right.  To fix the cases where there are subqueries in
the grouping sets, as in Geoff's example, it seems that we'll have to
fix the parser infrastructure by inventing a new RTE for the grouping
step and replacing the subquery expressions with Vars referencing this
new RTE, so that there is only one instance of the subquery in the
parser output.

I have experimented with this approach, and here is the outcome.  The
patch fixes Geoff's query, but it's still somewhat messy as I'm not
experienced enough in the parser code.  And the patch has not yet
implemented the nullingrel bit manipulation trick.  Before proceeding
further with this patch, I'd like to know if it is going in the right
direction.

Thanks
Richard
Вложения

Re: Wrong results with grouping sets

От
Richard Guo
Дата:

On Thu, May 16, 2024 at 5:43 PM Richard Guo <guofenglinux@gmail.com> wrote:
I have experimented with this approach, and here is the outcome.  The
patch fixes Geoff's query, but it's still somewhat messy as I'm not
experienced enough in the parser code.  And the patch has not yet
implemented the nullingrel bit manipulation trick.  Before proceeding
further with this patch, I'd like to know if it is going in the right
direction.

I've spent some more time on this patch, and now it passes all the
regression tests.  But I had to hack explain.c and ruleutils.c to make
the varprefix stuff work as it did before, which is not great.

One thing I'm not sure about is whether we need to also replace
subexpressions in the arguments of GroupingFunc nodes with Vars
referencing the new GROUP RTE.  These arguments would not be executed at
runtime, so it seems that we can just replace them.  I tried to do that
and found several plan changes in the regression tests.  Such as

explain (verbose, costs off)
select grouping(ss.x)
from int8_tbl i1
cross join lateral (select (select i1.q1) as x) ss
group by ss.x;
                   QUERY PLAN
------------------------------------------------
 GroupAggregate
   Output: GROUPING((SubPlan 1)), ((SubPlan 2))
   Group Key: ((SubPlan 2))
   ->  Sort
         Output: ((SubPlan 2)), i1.q1
         Sort Key: ((SubPlan 2))
         ->  Seq Scan on public.int8_tbl i1
               Output: (SubPlan 2), i1.q1
               SubPlan 2
                 ->  Result
                       Output: i1.q1
(11 rows)

If we substitute the subquery expression in the argument of GroupingFunc
with the GROUP RTE's Var, the final plan would contain only one SubPlan
instead of two.

Also the patch has not yet manipulated the nullingrel stuff.  Maybe that
can be done with the code in my v2 patch.  But I think we'd better get
the parser fixed first before stepping into that.

Also please ignore the comment and code format things in the patch as I
haven't worked on them.

Thanks
Richard
Вложения

Re: Wrong results with grouping sets

От
Richard Guo
Дата:

On Fri, May 17, 2024 at 5:41 PM Richard Guo <guofenglinux@gmail.com> wrote:
I've spent some more time on this patch, and now it passes all the
regression tests.  But I had to hack explain.c and ruleutils.c to make
the varprefix stuff work as it did before, which is not great.

I've realized that I made a mistake in the v4 patch: If there are join
alias vars in the targetlist and HAVING clause, we should first flatten
them before replacing the grouped variables involved there with
grouping-RTE Vars.  To fix this issue, I decide to merge the newly added
function substitute_group_exprs into check_ungrouped_columns by changing
check_ungrouped_columns to also perform the replacement, which is Tom's
initial suggestion I think.

Now it seems that 'check_ungrouped_columns' is no longer an appropriate
name for the function.  So I rename it to 'substitute_grouped_columns'.
But I'm open to other names if there are any suggestions.

I've also worked on the comments.

Thanks
Richard
Вложения

Re: Wrong results with grouping sets

От
Richard Guo
Дата:
On the basis of the parser infrastructure fixup, 0002 patch adds the
nullingrel bit that references the grouping RTE to the grouping
expressions.

However, it seems to me that we have to manually remove this nullingrel
bit from expressions in various cases where these expressions are
logically below the grouping step, such as when we generate groupClause
pathkeys for grouping sets, or when we generate PathTarget for initial
input to grouping nodes.

Furthermore, in set_upper_references, the targetlist and quals of an Agg
node should have nullingrels that include the effects of the grouping
step, ie they will have nullingrels equal to the input Vars/PHVs'
nullingrels plus the nullingrel bit that references the grouping RTE.
In order to perform exact nullingrels matches, I think we also need to
manually remove this nullingrel bit.

Thanks
Richard
Вложения