Обсуждение: Requesting advanced Group By support

Поиск
Список
Период
Сортировка

Requesting advanced Group By support

От
Arun Kumar
Дата:
Hi,
      From MySQL 5.7, It supports SQL standard 99 and implements the feature such functional dependent on the GROUP By columns, i.e., it detects the non-aggregate columns which are functionally dependent on the GROUP BY columns (not included in GROUP BY) and then executes the query without error.
For example,

SELECT a.sno,b.sno,a.name,b.location FROM Name AS a JOIN Location AS b ON a.sno=b.sno GROUP BY a.sno,b.location

In this case, a.sno is a primary key so no need to include a.name in GROUP By as it would be identified by the primary key and then for b.sno which is again equated with a.sno (primary key) so no need to add this as well but for b.location, we need to add it in GROUP BY or we should use any aggregate function over this column to avoid error. For more info, please check on the below link https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
Is there any plans on supporting this in Postgres in future versions ?


Thank You,
Arun Kumar

Re: Requesting advanced Group By support

От
Tomas Vondra
Дата:
Hi,

On 10/09/2018 03:10 PM, Arun Kumar wrote:
> Hi,
>  From MySQL 5.7, It supports SQL standard 99 and implements the feature 
> such functional dependent on the GROUP By columns, i.e., it detects the 
> non-aggregate columns which are functionally dependent on the GROUP BY 
> columns (not included in GROUP BY) and then executes the query without 
> error.
> For example,
> 
> *SELECT a.sno,b.sno,a.name,b.location FROM Name AS a JOIN Location AS b 
> ON a.sno=b.sno GROUP BY a.sno,b.location *
> 
> In this case, a.sno is a primary key so no need to include a.name in 
> GROUP By as it would be identified by the primary key and then for b.sno 
> which is again equated with a.sno (primary key) so no need to add this 
> as well but for b.location, we need to add it in GROUP BY or we should 
> use any aggregate function over this column to avoid error. For more 
> info, please check on the below link 
> https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
> Is there any plans on supporting this in Postgres in future versions ?
> 

So, which part of this supposedly does not work in PostgreSQL?

Consider this:

test2=# create table t (id int primary key, b int, c int, d int);
CREATE TABLE
test2=# explain select * from t group by id, b, c;
                          QUERY PLAN
------------------------------------------------------------
  HashAggregate  (cost=33.12..51.62 rows=1850 width=16)
    Group Key: id
    ->  Seq Scan on t  (cost=0.00..28.50 rows=1850 width=16)
(3 rows)

test2=# explain select id, count(*) from t group by id, b, c;
                          QUERY PLAN
------------------------------------------------------------
  HashAggregate  (cost=37.75..56.25 rows=1850 width=20)
    Group Key: id
    ->  Seq Scan on t  (cost=0.00..28.50 rows=1850 width=12)
(3 rows)

So clearly we've already eliminated the functionally-dependent columns 
from the aggregation.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Requesting advanced Group By support

От
Tom Lane
Дата:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 10/09/2018 03:10 PM, Arun Kumar wrote:
>> *SELECT a.sno,b.sno,a.name,b.location FROM Name AS a JOIN Location AS b 
>> ON a.sno=b.sno GROUP BY a.sno,b.location *
>> 
>> In this case, a.sno is a primary key so no need to include a.name in 
>> GROUP By as it would be identified by the primary key and then for b.sno 
>> which is again equated with a.sno (primary key) so no need to add this 
>> as well but for b.location, we need to add it in GROUP BY or we should 
>> use any aggregate function over this column to avoid error.

> So, which part of this supposedly does not work in PostgreSQL?

The part where it infers that b.sno is unique based solely on it having
been equated to a.sno.

I'm not sure whether the SQL spec's definition of functional dependencies
includes such a proof rule, but I'm not very excited about adding one to
PG.  It's likely of limited use, seeing that this is the first time I can
recall anyone asking for it; and it'd create dependency problems that we
don't have today, because validity of the query would depend on the
existence of a btree operator class from which we could infer that
uniqueness of a.sno implies uniqueness of b.sno.  We have enough problems
arising from the existing case of validity of the query depending on the
existence of a primary key.  Also, a primary key is at least a
well-defined dependency (there can be only one); but since an equality
operator could belong to multiple opclasses, it's not very clear which
one the query would get marked as depending on.

In short: the cost/benefit ratio of this optimization looks pretty bad.

            regards, tom lane


Re: Requesting advanced Group By support

От
Tom Lane
Дата:
I wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> So, which part of this supposedly does not work in PostgreSQL?

> The part where it infers that b.sno is unique based solely on it having
> been equated to a.sno.

Oh, wait a second: such an inference is actually *wrong* in the general
case, or at least underdetermined.  It fails in cases where the data type
considers distinguishable values to be "equal", as for example zero vs.
minus zero in IEEE floats, or numeric values with varying numbers of
trailing zeroes, or citext, etc.  So for example if the sno columns are
type citext, we can be sure that a.sno does not contain both 'X' and 'x',
because the pkey would forbid it.  But if it contains 'X', while b.sno
contains both 'X' and 'x', then (if we allowed this case) it'd be
indeterminate which b.sno value is returned by the GROUP BY.  One might or
might not consider that OK for a particular application, but I don't think
the parser should just assume for you that it is.

            regards, tom lane


Sv: Re: Requesting advanced Group By support

От
Andreas Joseph Krogh
Дата:
På onsdag 10. oktober 2018 kl. 18:46:15, skrev Tomas Vondra <tomas.vondra@2ndquadrant.com>:
Hi,

On 10/09/2018 03:10 PM, Arun Kumar wrote:
> Hi,
>  From MySQL 5.7, It supports SQL standard 99 and implements the feature
> such functional dependent on the GROUP By columns, i.e., it detects the
> non-aggregate columns which are functionally dependent on the GROUP BY
> columns (not included in GROUP BY) and then executes the query without
> error.
> For example,
>
> *SELECT a.sno,b.sno,a.name,b.location FROM Name AS a JOIN Location AS b
> ON a.sno=b.sno GROUP BY a.sno,b.location *
>
> In this case, a.sno is a primary key so no need to include a.name in
> GROUP By as it would be identified by the primary key and then for b.sno
> which is again equated with a.sno (primary key) so no need to add this
> as well but for b.location, we need to add it in GROUP BY or we should
> use any aggregate function over this column to avoid error. For more
> info, please check on the below link
> https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
> Is there any plans on supporting this in Postgres in future versions ?
>

So, which part of this supposedly does not work in PostgreSQL?

Consider this:

test2=# create table t (id int primary key, b int, c int, d int);
CREATE TABLE
test2=# explain select * from t group by id, b, c;
                          QUERY PLAN
------------------------------------------------------------
  HashAggregate  (cost=33.12..51.62 rows=1850 width=16)
    Group Key: id
    ->  Seq Scan on t  (cost=0.00..28.50 rows=1850 width=16)
(3 rows)

test2=# explain select id, count(*) from t group by id, b, c;
                          QUERY PLAN
------------------------------------------------------------
  HashAggregate  (cost=37.75..56.25 rows=1850 width=20)
    Group Key: id
    ->  Seq Scan on t  (cost=0.00..28.50 rows=1850 width=12)
(3 rows)

So clearly we've already eliminated the functionally-dependent columns
from the aggregation.

regards
 
Too bad this doesn't:
 
create table t (id int NOT NULL UNIQUE, b int, c int, d int);
 
explain select * from t group by id, b, c;
ERROR:  column "t.d" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: explain select * from t group by id, b, c;

 
 
--
Andreas Joseph Krogh

Re: Requesting advanced Group By support

От
Andres Freund
Дата:

Hi,

On October 10, 2018 10:37:40 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 10/09/2018 03:10 PM, Arun Kumar wrote:
>>> *SELECT a.sno,b.sno,a.name,b.location FROM Name AS a JOIN Location
>AS b
>>> ON a.sno=b.sno GROUP BY a.sno,b.location *
>>>
>>> In this case, a.sno is a primary key so no need to include a.name in
>
>>> GROUP By as it would be identified by the primary key and then for
>b.sno
>>> which is again equated with a.sno (primary key) so no need to add
>this
>>> as well but for b.location, we need to add it in GROUP BY or we
>should
>>> use any aggregate function over this column to avoid error.
>
>> So, which part of this supposedly does not work in PostgreSQL?
>
>The part where it infers that b.sno is unique based solely on it having
>been equated to a.sno.

Isn't the spec compliant thing that's missing dealing with unique not null?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Requesting advanced Group By support

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On October 10, 2018 10:37:40 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>> So, which part of this supposedly does not work in PostgreSQL?

>> The part where it infers that b.sno is unique based solely on it having
>> been equated to a.sno.

> Isn't the spec compliant thing that's missing dealing with unique not null?

IIRC, the spec has a whole bunch of "functional dependency" proof rules,
of which the only one we implement at the moment is the one about the
other columns of a table all being functionally dependent on its pkey.

I don't know if any of the spec's rules are at all close to this one.

            regards, tom lane


Re: Requesting advanced Group By support

От
Robert Haas
Дата:
On Wed, Oct 10, 2018 at 1:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It fails in cases where the data type
> considers distinguishable values to be "equal", as for example zero vs.
> minus zero in IEEE floats, or numeric values with varying numbers of
> trailing zeroes, or citext, etc.  So for example if the sno columns are
> type citext, we can be sure that a.sno does not contain both 'X' and 'x',
> because the pkey would forbid it.  But if it contains 'X', while b.sno
> contains both 'X' and 'x', then (if we allowed this case) it'd be
> indeterminate which b.sno value is returned by the GROUP BY.  One might or
> might not consider that OK for a particular application, but I don't think
> the parser should just assume for you that it is.

Since this is approximately the 437,253rd time this problem has come
up, and since even reasonably experienced hackers often get confused
about it or (ahem) momentarily forget about the problem, it is really
well paste time to find some way of labeling operator classes or
families or individual operators to indicate whether or not they are
testing precisely the exactly-the-same property.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Requesting advanced Group By support

От
Peter Eisentraut
Дата:
On 10/10/2018 19:59, Andres Freund wrote:
> Isn't the spec compliant thing that's missing dealing with unique not null?

I don't think that's what the original poster complained about, but you
are right that it's a known missing part.  The reason it's missing is
that we can't record dependencies on not-null constraints.  Some work on
that has been in progress for many years.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services