Обсуждение: [PATCH] GROUP BY ALL
I see that there'd been some chatter but not a lot of discussion about a GROUP BY ALL feature/functionality. There certainly is utility in such a construct IMHO. The grammar is unambiguous, so can support this construct in lieu of the traditional GROUP BY clause. Enclosed is a patch which adds this via just scanning the TargetEntry list and adding anything that is not an aggregate function call to the groupList. Still need some docs; just throwing this out there and getting some feedback. Thanks, David
Вложения
On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:
I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality. There certainly is utility in
such a construct IMHO.
Still need some docs; just throwing this out there and getting some feedback.
I strongly dislike adding this feature. I'd only consider supporting it if it was part of the SQL standard.
Code is written once and read many times. This feature caters to the writer, not the reader. And furthermore usage of this is prone to be to the writer's detriment as well.
David J.
On Mon, 22 Jul 2024 at 17:34, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality. There certainly is utility in
such a construct IMHO.
Still need some docs; just throwing this out there and getting some feedback.I strongly dislike adding this feature. I'd only consider supporting it if it was part of the SQL standard.Code is written once and read many times. This feature caters to the writer, not the reader. And furthermore usage of this is prone to be to the writer's detriment as well.
And for when this might be useful, the syntax for it already exists, although a spurious error message is generated:
odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
ERROR: column "uw_term.term_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
^
ERROR: column "uw_term.term_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
^
I'm not sure exactly what's going on here — it's like it's still seeing the table name in the field list as only a table name and not the value corresponding to the whole table as a row value (But in general I'm not happy with the system's ability to figure out that a column's value has only one possibility given the grouping columns). You can work around:
odyssey=> with t as (select uw_term, count(*) from uw_term group by uw_term) select (uw_term).*, count from t;
This query works.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote: >> I see that there'd been some chatter but not a lot of discussion about >> a GROUP BY ALL feature/functionality. There certainly is utility in >> such a construct IMHO. > I strongly dislike adding this feature. I'd only consider supporting it if > it was part of the SQL standard. Yeah ... my recollection is that we already rejected this idea. If you want to re-litigate that, "throwing this out there" is not a sufficient argument. (Personally, I'd wonder exactly what ALL is quantified over: the whole output of the FROM clause, or only columns mentioned in the SELECT tlist, or what? And why that choice rather than another?) regards, tom lane
Isaac Morland <isaac.morland@gmail.com> writes: > And for when this might be useful, the syntax for it already exists, > although a spurious error message is generated: > odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term; > ERROR: column "uw_term.term_id" must appear in the GROUP BY clause or be > used in an aggregate function > LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term; > ^ > I'm not sure exactly what's going on here The SELECT entry is expanded into "uw_term.col1, uw_term.col2, uw_term.col3, ...", and those single-column Vars don't match the whole-row Var appearing in the GROUP BY list. I guess if we think this is important, we could add a proof rule saying that a per-column Var is functionally dependent on a whole-row Var of the same relation. Odd that the point hasn't come up before (though I guess that suggests that few people try this). regards, tom lane
On Mon, Jul 22, 2024 at 4:34 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote: >> >> I see that there'd been some chatter but not a lot of discussion about >> a GROUP BY ALL feature/functionality. There certainly is utility in >> such a construct IMHO. >> >> Still need some docs; just throwing this out there and getting some feedback. >> > > I strongly dislike adding this feature. I'd only consider supporting it if it was part of the SQL standard. > > Code is written once and read many times. This feature caters to the writer, not the reader. And furthermore usage ofthis is prone to be to the writer's detriment as well. I'd say this feature (at least for me) caters to the investigator; someone who is interactively looking at data hence why it would cater to the writer. Consider acquainting yourself with a large table that has a large number of annoying-named fields where you want to look at how different data is correlated or broken-down. Something along the lines of: SELECT last_name, substring(first_name,1,1) as first_initial, income_range, count(*) FROM census_data GROUP BY ALL; If you are iteratively looking at things, adding or removing fields from your breakdown, you only need to change it in a single place, the tlist. Additionally, expressions can be used transparently without needing to repeat them. (Yes, in practice, I'd often use GROUP BY 1, 2, say, but if you add more fields to this you need to edit in multiple places.) David
On Mon, Jul 22, 2024 at 5:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "David G. Johnston" <david.g.johnston@gmail.com> writes: > > On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote: > >> I see that there'd been some chatter but not a lot of discussion about > >> a GROUP BY ALL feature/functionality. There certainly is utility in > >> such a construct IMHO. > > > I strongly dislike adding this feature. I'd only consider supporting it if > > it was part of the SQL standard. > > Yeah ... my recollection is that we already rejected this idea. > If you want to re-litigate that, "throwing this out there" is > not a sufficient argument. Heh, fair enough. I actually wrote the patch after encountering the syntax in DuckDB and it seemed easy enough to add to Postgres while providing some utility, then ended up seeing a thread about it later. I did not get the sense that this had been globally rejected; though there were definitely voices against it, it seemed to trail off rather than coming to a conclusion. > (Personally, I'd wonder exactly what ALL is quantified over: the > whole output of the FROM clause, or only columns mentioned in the > SELECT tlist, or what? And why that choice rather than another?) My intention here was to basically be a shorthand for "group by specified non-aggregate fields in the select list". Perhaps I'm not being creative enough, but what is the interpretation/use case for anything else? :-) While there are other ways to accomplish these things, making an easy way to GROUP BY with aggregate queries would be useful in the field, particularly when doing iterative discovery work would save a lot of time with a situation that is both detectable and hits users with errors all the time. I'm not married to the exact syntax of this feature; anything else short and consistent could work if `ALL` is considered to potentially gain a different interpretation in the future. David
On Tue, Jul 23, 2024 at 8:21 AM Andrei Borodin <x4mmm@yandex-team.ru> wrote: > > On 23 Jul 2024, at 00:40, Isaac Morland <isaac.morland@gmail.com> wrote: > > odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term; > ERROR: column "uw_term.term_id" must appear in the GROUP BY clause or be used in an aggregate function > LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term; > > > AFAIR this problem was solved in my implementation [0] > > On 23 Jul 2024, at 01:29, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > (Personally, I'd wonder exactly what ALL is quantified over: the > whole output of the FROM clause, or only columns mentioned in the > SELECT tlist, or what? And why that choice rather than another?) > > > I'd like to have GROUP BY AUTO (I also proposed version GROUP BY SURPRISE ME). But I wouldn't like to open pandora boxof syntax sugar extensions which may will be incompatible with future standards. > If we could have extensible grammar - I'd be happy to have a lot of such enhancements. My top 2 are FROM table SELECT columnand better GROUP BY. GROUP BY AUTO also seems fine here to me; I understand the desire to avoid major incompatible syntax changes; GROUP BY ALL does exist in multiple products so it's not unprecedented. I wrote my patch before seeing your thread, sorry I missed that. :-) David
On Mon, Jul 22, 2024 at 4:41 PM Isaac Morland <isaac.morland@gmail.com> wrote: > And for when this might be useful, the syntax for it already exists, although a spurious error message is generated: > > odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term; > ERROR: column "uw_term.term_id" must appear in the GROUP BY clause or be used in an aggregate function > LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term; > ^ This is with my patch, or existing postgres? Grouping by record is not actually what this patch is trying to do, so perhaps there is some ambiguity; this is intended to GROUP BY any select item that is not an aggregate function.
On Tue, 2024-07-23 at 08:37 -0500, David Christensen wrote: > My intention here was to basically be a shorthand for "group by > specified non-aggregate fields in the select list". Perhaps I'm not > being creative enough, but what is the interpretation/use case for > anything else? :-) I am somewhat against this feature. It is too much magic for my taste. It might be handy for interactive use, but I would frown at an application that uses code like that, much like I'd frown at "SELECT *" in application code. Yours, Laurenz Albe
On Tue, Jul 23, 2024 at 10:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Tue, 2024-07-23 at 08:37 -0500, David Christensen wrote: > > My intention here was to basically be a shorthand for "group by > > specified non-aggregate fields in the select list". Perhaps I'm not > > being creative enough, but what is the interpretation/use case for > > anything else? :-) > > I am somewhat against this feature. > It is too much magic for my taste. > > It might be handy for interactive use, but I would frown at an application > that uses code like that, much like I'd frown at "SELECT *" in application code. Sure, not everything that makes things easier is strictly necessary; we could require `CAST(field AS text)` instead of `::text`, make subqueries required for transforming oids into specific system tables instead of `::regfoo` casts, any number of other choices, remove `SELECT *` as a parse option, but making it easier to do common things interactively as a DBA has value as well. David
On Tue, Jul 23, 2024 at 9:48 AM David Christensen <david@pgguru.net> wrote:
Sure, not everything that makes things easier is strictly necessary;
we could require `CAST(field AS text)` instead of `::text`,
Probably should have...being standard and all. Though syntactic sugar is quite different from new behavior - transforming :: to CAST is straight-forward.
make
subqueries required for transforming oids into specific system tables
instead of `::regfoo` casts,
Since OID is non-standard this falls within our purview.
any number of other choices, remove
`SELECT *` as a parse option,
Again, standard dictated.
but making it easier to do common things
interactively as a DBA has value as well.
Agreed, but this isn't a clear-cut win, and doesn't have standard conformance to tip the scale over fully.
Also, there are so many better tools for data exploration. Removing this quirk only marginally closes that gap.
David J.
On 7/22/24 15:43, Tom Lane wrote: > Isaac Morland <isaac.morland@gmail.com> writes: >> And for when this might be useful, the syntax for it already exists, >> although a spurious error message is generated: > >> odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term; >> ERROR: column "uw_term.term_id" must appear in the GROUP BY clause or be >> used in an aggregate function >> LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term; >> ^ > >> I'm not sure exactly what's going on here > > The SELECT entry is expanded into "uw_term.col1, uw_term.col2, > uw_term.col3, ...", and those single-column Vars don't match the > whole-row Var appearing in the GROUP BY list. I guess if we > think this is important, we could add a proof rule saying that > a per-column Var is functionally dependent on a whole-row Var > of the same relation. Odd that the point hasn't come up before > (though I guess that suggests that few people try this). I was just using this group-by-row feature last week to implement a temporal outer join in a way that would work for arbitrary tables. Here is some example SQL: https://github.com/pjungwir/temporal_ops/blob/b10d65323749faa6c47956db2e8f95441e508fce/sql/outer_join.sql#L48-L66 That does `GROUP BY a` then `SELECT (x.a).*`.[1] It is very useful for writing queries that don't want to know about the structure of the row. I noticed the same error as Isaac. I worked around the problem by wrapping it in a subquery and decomposing the row outside. It's already an obscure feature, and an easy workaround might be why you haven't heard complaints before. I wouldn't mind writing a patch for that rule when I get a chance (if no one else gets to it first.) [1] Actually I see it does `GROUP BY a, a.valid_at`, but that is surely more than I need. I think that `a.valid_at` is leftover from a previous version of the query. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On 23.07.24 00:29, Tom Lane wrote: > (Personally, I'd wonder exactly what ALL is quantified over: the > whole output of the FROM clause, or only columns mentioned in the > SELECT tlist, or what? And why that choice rather than another?) Looks like the main existing implementations take it to mean all entries in the SELECT list that are not aggregate functions. https://duckdb.org/docs/sql/query_syntax/groupby.html#group-by-all https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-select-groupby.html#parameters https://docs.snowflake.com/en/sql-reference/constructs/group-by#parameters
On Mon, 22 Jul 2024 at 22:55, David Christensen <david@pgguru.net> wrote: > I see that there'd been some chatter but not a lot of discussion about > a GROUP BY ALL feature/functionality. There certainly is utility in > such a construct IMHO. +1 from me. When exploring data, this is extremely useful because you don't have to update the GROUP BY clause every time Regarding the arguments against this: 1. I don't think this is any more unreadable than being able to GROUP BY 1, 2, 3. Or being able to use column aliases from the SELECT in the GROUP BY clause. Again this is already allowed. Personally I actually think it's more readable than specifying e.g. 5 columns in the group by, because then I have to cross-reference with columns in the SELECT clause to find out if they are the same. With ALL I instantly know it's grouped by all 2. This is indeed not part of the standard. But we have many things that are not part of the standard. I think as long as we use the same syntax as snowflake, databricks and duckdb I personally don't see a big problem. Then we could try and make this be part of the standard in the next version of the standard.
On Tue, 23 Jul 2024 at 15:22, Andrei Borodin <x4mmm@yandex-team.ru> wrote: > I'd like to have GROUP BY AUTO (I also proposed version GROUP BY SURPRISE ME). But I wouldn't like to open pandora boxof syntax sugar extensions which may will be incompatible with future standards. > If we could have extensible grammar - I'd be happy to have a lot of such enhancements. My top 2 are FROM table SELECT columnand better GROUP BY. Personally my number one enhancement would be allowing a trailing comma after the last column in the SELECT clause.
Hi
st 24. 7. 2024 v 10:57 odesílatel Jelte Fennema-Nio <postgres@jeltef.nl> napsal:
On Mon, 22 Jul 2024 at 22:55, David Christensen <david@pgguru.net> wrote:
> I see that there'd been some chatter but not a lot of discussion about
> a GROUP BY ALL feature/functionality. There certainly is utility in
> such a construct IMHO.
+1 from me. When exploring data, this is extremely useful because you
don't have to update the GROUP BY clause every time
Regarding the arguments against this:
1. I don't think this is any more unreadable than being able to GROUP
BY 1, 2, 3. Or being able to use column aliases from the SELECT in the
GROUP BY clause. Again this is already allowed. Personally I actually
think it's more readable than specifying e.g. 5 columns in the group
by, because then I have to cross-reference with columns in the SELECT
clause to find out if they are the same. With ALL I instantly know
it's grouped by all
2. This is indeed not part of the standard. But we have many things
that are not part of the standard. I think as long as we use the same
syntax as snowflake, databricks and duckdb I personally don't see a
big problem. Then we could try and make this be part of the standard
in the next version of the standard.
Aggregation against more columns is pretty slow and memory expensive in Postgres.
DuckDB is an analytic database with different storage, different executors. I like it very much, but I am not sure if I want to see these features in Postgres.
Lot of developers are not very smart, and with proposed feature, then instead to write correct and effective query, then they use just GROUP BY ALL. Slow query should look like a slow query :-)
Regards
Pavel
> On 24 Jul 2024, at 13:58, Jelte Fennema-Nio <postgres@jeltef.nl> wrote: > > On Tue, 23 Jul 2024 at 15:22, Andrei Borodin <x4mmm@yandex-team.ru> wrote: >> I'd like to have GROUP BY AUTO (I also proposed version GROUP BY SURPRISE ME). But I wouldn't like to open pandora boxof syntax sugar extensions which may will be incompatible with future standards. >> If we could have extensible grammar - I'd be happy to have a lot of such enhancements. My top 2 are FROM table SELECTcolumn and better GROUP BY. > > Personally my number one enhancement would be allowing a trailing > comma after the last column in the SELECT clause. Yes, trailing comma sounds great too. One more similar syntax sugar I can think of. I see lots of queries like SELECT somtheing FROM table1 WHERE 1=1 and id = x --and col1 = val1 and col2 = val2 I was wondering where does that "1=1" comes from. It's because developer comment condition one by one like "--, col1 = val1".And they do not want to cope with and\or continuation. Best regards, Andrey Borodin. PS. Seems like Mail.App mangled my previous message despite using plain text. It's totally broken in archives...
On Tue, Jul 23, 2024 at 6:53 PM David Christensen <david@pgguru.net> wrote: > > On Mon, Jul 22, 2024 at 4:34 PM David G. Johnston > <david.g.johnston@gmail.com> wrote: > > > > On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote: > >> > >> I see that there'd been some chatter but not a lot of discussion about > >> a GROUP BY ALL feature/functionality. There certainly is utility in > >> such a construct IMHO. > >> > >> Still need some docs; just throwing this out there and getting some feedback. > >> > > > > I strongly dislike adding this feature. I'd only consider supporting it if it was part of the SQL standard. > > > > Code is written once and read many times. This feature caters to the writer, not the reader. And furthermore usageof this is prone to be to the writer's detriment as well. > > I'd say this feature (at least for me) caters to the investigator; > someone who is interactively looking at data hence why it would cater > to the writer. Consider acquainting yourself with a large table that > has a large number of annoying-named fields where you want to look at > how different data is correlated or broken-down. Something along the > lines of: To me this looks like a feature that a data exploration tool may implement instead of being part of the server. It would then provide more statistics about each correlation/column set etc. -- Best Wishes, Ashutosh Bapat
On Tue, Jul 23, 2024 at 9:37 AM David Christensen <david@pgguru.net> wrote:
I'm not married to the exact syntax of this feature; anything else short and consistent could work if `ALL` is considered to potentially
gain a different interpretation in the future.
GROUP BY *
Just kidding. But a big +1 to the whole concept. It would have been extraordinarily useful over the years.
Cheers,
Greg
On Tue, 23 Jul 2024 at 22:02, Peter Eisentraut <peter@eisentraut.org> wrote: > Looks like the main existing implementations take it to mean all entries > in the SELECT list that are not aggregate functions. > > https://duckdb.org/docs/sql/query_syntax/groupby.html#group-by-all > https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-select-groupby.html#parameters > https://docs.snowflake.com/en/sql-reference/constructs/group-by#parameters Oracle added support for GROUP BY ALL too now: https://danischnider.wordpress.com/2025/08/05/oracle-23-9-supports-group-by-all/
On 17.08.25 19:12, Jelte Fennema-Nio wrote: > On Tue, 23 Jul 2024 at 22:02, Peter Eisentraut <peter@eisentraut.org> wrote: >> Looks like the main existing implementations take it to mean all entries >> in the SELECT list that are not aggregate functions. >> >> https://duckdb.org/docs/sql/query_syntax/groupby.html#group-by-all >> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-select-groupby.html#parameters >> https://docs.snowflake.com/en/sql-reference/constructs/group-by#parameters > > Oracle added support for GROUP BY ALL too now: > https://danischnider.wordpress.com/2025/08/05/oracle-23-9-supports-group-by-all/ The proposal for GROUP BY ALL was accepted into the SQL standard draft yesterday. So maybe someone wants to take this up again. The initially proposed patch appears to have the right idea overall. But it does not handle more complex cases like SELECT a, SUM(b)+a FROM t1 GROUP BY ALL; correctly. The piece of code that does if (!IsA(n->expr,Aggref)) should be generalized to check for aggregates not only at the top level. (For explanation: GROUP BY ALL expands to all select list entries that do not contain aggregates. So the above would expand to SELECT a, SUM(b)+a FROM t1 GROUP BY a; which should then be rejected based on the existing rules.)
> On 26 Sep 2025, at 12:02, Peter Eisentraut <peter@eisentraut.org> wrote: > > maybe someone wants to take this up again. I've compared my old patch with David's, and his version looks better. If David is not going to work on this in nearest future - I'l like to pick up the work. Either way I'll join as reviewer. Thanks! Best regards, Andrey Borodin.
> On Sep 26, 2025, at 3:45 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > > >> On 26 Sep 2025, at 12:02, Peter Eisentraut <peter@eisentraut.org> wrote: >> >> maybe someone wants to take this up again. > > I've compared my old patch with David's, and his version looks better. > If David is not going to work on this in nearest future - I'l like to pick up the work. > Either way I'll join as reviewer. > > Thanks! I’m interested in picking it up again but would appreciate the review. Thanks, David
Peter Eisentraut <peter@eisentraut.org> writes: > The initially proposed patch appears to have the right idea overall. > But it does not handle more complex cases like > SELECT a, SUM(b)+a FROM t1 GROUP BY ALL; > (For explanation: GROUP BY ALL expands to all select list entries that > do not contain aggregates. So the above would expand to > SELECT a, SUM(b)+a FROM t1 GROUP BY a; > which should then be rejected based on the existing rules.) I thought I understood this definition, up till your last comment. What's invalid about that expanded query? regression=# create table t1 (a int, b int); CREATE TABLE regression=# SELECT a, SUM(b)+a FROM t1 GROUP BY a; a | ?column? ---+---------- (0 rows) regards, tom lane
On Fri, Sep 26, 2025 at 6:16 AM David Christensen <david@pgguru.net> wrote: > > > > > On Sep 26, 2025, at 3:45 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > > > > > > >> On 26 Sep 2025, at 12:02, Peter Eisentraut <peter@eisentraut.org> wrote: > >> > >> maybe someone wants to take this up again. > > > > I've compared my old patch with David's, and his version looks better. > > If David is not going to work on this in nearest future - I'l like to pick up the work. > > Either way I'll join as reviewer. > > > > Thanks! > > I’m interested in picking it up again but would appreciate the review. Here is a rebased version with a few more tests. I also changed the main check here to using `!contain_agg_clause` instead of `!IsA(Aggref))` directly. (This was defined in `optimizer/clauses.h`, but we already are pulling in `optimizer.h`, so it felt valid to me.) David
Вложения
Added to commitfest as https://commitfest.postgresql.org/patch/6085/
On Fri, Sep 26, 2025 at 9:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Peter Eisentraut <peter@eisentraut.org> writes: > > The initially proposed patch appears to have the right idea overall. > > But it does not handle more complex cases like > > SELECT a, SUM(b)+a FROM t1 GROUP BY ALL; > > > (For explanation: GROUP BY ALL expands to all select list entries that > > do not contain aggregates. So the above would expand to > > SELECT a, SUM(b)+a FROM t1 GROUP BY a; > > which should then be rejected based on the existing rules.) > > I thought I understood this definition, up till your last > comment. What's invalid about that expanded query? > > regression=# create table t1 (a int, b int); > CREATE TABLE > regression=# SELECT a, SUM(b)+a FROM t1 GROUP BY a; > a | ?column? > ---+---------- > (0 rows) Agreed that this shouldn't be an error; added a similar test case to v2 of this patch. David
On 26.09.25 16:11, Tom Lane wrote: > Peter Eisentraut <peter@eisentraut.org> writes: >> The initially proposed patch appears to have the right idea overall. >> But it does not handle more complex cases like >> SELECT a, SUM(b)+a FROM t1 GROUP BY ALL; > >> (For explanation: GROUP BY ALL expands to all select list entries that >> do not contain aggregates. So the above would expand to >> SELECT a, SUM(b)+a FROM t1 GROUP BY a; >> which should then be rejected based on the existing rules.) > > I thought I understood this definition, up till your last > comment. What's invalid about that expanded query? > > regression=# create table t1 (a int, b int); > CREATE TABLE > regression=# SELECT a, SUM(b)+a FROM t1 GROUP BY a; > a | ?column? > ---+---------- > (0 rows) This was a sloppy example. Here is a better one: create table t1 (a int, b int, c int); select a, sum(b)+c from t1 group by all; This is equivalent to select a, sum(b)+c from t1 group by a; which would be rejected as ERROR: column "t1.c" must appear in the GROUP BY clause or be used in an aggregate function
On Fri, Sep 26, 2025 at 10:54 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 26.09.25 16:11, Tom Lane wrote: > > Peter Eisentraut <peter@eisentraut.org> writes: > >> The initially proposed patch appears to have the right idea overall. > >> But it does not handle more complex cases like > >> SELECT a, SUM(b)+a FROM t1 GROUP BY ALL; > > > >> (For explanation: GROUP BY ALL expands to all select list entries that > >> do not contain aggregates. So the above would expand to > >> SELECT a, SUM(b)+a FROM t1 GROUP BY a; > >> which should then be rejected based on the existing rules.) > > > > I thought I understood this definition, up till your last > > comment. What's invalid about that expanded query? > > > > regression=# create table t1 (a int, b int); > > CREATE TABLE > > regression=# SELECT a, SUM(b)+a FROM t1 GROUP BY a; > > a | ?column? > > ---+---------- > > (0 rows) > > This was a sloppy example. Here is a better one: > > create table t1 (a int, b int, c int); > > select a, sum(b)+c from t1 group by all; > > This is equivalent to > > select a, sum(b)+c from t1 group by a; > > which would be rejected as > > ERROR: column "t1.c" must appear in the GROUP BY clause or be used > in an aggregate function Verified that with v2 that this is what happens in this case; will include this and whatever other feedback there is in a v3.
David Christensen <david@pgguru.net> writes: > Here is a rebased version with a few more tests. I also changed the > main check here to using `!contain_agg_clause` instead of > `!IsA(Aggref))` directly. (This was defined in `optimizer/clauses.h`, > but we already are pulling in `optimizer.h`, so it felt valid to me.) contain_agg_clause will blow up on a SubLink, so I doubt this is gonna be robust. regards, tom lane
On Fri, Sep 26, 2025 at 11:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Christensen <david@pgguru.net> writes: > > Here is a rebased version with a few more tests. I also changed the > > main check here to using `!contain_agg_clause` instead of > > `!IsA(Aggref))` directly. (This was defined in `optimizer/clauses.h`, > > but we already are pulling in `optimizer.h`, so it felt valid to me.) > > contain_agg_clause will blow up on a SubLink, so I doubt this is > gonna be robust. Fair enough, see that Assert now; easy enough to make a new expression_tree_walker that only looks for Aggref and short-circuits SubLink (which I assume is the right behavior here, but might have to add some more tests/play around with subqueries in the GROUP BY ALL part). David
On Fri, Sep 26, 2025 at 11:13 AM David Christensen <david@pgguru.net> wrote: > > On Fri, Sep 26, 2025 at 11:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > David Christensen <david@pgguru.net> writes: > > > Here is a rebased version with a few more tests. I also changed the > > > main check here to using `!contain_agg_clause` instead of > > > `!IsA(Aggref))` directly. (This was defined in `optimizer/clauses.h`, > > > but we already are pulling in `optimizer.h`, so it felt valid to me.) > > > > contain_agg_clause will blow up on a SubLink, so I doubt this is > > gonna be robust. > > Fair enough, see that Assert now; easy enough to make a new > expression_tree_walker that only looks for Aggref and short-circuits > SubLink (which I assume is the right behavior here, but might have to > add some more tests/play around with subqueries in the GROUP BY ALL > part). Or contain_aggs_of_level(), assuming I can suss out how to get the current level... :D Anyway, will mess with this for a bit. David
David Christensen <david@pgguru.net> writes: > On Fri, Sep 26, 2025 at 11:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> contain_agg_clause will blow up on a SubLink, so I doubt this is >> gonna be robust. > Fair enough, see that Assert now; easy enough to make a new > expression_tree_walker that only looks for Aggref and short-circuits > SubLink (which I assume is the right behavior here, but might have to > add some more tests/play around with subqueries in the GROUP BY ALL > part). No, I think the correct behavior would have to be to descend into SubLinks to see if they contain any aggregates belonging to the outer query level. However (looks around) we do already have that code. See contain_aggs_of_level. (contain_agg_clause is essentially a simplified version that is okay to use in the planner because it's already gotten rid of sublinks.) What mainly concerns me at this point is whether we've identified aggregate levels at the point in parsing where you want to run this. I have a bit of a worry that that might interact with grouping. Presumably the SQL committee thought about that, so it's probably soluble, but ... regards, tom lane
David Christensen <david@pgguru.net> writes: > Or contain_aggs_of_level(), assuming I can suss out how to get the > current level... :D Anyway, will mess with this for a bit. Current level is zero by definition ... regards, tom lane
On Fri, Sep 26, 2025 at 11:46 PM David Christensen <david@pgguru.net> wrote: > > > > > I’m interested in picking it up again but would appreciate the review. > > Here is a rebased version with a few more tests. I also changed the > main check here to using `!contain_agg_clause` instead of > `!IsA(Aggref))` directly. (This was defined in `optimizer/clauses.h`, > but we already are pulling in `optimizer.h`, so it felt valid to me.) > hi. I only briefly browse the patch text file, so forgive me. seems missing deparse regress tests i think you may need one test like: create view v1 as SELECT b, COUNT(*) FROM t1 GROUP BY ALL; \sv v1
On Fri, Sep 26, 2025 at 11:38 AM jian he <jian.universality@gmail.com> wrote: > > On Fri, Sep 26, 2025 at 11:46 PM David Christensen <david@pgguru.net> wrote: > > > > > > > > I’m interested in picking it up again but would appreciate the review. > > > > Here is a rebased version with a few more tests. I also changed the > > main check here to using `!contain_agg_clause` instead of > > `!IsA(Aggref))` directly. (This was defined in `optimizer/clauses.h`, > > but we already are pulling in `optimizer.h`, so it felt valid to me.) > > > > hi. > I only briefly browse the patch text file, so forgive me. > seems missing deparse regress tests > > i think you may need one test like: > > create view v1 as SELECT b, COUNT(*) FROM t1 GROUP BY ALL; > \sv v1 Thanks, good suggestion; not sure the appropriate final location for this, but it did show a bug in the current patch. David
Version 3 of this patch, incorporating some of the feedback thus far: On Fri, Sep 26, 2025 at 11:56 AM David Christensen <david@pgguru.net> wrote: > > On Fri, Sep 26, 2025 at 11:38 AM jian he <jian.universality@gmail.com> wrote: > > > > On Fri, Sep 26, 2025 at 11:46 PM David Christensen <david@pgguru.net> wrote: > > > > > > > > > > > I’m interested in picking it up again but would appreciate the review. > > > > > > Here is a rebased version with a few more tests. I also changed the > > > main check here to using `!contain_agg_clause` instead of > > > `!IsA(Aggref))` directly. (This was defined in `optimizer/clauses.h`, > > > but we already are pulling in `optimizer.h`, so it felt valid to me.) > > > > > > > hi. > > I only briefly browse the patch text file, so forgive me. > > seems missing deparse regress tests > > > > i think you may need one test like: > > > > create view v1 as SELECT b, COUNT(*) FROM t1 GROUP BY ALL; > > \sv v1 > > Thanks, good suggestion; not sure the appropriate final location for > this, but it did show a bug in the current patch. > > David
Вложения
David Christensen <david@pgguru.net> writes: > Version 3 of this patch, incorporating some of the feedback thus far: Some random comments: I don't love where you put the parsing code. Instead of exposing addTargetToGroupList, I think you should have given transformGroupClause the responsibility of expanding GROUP BY ALL. One reason for that is that GROUP BY processing depends on the results of transformSortClause. This means that in v3, the behavior of SELECT x FROM ... GROUP BY x ORDER BY x; will be subtly different from SELECT x FROM ... GROUP BY ALL ORDER BY x; which seems like a bug, or at least not desirable. (You might need a DESC or USING decoration in the ORDER BY to expose this clearly.) The parsing code itself is not great: + TargetEntry *n = (TargetEntry*)lfirst(l1); + if (!contain_aggs_of_level((Node *)n->expr, 0)) + qry->groupClause = addTargetToGroupList(pstate, n, qry->groupClause, qry->targetList, 0); You should be skipping resjunk entries, and please use "tle" or some such name less generic than "n", and "0" is not the correct location to pass to addTargetToGroupList. Probably the location of the ALL keyword would be the ideal thing, but if we don't have that, the notation for "no location known" is -1 not 0. We could also consider using exprLocation(tle->expr), despite the comment on addTargetToGroupList that that's not the right thing. This might actually be better than pointing at the ALL anyway, since that would give no hint which targetentry caused the error. The test cases seem poorly designed, because it's very hard to be sure whether the code expanded the ALL as-expected. I think you could improve them by not executing the queries but just EXPLAINing them, so that the expanded group-by list is directly visible. regards, tom lane
On Fri, Sep 26, 2025 at 12:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Some random comments: [snip] Great feedback, thanks; attached is v4 which should address these comments. David
Вложения
Peter Eisentraut <peter@eisentraut.org> writes: > On 26.09.25 16:11, Tom Lane wrote: >> I thought I understood this definition, up till your last >> comment. What's invalid about that expanded query? > This was a sloppy example. Here is a better one: > create table t1 (a int, b int, c int); > select a, sum(b)+c from t1 group by all; > This is equivalent to > select a, sum(b)+c from t1 group by a; > which would be rejected as > ERROR: column "t1.c" must appear in the GROUP BY clause or be used > in an aggregate function Got it, mostly. There is an edge case, though: what if there are no candidate grouping items? I see these test cases in David's patch: +-- oops all aggregates +EXPLAIN (COSTS OFF) SELECT COUNT(a), SUM(b) FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + Aggregate + -> Seq Scan on t1 +(2 rows) + +-- empty column list +EXPLAIN (COSTS OFF) SELECT FROM t1 GROUP BY ALL; + QUERY PLAN +---------------- + Seq Scan on t1 +(1 row) That is, in such cases the patch behaves as if there were no GROUP BY clause at all, which seems kinda dubious. Should this be an error, and if not what's it supposed to do? The second case is outside the SQL spec so I'm not expecting guidance on that, but surely the committee thought about the first case. Also, what about window functions in the tlist? If you do regression=# explain select sum(q1) over(partition by q2) from int8_tbl group by 1; you get ERROR: window functions are not allowed in GROUP BY LINE 1: explain select sum(q1) over(partition by q2) from int8_tbl g... ^ but that's not what is happening with regression=# explain select sum(q1) over(partition by q2) from int8_tbl group by all; ERROR: column "int8_tbl.q2" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: explain select sum(q1) over(partition by q2) from int8_tbl g... ^ (I didn't stop to figure out why this isn't giving the same error, but maybe it's an order-of-checks thing.) In any case: should this give "window functions are not allowed in GROUP BY", or should the window-function-containing tlist item be silently skipped by GROUP BY ALL? Trying to make it work is surely not the right answer. regards, tom lane
[ trimming the cc: list because gmail decided my last was spam ] David Christensen <david@pgguru.net> writes: > Great feedback, thanks; attached is v4 which should address these comments. I did a pass of cleanup over this --- mostly cosmetic, but not entirely. Along the way I discovered a pre-existing bug: transformSelectStmt does qry->groupDistinct = stmt->groupDistinct; but transformPLAssignStmt fails to, with the result that GROUP BY DISTINCT would misbehave if used in a plpgsql "expression" context. I'm not hugely surprised that no one has reported that from the field, but nonetheless it's broken. In the attached v5 I just quickly added the missing line and moved on, but we'll need to back-patch that bit. (Maybe we'd be well advised to refactor to reduce the amount of duplicated code that needs to be kept in sync?) I have not attempted to address the definitional issues I just queried Peter about. Other open items: * Documentation * The test cases deserve reconsideration now that we think their charter only goes as far as EXPLAIN'ing the results; some of them seem pretty redundant in this context. regards, tom lane diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index b9763ea1714..fe632d3dabf 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -1440,12 +1440,14 @@ transformSelectStmt(ParseState *pstate, SelectStmt *stmt) qry->groupClause = transformGroupClause(pstate, stmt->groupClause, + stmt->groupAll, &qry->groupingSets, &qry->targetList, qry->sortClause, EXPR_KIND_GROUP_BY, false /* allow SQL92 rules */ ); qry->groupDistinct = stmt->groupDistinct; + qry->groupAll = stmt->groupAll; if (stmt->distinctClause == NIL) { @@ -2930,11 +2932,14 @@ transformPLAssignStmt(ParseState *pstate, PLAssignStmt *stmt) qry->groupClause = transformGroupClause(pstate, sstmt->groupClause, + sstmt->groupAll, &qry->groupingSets, &qry->targetList, qry->sortClause, EXPR_KIND_GROUP_BY, false /* allow SQL92 rules */ ); + qry->groupDistinct = sstmt->groupDistinct; + qry->groupAll = sstmt->groupAll; if (sstmt->distinctClause == NIL) { diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 9fd48acb1f8..8a0aa1899e2 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -120,6 +120,7 @@ typedef struct SelectLimit typedef struct GroupClause { bool distinct; + bool all; List *list; } GroupClause; @@ -12993,6 +12994,7 @@ simple_select: n->whereClause = $6; n->groupClause = ($7)->list; n->groupDistinct = ($7)->distinct; + n->groupAll = ($7)->all; n->havingClause = $8; n->windowClause = $9; $$ = (Node *) n; @@ -13010,6 +13012,7 @@ simple_select: n->whereClause = $6; n->groupClause = ($7)->list; n->groupDistinct = ($7)->distinct; + n->groupAll = ($7)->all; n->havingClause = $8; n->windowClause = $9; $$ = (Node *) n; @@ -13507,14 +13510,24 @@ group_clause: GroupClause *n = (GroupClause *) palloc(sizeof(GroupClause)); n->distinct = $3 == SET_QUANTIFIER_DISTINCT; + n->all = false; n->list = $4; $$ = n; } + | GROUP_P BY ALL + { + GroupClause *n = (GroupClause *) palloc(sizeof(GroupClause)); + n->distinct = false; + n->all = true; + n->list = NIL; + $$ = n; + } | /*EMPTY*/ { GroupClause *n = (GroupClause *) palloc(sizeof(GroupClause)); n->distinct = false; + n->all = false; n->list = NIL; $$ = n; } @@ -17618,6 +17631,7 @@ PLpgSQL_Expr: opt_distinct_clause opt_target_list n->whereClause = $4; n->groupClause = ($5)->list; n->groupDistinct = ($5)->distinct; + n->groupAll = ($5)->all; n->havingClause = $6; n->windowClause = $7; n->sortClause = $8; diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index 9f20a70ce13..562e3e90e34 100644 --- a/src/backend/parser/parse_clause.c +++ b/src/backend/parser/parse_clause.c @@ -2598,6 +2598,9 @@ transformGroupingSet(List **flatresult, * GROUP BY items will be added to the targetlist (as resjunk columns) * if not already present, so the targetlist must be passed by reference. * + * If GROUP BY ALL is specified, the groupClause will be inferred to be all + * non-aggregate expressions in the targetlist. + * * This is also used for window PARTITION BY clauses (which act almost the * same, but are always interpreted per SQL99 rules). * @@ -2622,6 +2625,7 @@ transformGroupingSet(List **flatresult, * * pstate ParseState * grouplist clause to transform + * groupByAll is this a GROUP BY ALL statement? * groupingSets reference to list to contain the grouping set tree * targetlist reference to TargetEntry list * sortClause ORDER BY clause (SortGroupClause nodes) @@ -2629,7 +2633,8 @@ transformGroupingSet(List **flatresult, * useSQL99 SQL99 rather than SQL92 syntax */ List * -transformGroupClause(ParseState *pstate, List *grouplist, List **groupingSets, +transformGroupClause(ParseState *pstate, List *grouplist, bool groupByAll, + List **groupingSets, List **targetlist, List *sortClause, ParseExprKind exprKind, bool useSQL99) { @@ -2640,6 +2645,41 @@ transformGroupClause(ParseState *pstate, List *grouplist, List **groupingSets, bool hasGroupingSets = false; Bitmapset *seen_local = NULL; + /* Handle GROUP BY ALL */ + if (groupByAll) + { + /* There cannot have been any explicit list items */ + Assert(grouplist == NIL); + + /* + * Iterate over targets, any non-aggregate gets added as a Target. + * Note that it's not enough to check for a top-level Aggref; we need + * to ensure that any sub-expression here does not include an Aggref + * (for instance an expression such as `sum(col) + 4` should not be + * added as a grouping target). + * + * We specify the parse location as the TLE's location, despite the + * comment for addTargetToGroupList discouraging that. The only other + * thing we could point to is the ALL keyword, which seems unhelpful + * when there are multiple TLEs. + */ + foreach_ptr(TargetEntry, tle, *targetlist) + { + /* Ignore junk TLEs */ + if (tle->resjunk) + continue; + + /* Use contain_aggs_of_level to detect Aggrefs in SubLinks */ + if (!(pstate->p_hasAggs && + contain_aggs_of_level((Node *) tle->expr, 0))) + result = addTargetToGroupList(pstate, tle, + result, *targetlist, + exprLocation((Node *) tle->expr)); + } + + return result; + } + /* * Recursively flatten implicit RowExprs. (Technically this is only needed * for GROUP BY, per the syntax rules for grouping sets, but we do it @@ -2818,6 +2858,7 @@ transformWindowDefinitions(ParseState *pstate, true /* force SQL99 rules */ ); partitionClause = transformGroupClause(pstate, windef->partitionClause, + false /* not GROUP BY ALL */ , NULL, targetlist, orderClause, diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index defcdaa8b34..d0c6d3b55bd 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -6186,7 +6186,9 @@ get_basic_select_query(Query *query, deparse_context *context) save_ingroupby = context->inGroupBy; context->inGroupBy = true; - if (query->groupingSets == NIL) + if (query->groupAll) + appendStringInfoString(buf, "ALL"); + else if (query->groupingSets == NIL) { sep = ""; foreach(l, query->groupClause) diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 4ed14fc5b78..d379879ae80 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -214,7 +214,8 @@ typedef struct Query List *returningList; /* return-values list (of TargetEntry) */ List *groupClause; /* a list of SortGroupClause's */ - bool groupDistinct; /* is the group by clause distinct? */ + bool groupDistinct; /* was GROUP BY DISTINCT used? */ + bool groupAll; /* was GROUP BY ALL used? */ List *groupingSets; /* a list of GroupingSet's if present */ @@ -2192,6 +2193,7 @@ typedef struct SelectStmt Node *whereClause; /* WHERE qualification */ List *groupClause; /* GROUP BY clauses */ bool groupDistinct; /* Is this GROUP BY DISTINCT? */ + bool groupAll; /* Is this GROUP BY ALL? */ Node *havingClause; /* HAVING conditional-expression */ List *windowClause; /* WINDOW window_name AS (...), ... */ diff --git a/src/include/parser/parse_clause.h b/src/include/parser/parse_clause.h index 3e9894926de..ede3903d1dd 100644 --- a/src/include/parser/parse_clause.h +++ b/src/include/parser/parse_clause.h @@ -26,6 +26,7 @@ extern Node *transformLimitClause(ParseState *pstate, Node *clause, ParseExprKind exprKind, const char *constructName, LimitOption limitOption); extern List *transformGroupClause(ParseState *pstate, List *grouplist, + bool groupByAll, List **groupingSets, List **targetlist, List *sortClause, ParseExprKind exprKind, bool useSQL99); diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out index 1f24f6ffd1f..aeeb452ec26 100644 --- a/src/test/regress/expected/aggregates.out +++ b/src/test/regress/expected/aggregates.out @@ -1557,6 +1557,153 @@ drop table t2; drop table t3; drop table p_t1; -- +-- Test GROUP BY ALL +-- +-- We don't care about the data here, just the proper transformation of the +-- GROUP BY clause, so test some queries and verify the EXPLAIN plans. +CREATE TEMP TABLE t1 ( + a int, + b int, + c int +); +-- basic field check +EXPLAIN (COSTS OFF) SELECT b, COUNT(*) FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: b + -> Seq Scan on t1 +(3 rows) + +-- throw a null in the values too +EXPLAIN (COSTS OFF) SELECT a, COUNT(a) FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: a + -> Seq Scan on t1 +(3 rows) + +-- multiple columns, non-consecutive order +EXPLAIN (COSTS OFF) SELECT a, SUM(b), b FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: a, b + -> Seq Scan on t1 +(3 rows) + +-- multi columns, no aggregate +EXPLAIN (COSTS OFF) SELECT a + b FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: (a + b) + -> Seq Scan on t1 +(3 rows) + +-- non-top-level aggregate +EXPLAIN (COSTS OFF) SELECT a, SUM(b) + 4 FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: a + -> Seq Scan on t1 +(3 rows) + +-- including grouped column +EXPLAIN (COSTS OFF) SELECT a, SUM(b) + a FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: a + -> Seq Scan on t1 +(3 rows) + +-- oops all aggregates +EXPLAIN (COSTS OFF) SELECT COUNT(a), SUM(b) FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + Aggregate + -> Seq Scan on t1 +(2 rows) + +-- empty column list +EXPLAIN (COSTS OFF) SELECT FROM t1 GROUP BY ALL; + QUERY PLAN +---------------- + Seq Scan on t1 +(1 row) + +-- filter +EXPLAIN (COSTS OFF) SELECT a, COUNT(a) FILTER(WHERE b = 2) FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: a + -> Seq Scan on t1 +(3 rows) + +-- all cols +EXPLAIN (COSTS OFF) SELECT * FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: a, b, c + -> Seq Scan on t1 +(3 rows) + +EXPLAIN (COSTS OFF) SELECT *, count(*) FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: a, b, c + -> Seq Scan on t1 +(3 rows) + +-- expression without including aggregate columns +EXPLAIN (COSTS OFF) SELECT a, SUM(b) + c FROM t1 GROUP BY ALL; +ERROR: column "t1.c" must appear in the GROUP BY clause or be used in an aggregate function +LINE 1: EXPLAIN (COSTS OFF) SELECT a, SUM(b) + c FROM t1 GROUP BY AL... + ^ +-- subquery +EXPLAIN (COSTS OFF) SELECT (SELECT a FROM t1 LIMIT 1), SUM(b) FROM t1 GROUP BY ALL; + QUERY PLAN +----------------------------------- + GroupAggregate + InitPlan 1 + -> Limit + -> Seq Scan on t1 t1_1 + -> Seq Scan on t1 +(5 rows) + +-- cte +EXPLAIN (COSTS OFF) WITH a_query AS (SELECT a, SUM(b) AS sum_b FROM t1 GROUP BY ALL) +SELECT AVG(a), sum_b FROM a_query GROUP BY ALL; + QUERY PLAN +---------------------------- + HashAggregate + Group Key: sum(t1.b) + -> HashAggregate + Group Key: t1.a + -> Seq Scan on t1 +(5 rows) + +-- verify deparse +CREATE VIEW v1 AS SELECT b, COUNT(*) FROM t1 GROUP BY ALL; +NOTICE: view "v1" will be a temporary view +SELECT pg_get_viewdef('v1'::regclass); + pg_get_viewdef +----------------------- + SELECT b, + + count(*) AS count+ + FROM t1 + + GROUP BY ALL; +(1 row) + +DROP VIEW v1; +DROP TABLE t1; +-- -- Test GROUP BY matching of join columns that are type-coerced due to USING -- create temp table t1(f1 int, f2 int); diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql index 62540b1ffa4..6b2516cd9b1 100644 --- a/src/test/regress/sql/aggregates.sql +++ b/src/test/regress/sql/aggregates.sql @@ -549,6 +549,68 @@ drop table t2; drop table t3; drop table p_t1; + +-- +-- Test GROUP BY ALL +-- + +-- We don't care about the data here, just the proper transformation of the +-- GROUP BY clause, so test some queries and verify the EXPLAIN plans. + +CREATE TEMP TABLE t1 ( + a int, + b int, + c int +); + +-- basic field check +EXPLAIN (COSTS OFF) SELECT b, COUNT(*) FROM t1 GROUP BY ALL; + +-- throw a null in the values too +EXPLAIN (COSTS OFF) SELECT a, COUNT(a) FROM t1 GROUP BY ALL; + +-- multiple columns, non-consecutive order +EXPLAIN (COSTS OFF) SELECT a, SUM(b), b FROM t1 GROUP BY ALL; + +-- multi columns, no aggregate +EXPLAIN (COSTS OFF) SELECT a + b FROM t1 GROUP BY ALL; + +-- non-top-level aggregate +EXPLAIN (COSTS OFF) SELECT a, SUM(b) + 4 FROM t1 GROUP BY ALL; + +-- including grouped column +EXPLAIN (COSTS OFF) SELECT a, SUM(b) + a FROM t1 GROUP BY ALL; + +-- oops all aggregates +EXPLAIN (COSTS OFF) SELECT COUNT(a), SUM(b) FROM t1 GROUP BY ALL; + +-- empty column list +EXPLAIN (COSTS OFF) SELECT FROM t1 GROUP BY ALL; + +-- filter +EXPLAIN (COSTS OFF) SELECT a, COUNT(a) FILTER(WHERE b = 2) FROM t1 GROUP BY ALL; + +-- all cols +EXPLAIN (COSTS OFF) SELECT * FROM t1 GROUP BY ALL; +EXPLAIN (COSTS OFF) SELECT *, count(*) FROM t1 GROUP BY ALL; + +-- expression without including aggregate columns +EXPLAIN (COSTS OFF) SELECT a, SUM(b) + c FROM t1 GROUP BY ALL; + +-- subquery +EXPLAIN (COSTS OFF) SELECT (SELECT a FROM t1 LIMIT 1), SUM(b) FROM t1 GROUP BY ALL; + +-- cte +EXPLAIN (COSTS OFF) WITH a_query AS (SELECT a, SUM(b) AS sum_b FROM t1 GROUP BY ALL) +SELECT AVG(a), sum_b FROM a_query GROUP BY ALL; + +-- verify deparse +CREATE VIEW v1 AS SELECT b, COUNT(*) FROM t1 GROUP BY ALL; +SELECT pg_get_viewdef('v1'::regclass); + +DROP VIEW v1; +DROP TABLE t1; + -- -- Test GROUP BY matching of join columns that are type-coerced due to USING --
On 26.09.25 22:18, Tom Lane wrote: > Got it, mostly. There is an edge case, though: what if there are no > candidate grouping items? I see these test cases in David's patch: > > +-- oops all aggregates > +EXPLAIN (COSTS OFF) SELECT COUNT(a), SUM(b) FROM t1 GROUP BY ALL; > + QUERY PLAN > +---------------------- > + Aggregate > + -> Seq Scan on t1 > +(2 rows) > + > +-- empty column list > +EXPLAIN (COSTS OFF) SELECT FROM t1 GROUP BY ALL; > + QUERY PLAN > +---------------- > + Seq Scan on t1 > +(1 row) > > That is, in such cases the patch behaves as if there were no GROUP BY > clause at all, which seems kinda dubious. Should this be an error, > and if not what's it supposed to do? These should resolve to GROUP BY (). > Also, what about window functions in the tlist? > (I didn't stop to figure out why this isn't giving the same error, but > maybe it's an order-of-checks thing.) In any case: should this give > "window functions are not allowed in GROUP BY", or should the > window-function-containing tlist item be silently skipped by GROUP BY > ALL? Trying to make it work is surely not the right answer. Hmm, I don't know. The syntactic transformation talks about select list elements that "do not directly contain an <aggregate function>", but that can also appear as part of <window function>, so the syntactic transformation might appear to apply only to some types of window functions, which doesn't make sense to me. I don't know what a sensible behavior should be here. Maybe in this first patch version just reject use of GROUP BY ALL if you find any window functions in the select list.
On 26.09.25 18:23, Tom Lane wrote: > No, I think the correct behavior would have to be to descend into > SubLinks to see if they contain any aggregates belonging to the > outer query level. > > However (looks around) we do already have that code. > See contain_aggs_of_level. (contain_agg_clause is essentially > a simplified version that is okay to use in the planner because > it's already gotten rid of sublinks.) > > What mainly concerns me at this point is whether we've identified > aggregate levels at the point in parsing where you want to run this. > I have a bit of a worry that that might interact with grouping. > Presumably the SQL committee thought about that, so it's probably > soluble, but ... The language used in the standard at the moment is the select list elements that "do not directly contain an <aggregate function>", where "directly contain" is a term of art that means "contains without an intervening instance of <subquery>, <within group specification>, or <set function specification> that is not an <ordered set function>". So it means not to look into subqueries. Note that in standard SQL, the GROUP BY clause can only contain plain column references, not expressions, so this question is kind of moot in that context, because the query would be invalid no matter whether you transform the GROUP BY ALL to group by the subquery or not. For this first patch version, I suggest you reject the use of GROUP BY ALL if you find a subquery in the select list, unless you have an unambiguous better solution. (It was discussed to relax this restriction, so this discussion is useful to collect some questions related to that.)
Peter Eisentraut <peter@eisentraut.org> writes: > The language used in the standard at the moment is the select list > elements that "do not directly contain an <aggregate function>", where > "directly contain" is a term of art that means "contains without an > intervening instance of <subquery>, <within group specification>, or > <set function specification> that is not an <ordered set function>". So > it means not to look into subqueries. TBH, that is obvious nonsense. A subquery could contain an aggregate function that we've already identified as being of the current query level. Putting such a construct into the GROUP BY list would create an invalid query (cf. checkTargetlistEntrySQL92). Similarly, putting a window function into the GROUP BY list would create an invalid query. > Note that in standard SQL, the GROUP BY clause can only contain plain > column references, not expressions, so this question is kind of moot in > that context, because the query would be invalid no matter whether you > transform the GROUP BY ALL to group by the subquery or not. So according to the standard, this: select a+b, count(*) from ... group by all; would be invalid because a+b couldn't be written directly in GROUP BY? I can't see us rejecting that though, since we do allow a+b in GROUP BY. Seems like we're getting very little help from the standard as to what this construct actually means. I suggest that we ignore the current draft as not having been thought through quite enough yet, and make ALL skip any tlist entries that contain_aggs_of_level zero or contain_windowfuncs. If that means we're extending the standard, so be it --- we've already extended GROUP BY quite a lot, it seems. regards, tom lane
On 27/09/2025 18:03, Tom Lane wrote: > So according to the standard, this: > > select a+b, count(*) from ... group by all; > > would be invalid because a+b couldn't be written directly in > GROUP BY? Correct. > I can't see us rejecting that though, since we do > allow a+b in GROUP BY. No, nor do I. Also, there were rumors about adding expressions to group by in the standard but no formal proposal yet. FWIW, I opposed adding this GROUP BY ALL feature, but I was outnumbered. I hope to never see it in production. -- Vik Fearing
Here's a v6 that's rebased up to HEAD and contains fixes for the semantic issues we discussed. It still lacks documentation, but otherwise I think it's about ready to go. regards, tom lane From 5503815351b2e45661262fddaab1271a435dd730 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Sat, 27 Sep 2025 18:19:16 -0400 Subject: [PATCH v6] Add GROUP BY ALL. GROUP BY ALL is a form of GROUP BY that adds any TargetExpr that does not contain an aggregate or window function into the groupClause of the query, making it exactly equivalent to specifying those same expressions in an explicit GROUP BY list. This feature is useful for certain kinds of data exploration. It's already present in some other DBMSes, and the SQL committee recently accepted it into the standard, so we can be reasonably confident in the syntax being stable. We do have to invent part of the semantics, as the standard doesn't allow for expressions in GROUP BY, so they haven't specified what to do with window functions. We assume that those should be treated like aggregates, i.e., left out of the constructed GROUP BY list. Author: David Christensen <david@pgguru.net> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAHM0NXjz0kDwtzoe-fnHAqPB1qA8_VJN0XAmCgUZ+iPnvP5LbA@mail.gmail.com --- src/backend/parser/analyze.c | 2 + src/backend/parser/gram.y | 14 +++ src/backend/parser/parse_clause.c | 65 ++++++++++++- src/backend/utils/adt/ruleutils.c | 4 +- src/include/nodes/parsenodes.h | 4 +- src/include/parser/parse_clause.h | 1 + src/test/regress/expected/aggregates.out | 113 +++++++++++++++++++++++ src/test/regress/sql/aggregates.sql | 50 ++++++++++ 8 files changed, 250 insertions(+), 3 deletions(-) diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 5aeb54eb5f6..99ba043961b 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -1467,12 +1467,14 @@ transformSelectStmt(ParseState *pstate, SelectStmt *stmt, qry->groupClause = transformGroupClause(pstate, stmt->groupClause, + stmt->groupAll, &qry->groupingSets, &qry->targetList, qry->sortClause, EXPR_KIND_GROUP_BY, false /* allow SQL92 rules */ ); qry->groupDistinct = stmt->groupDistinct; + qry->groupAll = stmt->groupAll; if (stmt->distinctClause == NIL) { diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 9fd48acb1f8..8a0aa1899e2 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -120,6 +120,7 @@ typedef struct SelectLimit typedef struct GroupClause { bool distinct; + bool all; List *list; } GroupClause; @@ -12993,6 +12994,7 @@ simple_select: n->whereClause = $6; n->groupClause = ($7)->list; n->groupDistinct = ($7)->distinct; + n->groupAll = ($7)->all; n->havingClause = $8; n->windowClause = $9; $$ = (Node *) n; @@ -13010,6 +13012,7 @@ simple_select: n->whereClause = $6; n->groupClause = ($7)->list; n->groupDistinct = ($7)->distinct; + n->groupAll = ($7)->all; n->havingClause = $8; n->windowClause = $9; $$ = (Node *) n; @@ -13507,14 +13510,24 @@ group_clause: GroupClause *n = (GroupClause *) palloc(sizeof(GroupClause)); n->distinct = $3 == SET_QUANTIFIER_DISTINCT; + n->all = false; n->list = $4; $$ = n; } + | GROUP_P BY ALL + { + GroupClause *n = (GroupClause *) palloc(sizeof(GroupClause)); + n->distinct = false; + n->all = true; + n->list = NIL; + $$ = n; + } | /*EMPTY*/ { GroupClause *n = (GroupClause *) palloc(sizeof(GroupClause)); n->distinct = false; + n->all = false; n->list = NIL; $$ = n; } @@ -17618,6 +17631,7 @@ PLpgSQL_Expr: opt_distinct_clause opt_target_list n->whereClause = $4; n->groupClause = ($5)->list; n->groupDistinct = ($5)->distinct; + n->groupAll = ($5)->all; n->havingClause = $6; n->windowClause = $7; n->sortClause = $8; diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index 9f20a70ce13..ca26f6f61f2 100644 --- a/src/backend/parser/parse_clause.c +++ b/src/backend/parser/parse_clause.c @@ -2598,6 +2598,9 @@ transformGroupingSet(List **flatresult, * GROUP BY items will be added to the targetlist (as resjunk columns) * if not already present, so the targetlist must be passed by reference. * + * If GROUP BY ALL is specified, the groupClause will be inferred to be all + * non-aggregate, non-window expressions in the targetlist. + * * This is also used for window PARTITION BY clauses (which act almost the * same, but are always interpreted per SQL99 rules). * @@ -2622,6 +2625,7 @@ transformGroupingSet(List **flatresult, * * pstate ParseState * grouplist clause to transform + * groupByAll is this a GROUP BY ALL statement? * groupingSets reference to list to contain the grouping set tree * targetlist reference to TargetEntry list * sortClause ORDER BY clause (SortGroupClause nodes) @@ -2629,7 +2633,8 @@ transformGroupingSet(List **flatresult, * useSQL99 SQL99 rather than SQL92 syntax */ List * -transformGroupClause(ParseState *pstate, List *grouplist, List **groupingSets, +transformGroupClause(ParseState *pstate, List *grouplist, bool groupByAll, + List **groupingSets, List **targetlist, List *sortClause, ParseExprKind exprKind, bool useSQL99) { @@ -2640,6 +2645,63 @@ transformGroupClause(ParseState *pstate, List *grouplist, List **groupingSets, bool hasGroupingSets = false; Bitmapset *seen_local = NULL; + /* Handle GROUP BY ALL */ + if (groupByAll) + { + /* There cannot have been any explicit grouplist items */ + Assert(grouplist == NIL); + + /* Iterate over targets, adding acceptable ones to the result list */ + foreach_ptr(TargetEntry, tle, *targetlist) + { + /* Ignore junk TLEs */ + if (tle->resjunk) + continue; + + /* + * TLEs containing aggregates are not okay to add to GROUP BY + * (compare checkTargetlistEntrySQL92). But the SQL standard + * directs us to skip them, so it's fine. + */ + if (pstate->p_hasAggs && + contain_aggs_of_level((Node *) tle->expr, 0)) + continue; + + /* + * Likewise, TLEs containing window functions are not okay to add + * to GROUP BY. At this writing, the SQL standard is silent on + * what to do with them, but by analogy to aggregates we'll just + * skip them. + */ + if (pstate->p_hasWindowFuncs && + contain_windowfuncs((Node *) tle->expr)) + continue; + + /* + * Otherwise, add the TLE to the result using default sort/group + * semantics. We specify the parse location as the TLE's + * location, despite the comment for addTargetToGroupList + * discouraging that. The only other thing we could point to is + * the ALL keyword, which seems unhelpful when there are multiple + * TLEs. + */ + result = addTargetToGroupList(pstate, tle, + result, *targetlist, + exprLocation((Node *) tle->expr)); + } + + /* If we found any acceptable targets, we're done */ + if (result != NIL) + return result; + + /* + * Otherwise, the SQL standard says to treat it like "GROUP BY ()". + * Build a representation of that, and let the rest of this function + * handle it. + */ + grouplist = list_make1(makeGroupingSet(GROUPING_SET_EMPTY, NIL, -1)); + } + /* * Recursively flatten implicit RowExprs. (Technically this is only needed * for GROUP BY, per the syntax rules for grouping sets, but we do it @@ -2818,6 +2880,7 @@ transformWindowDefinitions(ParseState *pstate, true /* force SQL99 rules */ ); partitionClause = transformGroupClause(pstate, windef->partitionClause, + false /* not GROUP BY ALL */ , NULL, targetlist, orderClause, diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index defcdaa8b34..d0c6d3b55bd 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -6186,7 +6186,9 @@ get_basic_select_query(Query *query, deparse_context *context) save_ingroupby = context->inGroupBy; context->inGroupBy = true; - if (query->groupingSets == NIL) + if (query->groupAll) + appendStringInfoString(buf, "ALL"); + else if (query->groupingSets == NIL) { sep = ""; foreach(l, query->groupClause) diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index f1706df58fd..aa73f7d76dd 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -214,7 +214,8 @@ typedef struct Query List *returningList; /* return-values list (of TargetEntry) */ List *groupClause; /* a list of SortGroupClause's */ - bool groupDistinct; /* is the group by clause distinct? */ + bool groupDistinct; /* was GROUP BY DISTINCT used? */ + bool groupAll; /* was GROUP BY ALL used? */ List *groupingSets; /* a list of GroupingSet's if present */ @@ -2192,6 +2193,7 @@ typedef struct SelectStmt Node *whereClause; /* WHERE qualification */ List *groupClause; /* GROUP BY clauses */ bool groupDistinct; /* Is this GROUP BY DISTINCT? */ + bool groupAll; /* Is this GROUP BY ALL? */ Node *havingClause; /* HAVING conditional-expression */ List *windowClause; /* WINDOW window_name AS (...), ... */ diff --git a/src/include/parser/parse_clause.h b/src/include/parser/parse_clause.h index 3e9894926de..ede3903d1dd 100644 --- a/src/include/parser/parse_clause.h +++ b/src/include/parser/parse_clause.h @@ -26,6 +26,7 @@ extern Node *transformLimitClause(ParseState *pstate, Node *clause, ParseExprKind exprKind, const char *constructName, LimitOption limitOption); extern List *transformGroupClause(ParseState *pstate, List *grouplist, + bool groupByAll, List **groupingSets, List **targetlist, List *sortClause, ParseExprKind exprKind, bool useSQL99); diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out index 1f24f6ffd1f..2cdb7562846 100644 --- a/src/test/regress/expected/aggregates.out +++ b/src/test/regress/expected/aggregates.out @@ -1557,6 +1557,119 @@ drop table t2; drop table t3; drop table p_t1; -- +-- Test GROUP BY ALL +-- +-- We don't care about the data here, just the proper transformation of the +-- GROUP BY clause, so test some queries and verify the EXPLAIN plans. +-- +CREATE TEMP TABLE t1 ( + a int, + b int, + c int +); +-- basic example +EXPLAIN (COSTS OFF) SELECT b, COUNT(*) FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: b + -> Seq Scan on t1 +(3 rows) + +-- multiple columns, non-consecutive order +EXPLAIN (COSTS OFF) SELECT a, SUM(b), b FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: a, b + -> Seq Scan on t1 +(3 rows) + +-- multi columns, no aggregate +EXPLAIN (COSTS OFF) SELECT a + b FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: (a + b) + -> Seq Scan on t1 +(3 rows) + +-- check we detect a non-top-level aggregate +EXPLAIN (COSTS OFF) SELECT a, SUM(b) + 4 FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: a + -> Seq Scan on t1 +(3 rows) + +-- including grouped column is okay +EXPLAIN (COSTS OFF) SELECT a, SUM(b) + a FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: a + -> Seq Scan on t1 +(3 rows) + +-- including non-grouped column, not so much +EXPLAIN (COSTS OFF) SELECT a, SUM(b) + c FROM t1 GROUP BY ALL; +ERROR: column "t1.c" must appear in the GROUP BY clause or be used in an aggregate function +LINE 1: EXPLAIN (COSTS OFF) SELECT a, SUM(b) + c FROM t1 GROUP BY AL... + ^ +-- all aggregates, should reduce to GROUP BY () +EXPLAIN (COSTS OFF) SELECT COUNT(a), SUM(b) FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + Aggregate + Group Key: () + -> Seq Scan on t1 +(3 rows) + +-- likewise with empty target list +EXPLAIN (COSTS OFF) SELECT FROM t1 GROUP BY ALL; + QUERY PLAN +----------------------- + Result + Replaces: Aggregate +(2 rows) + +-- window functions are not to be included in GROUP BY, either +EXPLAIN (COSTS OFF) SELECT a, COUNT(a) OVER (PARTITION BY a) FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------------------- + WindowAgg + Window: w1 AS (PARTITION BY a) + -> Sort + Sort Key: a + -> HashAggregate + Group Key: a + -> Seq Scan on t1 +(7 rows) + +-- all cols +EXPLAIN (COSTS OFF) SELECT *, count(*) FROM t1 GROUP BY ALL; + QUERY PLAN +---------------------- + HashAggregate + Group Key: a, b, c + -> Seq Scan on t1 +(3 rows) + +-- verify deparsing of GROUP BY ALL +CREATE TEMP VIEW v1 AS SELECT b, COUNT(*) FROM t1 GROUP BY ALL; +SELECT pg_get_viewdef('v1'::regclass); + pg_get_viewdef +----------------------- + SELECT b, + + count(*) AS count+ + FROM t1 + + GROUP BY ALL; +(1 row) + +DROP VIEW v1; +DROP TABLE t1; +-- -- Test GROUP BY matching of join columns that are type-coerced due to USING -- create temp table t1(f1 int, f2 int); diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql index 62540b1ffa4..ec41da493ad 100644 --- a/src/test/regress/sql/aggregates.sql +++ b/src/test/regress/sql/aggregates.sql @@ -549,6 +549,56 @@ drop table t2; drop table t3; drop table p_t1; +-- +-- Test GROUP BY ALL +-- +-- We don't care about the data here, just the proper transformation of the +-- GROUP BY clause, so test some queries and verify the EXPLAIN plans. +-- + +CREATE TEMP TABLE t1 ( + a int, + b int, + c int +); + +-- basic example +EXPLAIN (COSTS OFF) SELECT b, COUNT(*) FROM t1 GROUP BY ALL; + +-- multiple columns, non-consecutive order +EXPLAIN (COSTS OFF) SELECT a, SUM(b), b FROM t1 GROUP BY ALL; + +-- multi columns, no aggregate +EXPLAIN (COSTS OFF) SELECT a + b FROM t1 GROUP BY ALL; + +-- check we detect a non-top-level aggregate +EXPLAIN (COSTS OFF) SELECT a, SUM(b) + 4 FROM t1 GROUP BY ALL; + +-- including grouped column is okay +EXPLAIN (COSTS OFF) SELECT a, SUM(b) + a FROM t1 GROUP BY ALL; + +-- including non-grouped column, not so much +EXPLAIN (COSTS OFF) SELECT a, SUM(b) + c FROM t1 GROUP BY ALL; + +-- all aggregates, should reduce to GROUP BY () +EXPLAIN (COSTS OFF) SELECT COUNT(a), SUM(b) FROM t1 GROUP BY ALL; + +-- likewise with empty target list +EXPLAIN (COSTS OFF) SELECT FROM t1 GROUP BY ALL; + +-- window functions are not to be included in GROUP BY, either +EXPLAIN (COSTS OFF) SELECT a, COUNT(a) OVER (PARTITION BY a) FROM t1 GROUP BY ALL; + +-- all cols +EXPLAIN (COSTS OFF) SELECT *, count(*) FROM t1 GROUP BY ALL; + +-- verify deparsing of GROUP BY ALL +CREATE TEMP VIEW v1 AS SELECT b, COUNT(*) FROM t1 GROUP BY ALL; +SELECT pg_get_viewdef('v1'::regclass); + +DROP VIEW v1; +DROP TABLE t1; + -- -- Test GROUP BY matching of join columns that are type-coerced due to USING -- -- 2.43.7
On Sep 28, 2025, at 06:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:+ /* Iterate over targets, adding acceptable ones to the result list */
+ foreach_ptr(TargetEntry, tle, *targetlist)
+ {
+ /* Ignore junk TLEs */
+ if (tle->resjunk)
+ continue;
Do we want to specifically check “ctid”?
If a user does:
```
select ctid, col1, col2, ... from t group by all;
```It would be equivalent to no group by. Combing “select ctid” with “group by all” seems totally useless, but when users do such things, we can optimize that.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
Chao Li <li.evan.chao@gmail.com> writes: > Do we want to specifically check “ctid”? No. Complete waste of effort. regards, tom lane
On Sat, Sep 27, 2025 at 5:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Here's a v6 that's rebased up to HEAD and contains fixes for the > semantic issues we discussed. It still lacks documentation, but > otherwise I think it's about ready to go. Here is v7 with a stab at docs; fairly minimal at this point, but touching the two areas that are likely to need adjusting. When adjusting the docs for sql-select, I noticed that the grammar also supports `GROUP BY ALL <grouping_elements>`, so I also added a test to ensure that this syntax is explicitly supported. (It seems like it works as-is without further grammar adjustments, but I was a little worried when I first saw that fact... :D) Not sure that aggregates.sql is still the right place for all of these bits, but it does seem like having all things `GROUP BY ALL`-related tested in the same place is a nice property, so leaving there for now. David
On Sun, Sep 28, 2025 at 2:18 PM David Christensen <david@pgguru.net> wrote: > > On Sat, Sep 27, 2025 at 5:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Here's a v6 that's rebased up to HEAD and contains fixes for the > > semantic issues we discussed. It still lacks documentation, but > > otherwise I think it's about ready to go. > > Here is v7 with a stab at docs; fairly minimal at this point, but > touching the two areas that are likely to need adjusting. When > adjusting the docs for sql-select, I noticed that the grammar also > supports `GROUP BY ALL <grouping_elements>`, so I also added a test to > ensure that this syntax is explicitly supported. (It seems like it > works as-is without further grammar adjustments, but I was a little > worried when I first saw that fact... :D) Not sure that > aggregates.sql is still the right place for all of these bits, but it > does seem like having all things `GROUP BY ALL`-related tested in the > same place is a nice property, so leaving there for now. This time with attachment!
Вложения
David Christensen <david@pgguru.net> writes: >> Here is v7 with a stab at docs; fairly minimal at this point, but >> touching the two areas that are likely to need adjusting. I did some more word-smithing on the docs and pushed it. >> When >> adjusting the docs for sql-select, I noticed that the grammar also >> supports `GROUP BY ALL <grouping_elements>`, so I also added a test to >> ensure that this syntax is explicitly supported. +1, can't hurt. >> (It seems like it >> works as-is without further grammar adjustments, but I was a little >> worried when I first saw that fact... :D) Bison would have been vocal about it if you'd introduced any ambiguity. Still, I didn't feel like looking around to see if we already covered this syntax, and I agree it's close enough to being an issue to be worth covering. Thanks for the patch! regards, tom lane
> On Sep 29, 2025, at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Christensen <david@pgguru.net> writes: >>> Here is v7 with a stab at docs; fairly minimal at this point, but >>> touching the two areas that are likely to need adjusting. > > I did some more word-smithing on the docs and pushed it. > >>> When >>> adjusting the docs for sql-select, I noticed that the grammar also >>> supports `GROUP BY ALL <grouping_elements>`, so I also added a test to >>> ensure that this syntax is explicitly supported. > > +1, can't hurt. > >>> (It seems like it >>> works as-is without further grammar adjustments, but I was a little >>> worried when I first saw that fact... :D) > > Bison would have been vocal about it if you'd introduced any > ambiguity. Still, I didn't feel like looking around to see if we > already covered this syntax, and I agree it's close enough to being > an issue to be worth covering. > > Thanks for the patch! Great, thank you! David
On Sep 29, 2025, at 04:34, David Christensen <david@pgguru.net> wrote:On Sun, Sep 28, 2025 at 2:18 PM David Christensen <david@pgguru.net> wrote:
On Sat, Sep 27, 2025 at 5:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Here's a v6 that's rebased up to HEAD and contains fixes for the
semantic issues we discussed. It still lacks documentation, but
otherwise I think it's about ready to go.
Here is v7 with a stab at docs; fairly minimal at this point, but
touching the two areas that are likely to need adjusting. When
adjusting the docs for sql-select, I noticed that the grammar also
supports `GROUP BY ALL <grouping_elements>`, so I also added a test to
ensure that this syntax is explicitly supported. (It seems like it
works as-is without further grammar adjustments, but I was a little
worried when I first saw that fact... :D) Not sure that
aggregates.sql is still the right place for all of these bits, but it
does seem like having all things `GROUP BY ALL`-related tested in the
same place is a nice property, so leaving there for now.
This time with attachment!
<v7-0001-Add-GROUP-BY-ALL.patch>
A nit comment for the doc:
```
+ not contain either an aggregate function or a window function in their
+ expression list. This can greatly simplify ad-hoc exploration of data.
+ </para>
```
“In their expression list” => “in the <literal>SELECT</literal> list"
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/