Обсуждение: select from grouped data
Hello, I have written this function to process some multiline text data into a table with both text & numeric columns: CREATE OR REPLACE FUNCTION get_final_energy () RETURNS TABLE ( jid text, "timestamp" text, scf integer, energy double precision ) AS $$ WITH a AS ( SELECT jid, regexp_split_to_table(totenergy_out, '\n') AS teo FROM results ), b AS ( SELECT results.jid, results. "timestamp", cast( CASE WHEN split_part(a.teo, ' ', 2) = '' THEN '0' ELSE split_part(a.teo, ' ', 2) END AS integer) AS scf, cast( CASE WHEN split_part(a.teo, ' ', 3) = '' THEN '0.0' ELSE split_part(a.teo, ' ', 3) END AS double precision) AS energy FROM results, a WHERE results.jid = a.jid GROUP BY results.jid, results. "timestamp", a.teo --HAVING -- scf = max(scf) ORDER BY timestamp ASC, scf DESC ) SELECT * FROM b; $$ LANGUAGE sql; The output looks like: jid | timestamp | scf | energy ------------+-----------------+-----+---------------- 1250_1 | 20210805-114634 | 18 | -1316.43700819 1250_1 | 20210805-114634 | 17 | -1316.43700825 1250_1 | 20210805-114634 | 16 | -1316.4370097 1250_1 | 20210805-114634 | 15 | -1316.43700991 1250_1 | 20210805-114634 | 14 | -1316.43699775 1250_1 | 20210805-114634 | 13 | -1316.43699117 1250_1 | 20210805-114634 | 12 | -1316.43750771 1250_1 | 20210805-114634 | 11 | -1316.43805358 1250_1 | 20210805-114634 | 10 | -1316.43857192 1250_1 | 20210805-114634 | 9 | -1316.43070942 1251_1 | 20210806-062539 | 18 | -1316.43700819 1251_1 | 20210806-062539 | 17 | -1316.43700826 .... What I want is to get (for each group) the energy corresponding to the maximum value of scf. I appreciate any help, Thanks, Roger
The output looks like:
jid | timestamp | scf | energy
------------+-----------------+-----+----------------
1250_1 | 20210805-114634 | 18 | -1316.43700819
1250_1 | 20210805-114634 | 17 | -1316.43700825
1250_1 | 20210805-114634 | 16 | -1316.4370097
1250_1 | 20210805-114634 | 15 | -1316.43700991
1250_1 | 20210805-114634 | 14 | -1316.43699775
1250_1 | 20210805-114634 | 13 | -1316.43699117
1250_1 | 20210805-114634 | 12 | -1316.43750771
1250_1 | 20210805-114634 | 11 | -1316.43805358
1250_1 | 20210805-114634 | 10 | -1316.43857192
1250_1 | 20210805-114634 | 9 | -1316.43070942
1251_1 | 20210806-062539 | 18 | -1316.43700819
1251_1 | 20210806-062539 | 17 | -1316.43700826
....
What I want is to get (for each group) the energy corresponding to the
maximum value of scf.
SELECT DISTINCT ON (jid) jid, timestamp, scf, energy [...] ORDER BY jid, scf DESC
David J.
David G. Johnston writes: > SELECT DISTINCT ON (jid) jid, timestamp, scf, energy [...] ORDER BY jid, > scf DESC Got it. Many thanks, Roger
Greetings, * Roger Mason (rmason@mun.ca) wrote: > I have written this function to process some multiline text data into a > table with both text & numeric columns: > > CREATE OR REPLACE FUNCTION get_final_energy () > RETURNS TABLE ( > jid text, > "timestamp" text, > scf integer, > energy double precision > ) > AS $$ > WITH a AS ( > SELECT > jid, > regexp_split_to_table(totenergy_out, '\n') AS teo > FROM > results > ), > b AS ( > SELECT > results.jid, > results. "timestamp", > cast( > CASE WHEN split_part(a.teo, ' ', 2) = '' THEN > '0' > ELSE > split_part(a.teo, ' ', 2) > END AS integer) AS scf, > cast( > CASE WHEN split_part(a.teo, ' ', 3) = '' THEN > '0.0' > ELSE > split_part(a.teo, ' ', 3) > END AS double precision) AS energy > FROM > results, > a > WHERE > results.jid = a.jid > GROUP BY > results.jid, > results. "timestamp", > a.teo > --HAVING > -- scf = max(scf) > ORDER BY > timestamp ASC, > scf DESC > ) > SELECT > * > FROM > b; > > $$ > LANGUAGE sql; > > The output looks like: > > jid | timestamp | scf | energy > ------------+-----------------+-----+---------------- > 1250_1 | 20210805-114634 | 18 | -1316.43700819 > 1250_1 | 20210805-114634 | 17 | -1316.43700825 > 1250_1 | 20210805-114634 | 16 | -1316.4370097 > 1250_1 | 20210805-114634 | 15 | -1316.43700991 > 1250_1 | 20210805-114634 | 14 | -1316.43699775 > 1250_1 | 20210805-114634 | 13 | -1316.43699117 > 1250_1 | 20210805-114634 | 12 | -1316.43750771 > 1250_1 | 20210805-114634 | 11 | -1316.43805358 > 1250_1 | 20210805-114634 | 10 | -1316.43857192 > 1250_1 | 20210805-114634 | 9 | -1316.43070942 > 1251_1 | 20210806-062539 | 18 | -1316.43700819 > 1251_1 | 20210806-062539 | 17 | -1316.43700826 > .... > > What I want is to get (for each group) the energy corresponding to the > maximum value of scf. * David G. Johnston (david.g.johnston@gmail.com) wrote: > > > > > > The output looks like: > > > > jid | timestamp | scf | energy > > ------------+-----------------+-----+---------------- > > 1250_1 | 20210805-114634 | 18 | -1316.43700819 > > 1250_1 | 20210805-114634 | 17 | -1316.43700825 > > 1250_1 | 20210805-114634 | 16 | -1316.4370097 > > 1250_1 | 20210805-114634 | 15 | -1316.43700991 > > 1250_1 | 20210805-114634 | 14 | -1316.43699775 > > 1250_1 | 20210805-114634 | 13 | -1316.43699117 > > 1250_1 | 20210805-114634 | 12 | -1316.43750771 > > 1250_1 | 20210805-114634 | 11 | -1316.43805358 > > 1250_1 | 20210805-114634 | 10 | -1316.43857192 > > 1250_1 | 20210805-114634 | 9 | -1316.43070942 > > 1251_1 | 20210806-062539 | 18 | -1316.43700819 > > 1251_1 | 20210806-062539 | 17 | -1316.43700826 > > .... > > > > What I want is to get (for each group) the energy corresponding to the > > maximum value of scf. > > > > > SELECT DISTINCT ON (jid) jid, timestamp, scf, energy [...] ORDER BY jid, > scf DESC While this works, it's generally better to use a LATERAL join as that's part of the SQL standard while DISTINCT ON isn't. Using a LATERAL join also would allow you to have multiple rows (top-N) if you wanted. You'd do that using: WITH jids AS (SELECT jid FROM results GROUP BY jid) SELECT jids.jid, t.ts, t.scf, t.energy FROM jids CROSS JOIN LATERAL (SELECT ts, scf, energy FROM results WHERE results.jid = jids.jid ORDER BY scf DESC LIMIT 1) AS t ; A couple of notes on this: don't name a column "timestamp" and when it's a timestamp, use the 'timestamptz' data type, not text. Your jids sure look like they should just be numbers instead of text too. If you already have a distinct set of jids somewhere (like in another table), you could use that as the source table instead of the CTE that I'm using above. Thanks, Stephen
Вложения
I don't know that "generally better" is the correct description. There are a lot of things not covered by the SQL standard and it would be crazy not to make use of them, e.g. insert on conflict and distinct on, especially when they make the query more readable and more correct. Worrying about sql portability is almost always time not well spent and an invitation to do things not as well as they can be.
Jim
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Jim
On August 25, 2021 4:42:56 PM EDT, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Roger Mason (rmason@mun.ca) wrote:I have written this function to process some multiline text data into a
table with both text & numeric columns:
CREATE OR REPLACE FUNCTION get_final_energy ()
RETURNS TABLE (
jid text,
"timestamp" text,
scf integer,
energy double precision
)
AS $$
WITH a AS (
SELECT
jid,
regexp_split_to_table(totenergy_out, '\n') AS teo
FROM
results
),
b AS (
SELECT
results.jid,
results. "timestamp",
cast(
CASE WHEN split_part(a.teo, ' ', 2) = '' THEN
'0'
ELSE
split_part(a.teo, ' ', 2)
END AS integer) AS scf,
cast(
CASE WHEN split_part(a.teo, ' ', 3) = '' THEN
'0.0'
ELSE
split_part(a.teo, ' ', 3)
END AS double precision) AS energy
FROM
results,
a
WHERE
results.jid = a.jid
GROUP BY
results.jid,
results. "timestamp",
a.teo
--HAVING
-- scf = max(scf)
ORDER BY
timestamp ASC,
scf DESC
)
SELECT
*
FROM
b;
$$
LANGUAGE sql;
The output looks like:
jid | timestamp | scf | energy
------------+-----------------+-----+----------------
1250_1 | 20210805-114634 | 18 | -1316.43700819
1250_1 | 20210805-114634 | 17 | -1316.43700825
1250_1 | 20210805-114634 | 16 | -1316.4370097
1250_1 | 20210805-114634 | 15 | -1316.43700991
1250_1 | 20210805-114634 | 14 | -1316.43699775
1250_1 | 20210805-114634 | 13 | -1316.43699117
1250_1 | 20210805-114634 | 12 | -1316.43750771
1250_1 | 20210805-114634 | 11 | -1316.43805358
1250_1 | 20210805-114634 | 10 | -1316.43857192
1250_1 | 20210805-114634 | 9 | -1316.43070942
1251_1 | 20210806-062539 | 18 | -1316.43700819
1251_1 | 20210806-062539 | 17 | -1316.43700826
....
What I want is to get (for each group) the energy corresponding to the
maximum value of scf.
* David G. Johnston (david.g.johnston@gmail.com) wrote:SELECT DISTINCT ON (jid) jid, timestamp, scf, energy [...] ORDER BY jid,
The output looks like:
jid | timestamp | scf | energy
------------+-----------------+-----+----------------
1250_1 | 20210805-114634 | 18 | -1316.43700819
1250_1 | 20210805-114634 | 17 | -1316.43700825
1250_1 | 20210805-114634 | 16 | -1316.4370097
1250_1 | 20210805-114634 | 15 | -1316.43700991
1250_1 | 20210805-114634 | 14 | -1316.43699775
1250_1 | 20210805-114634 | 13 | -1316.43699117
1250_1 | 20210805-114634 | 12 | -1316.43750771
1250_1 | 20210805-114634 | 11 | -1316.43805358
1250_1 | 20210805-114634 | 10 | -1316.43857192
1250_1 | 20210805-114634 | 9 | -1316.43070942
1251_1 | 20210806-062539 | 18 | -1316.43700819
1251_1 | 20210806-062539 | 17 | -1316.43700826
....
What I want is to get (for each group) the energy corresponding to the
maximum value of scf.
scf DESC
While this works, it's generally better to use a LATERAL join as that's
part of the SQL standard while DISTINCT ON isn't. Using a LATERAL join
also would allow you to have multiple rows (top-N) if you wanted. You'd
do that using:
WITH jids AS (SELECT jid FROM results GROUP BY jid)
SELECT jids.jid, t.ts, t.scf, t.energy
FROM jids CROSS JOIN LATERAL
(SELECT ts, scf, energy FROM results
WHERE results.jid = jids.jid
ORDER BY scf DESC
LIMIT 1) AS t
;
A couple of notes on this: don't name a column "timestamp" and when it's
a timestamp, use the 'timestamptz' data type, not text. Your jids sure
look like they should just be numbers instead of text too. If you
already have a distinct set of jids somewhere (like in another table),
you could use that as the source table instead of the CTE that I'm using
above.
Thanks,
Stephen
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Greetings,
We generally prefer to reply in-line on these threads, as I’ve done below. Thanks.
On Wed, Aug 25, 2021 at 17:37 James Keener <jim@jimkeener.com> wrote:
I don't know that "generally better" is the correct description. There are a lot of things not covered by the SQL standard and it would be crazy not to make use of them, e.g. insert on conflict and distinct on, especially when they make the query more readable and more correct. Worrying about sql portability is almost always time not well spent and an invitation to do things not as well as they can be.
It’s generally better as it follows the SQL standard and is more flexible as it allows you to get top-N, as I said. DISTINCT ON has always been a hack and its use should be discouraged at this point.
When an UPSERT capability exists in the actual standard then it’s use should be promoted over other hacks that various vendors have implemented also.
Thanks again,
Stephen
Stephen Frost writes: > > It’s generally better as it follows the SQL standard and is more flexible > as it allows you to get top-N, as I said. DISTINCT ON has always been a > hack and its use should be discouraged at this point. > > When an UPSERT capability exists in the actual standard then it’s use > should be promoted over other hacks that various vendors have implemented > also. OK, thanks, I've filed this discussion away for future reference. Meanwhile I'm having trouble getting this function to work as a trigger. I'll start another thread on that. Thanks again. Roger