Обсуждение: select from grouped data

Поиск

Список

Период

Сортировка

select from grouped data

От

Roger Mason

Дата:

25 августа 2021 г., 14:07:15

Hello,

I have written this function to process some multiline text data into a
table with both text & numeric columns:

CREATE OR REPLACE FUNCTION get_final_energy ()
  RETURNS TABLE (
    jid text,
    "timestamp" text,
    scf integer,
    energy double precision
  )
  AS $$
  WITH a AS (
    SELECT
      jid,
      regexp_split_to_table(totenergy_out, '\n') AS teo
    FROM
      results
),
b AS (
  SELECT
    results.jid,
    results. "timestamp",
    cast(
      CASE WHEN split_part(a.teo, ' ', 2) = '' THEN
        '0'
      ELSE
        split_part(a.teo, ' ', 2)
      END AS integer) AS scf,
    cast(
      CASE WHEN split_part(a.teo, ' ', 3) = '' THEN
        '0.0'
      ELSE
        split_part(a.teo, ' ', 3)
      END AS double precision) AS energy
  FROM
    results,
    a
  WHERE
    results.jid = a.jid
  GROUP BY
    results.jid,
    results. "timestamp",
    a.teo
  --HAVING
    -- scf = max(scf)
    ORDER BY
         timestamp ASC,
         scf DESC
)
SELECT
  *
FROM
  b;

$$
LANGUAGE sql;

The output looks like:

   jid     |    timestamp    | scf |     energy     
------------+-----------------+-----+----------------
 1250_1     | 20210805-114634 |  18 | -1316.43700819
 1250_1     | 20210805-114634 |  17 | -1316.43700825
 1250_1     | 20210805-114634 |  16 |  -1316.4370097
 1250_1     | 20210805-114634 |  15 | -1316.43700991
 1250_1     | 20210805-114634 |  14 | -1316.43699775
 1250_1     | 20210805-114634 |  13 | -1316.43699117
 1250_1     | 20210805-114634 |  12 | -1316.43750771
 1250_1     | 20210805-114634 |  11 | -1316.43805358
 1250_1     | 20210805-114634 |  10 | -1316.43857192
 1250_1     | 20210805-114634 |   9 | -1316.43070942
 1251_1     | 20210806-062539 |  18 | -1316.43700819
 1251_1     | 20210806-062539 |  17 | -1316.43700826
....

What I want is to get (for each group) the energy corresponding to the
maximum value of scf.

I appreciate any help,
Thanks,
Roger

Re: select from grouped data

От

"David G. Johnston"

Дата:

25 августа 2021 г., 16:19:39

The output looks like:

jid | timestamp | scf | energy
------------+-----------------+-----+----------------
1250_1 | 20210805-114634 | 18 | -1316.43700819
1250_1 | 20210805-114634 | 17 | -1316.43700825
1250_1 | 20210805-114634 | 16 | -1316.4370097
1250_1 | 20210805-114634 | 15 | -1316.43700991
1250_1 | 20210805-114634 | 14 | -1316.43699775
1250_1 | 20210805-114634 | 13 | -1316.43699117
1250_1 | 20210805-114634 | 12 | -1316.43750771
1250_1 | 20210805-114634 | 11 | -1316.43805358
1250_1 | 20210805-114634 | 10 | -1316.43857192
1250_1 | 20210805-114634 | 9 | -1316.43070942
1251_1 | 20210806-062539 | 18 | -1316.43700819
1251_1 | 20210806-062539 | 17 | -1316.43700826
....

What I want is to get (for each group) the energy corresponding to the
maximum value of scf.

SELECT DISTINCT ON (jid) jid, timestamp, scf, energy [...] ORDER BY jid, scf DESC

David J.

Re: select from grouped data

От

Roger Mason

Дата:

25 августа 2021 г., 16:59:10

David G. Johnston writes:

> SELECT DISTINCT ON (jid) jid, timestamp, scf, energy [...] ORDER BY jid,
> scf DESC

Got it.

Many thanks,
Roger

Re: select from grouped data

От

James Keener

Дата:

26 августа 2021 г., 00:37:42

I don't know that "generally better" is the correct description. There are a lot of things not covered by the SQL standard and it would be crazy not to make use of them, e.g. insert on conflict and distinct on, especially when they make the query more readable and more correct. Worrying about sql portability is almost always time not well spent and an invitation to do things not as well as they can be.

Jim

On August 25, 2021 4:42:56 PM EDT, Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Roger Mason (rmason@mun.ca) wrote:
I have written this function to process some multiline text data into a
table with both text & numeric columns:

CREATE OR REPLACE FUNCTION get_final_energy ()
  RETURNS TABLE (
    jid text,
    "timestamp" text,
    scf integer,
    energy double precision
  )
  AS $$
  WITH a AS (
    SELECT
      jid,
      regexp_split_to_table(totenergy_out, '\n') AS teo
    FROM
      results
),
b AS (
  SELECT
    results.jid,
    results. "timestamp",
    cast(
      CASE WHEN split_part(a.teo, ' ', 2) = '' THEN
        '0'
      ELSE
        split_part(a.teo, ' ', 2)
      END AS integer) AS scf,
    cast(
      CASE WHEN split_part(a.teo, ' ', 3) = '' THEN
        '0.0'
      ELSE
        split_part(a.teo, ' ', 3)
      END AS double precision) AS energy
  FROM
    results,
    a
  WHERE
    results.jid = a.jid
  GROUP BY
    results.jid,
    results. "timestamp",
    a.teo
  --HAVING
    -- scf = max(scf)
    ORDER BY
         timestamp ASC,
         scf DESC
)
SELECT
  *
FROM
  b;

$$
LANGUAGE sql;

The output looks like:

   jid     |    timestamp    | scf |     energy     
------------+-----------------+-----+----------------
 1250_1     | 20210805-114634 |  18 | -1316.43700819
 1250_1     | 20210805-114634 |  17 | -1316.43700825
 1250_1     | 20210805-114634 |  16 |  -1316.4370097
 1250_1     | 20210805-114634 |  15 | -1316.43700991
 1250_1     | 20210805-114634 |  14 | -1316.43699775
 1250_1     | 20210805-114634 |  13 | -1316.43699117
 1250_1     | 20210805-114634 |  12 | -1316.43750771
 1250_1     | 20210805-114634 |  11 | -1316.43805358
 1250_1     | 20210805-114634 |  10 | -1316.43857192
 1250_1     | 20210805-114634 |   9 | -1316.43070942
 1251_1     | 20210806-062539 |  18 | -1316.43700819
 1251_1     | 20210806-062539 |  17 | -1316.43700826
....

What I want is to get (for each group) the energy corresponding to the
maximum value of scf.

* David G. Johnston (david.g.johnston@gmail.com) wrote:


 The output looks like:

    jid     |    timestamp    | scf |     energy
 ------------+-----------------+-----+----------------
  1250_1     | 20210805-114634 |  18 | -1316.43700819
  1250_1     | 20210805-114634 |  17 | -1316.43700825
  1250_1     | 20210805-114634 |  16 |  -1316.4370097
  1250_1     | 20210805-114634 |  15 | -1316.43700991
  1250_1     | 20210805-114634 |  14 | -1316.43699775
  1250_1     | 20210805-114634 |  13 | -1316.43699117
  1250_1     | 20210805-114634 |  12 | -1316.43750771
  1250_1     | 20210805-114634 |  11 | -1316.43805358
  1250_1     | 20210805-114634 |  10 | -1316.43857192
  1250_1     | 20210805-114634 |   9 | -1316.43070942
  1251_1     | 20210806-062539 |  18 | -1316.43700819
  1251_1     | 20210806-062539 |  17 | -1316.43700826
 ....

 What I want is to get (for each group) the energy corresponding to the
 maximum value of scf.


SELECT DISTINCT ON (jid) jid, timestamp, scf, energy [...] ORDER BY jid,
scf DESC

While this works, it's generally better to use a LATERAL join as that's
part of the SQL standard while DISTINCT ON isn't.  Using a LATERAL join
also would allow you to have multiple rows (top-N) if you wanted.  You'd
do that using:

WITH jids AS (SELECT jid FROM results GROUP BY jid)
SELECT jids.jid, t.ts, t.scf, t.energy
FROM jids CROSS JOIN LATERAL
 (SELECT ts, scf, energy FROM results
  WHERE results.jid = jids.jid
  ORDER BY scf DESC
  LIMIT 1) AS t
;

A couple of notes on this: don't name a column "timestamp" and when it's
a timestamp, use the 'timestamptz' data type, not text.  Your jids sure
look like they should just be numbers instead of text too.  If you
already have a distinct set of jids somewhere (like in another table),
you could use that as the source table instead of the CTE that I'm using
above.

Thanks,

Stephen

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: select from grouped data

От

Stephen Frost

Дата:

26 августа 2021 г., 01:03:56

Greetings,

We generally prefer to reply in-line on these threads, as I’ve done below. Thanks.

On Wed, Aug 25, 2021 at 17:37 James Keener <jim@jimkeener.com> wrote:

I don't know that "generally better" is the correct description. There are a lot of things not covered by the SQL standard and it would be crazy not to make use of them, e.g. insert on conflict and distinct on, especially when they make the query more readable and more correct. Worrying about sql portability is almost always time not well spent and an invitation to do things not as well as they can be.

It’s generally better as it follows the SQL standard and is more flexible as it allows you to get top-N, as I said. DISTINCT ON has always been a hack and its use should be discouraged at this point.

When an UPSERT capability exists in the actual standard then it’s use should be promoted over other hacks that various vendors have implemented also.

Thanks again,

Stephen

Re: select from grouped data

От

Roger Mason

Дата:

28 августа 2021 г., 16:08:41

Stephen Frost writes:

>
> It’s generally better as it follows the SQL standard and is more flexible
> as it allows you to get top-N, as I said. DISTINCT ON has always been a
> hack and its use should be discouraged at this point.
>
> When an UPSERT capability exists in the actual standard then it’s use
> should be promoted over other hacks that various vendors have implemented
> also.

OK, thanks, I've filed this discussion away for future reference.
Meanwhile I'm having trouble getting this function to work as a
trigger.  I'll start another thread on that.

Thanks again.

Roger

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: select from grouped data

select from grouped data

Re: select from grouped data

Re: select from grouped data

Re: select from grouped data

Вложения

Re: select from grouped data

Re: select from grouped data

Re: select from grouped data