Обсуждение: Re: select top N entries from several groups (Modified by David Orme)

Поиск
Список
Период
Сортировка

Re: select top N entries from several groups (Modified by David Orme)

От
David Orme
Дата:
[Forgot to send this to the list address rather than Sean's. Apologies.]

Hi Sean,

Many thanks for that. If I understand the syntax correctly, the
subquery is executed for each row and checks to see if the current val
is in the top N list for the gp of that row.

My problem is that the real life example is on a table of 54,720 rows.
A single run of the subquery takes 67ms - so the whole query takes
something like an hour [(0.067*54720)/(60*60)]. At least, it certainly
takes a long time and that is my interpretation. My suspicion is that
there isn't any way round this - I can have elegant, short, slower SQL
or write the subquery for each group independently - faster but not
nearly so pleasing. Fortunately the real number of groups is fixed and
small (8) so this is tractable.

Is this right?

Thanks,
David

On 12 Apr 2005, at 13:21, Sean Davis wrote:

>
> On Apr 12, 2005, at 7:21 AM, David Orme wrote:
>
>> Hi,
>>
>> Suppose I have a table (called temp) like this:
>>
>> gp    val
>> A    10
>> A    8
>> A    6
>> A    4
>> B    3
>> B    2
>> B    1
>> B    0
>>
>> How can I get the largest two values for each group in a single pass?
>> I want to end up with:
>>
>> gp    val
>> A    10
>> A    8
>> B    3
>> B    2
>>
>> I can do this a group at a time using...
>
> How about:
>
> create table temp (
>     gp char,
>     val int
> );
>
> insert into temp values ('A',10);
> insert into temp values ('A',8);
> insert into temp values ('A',6);
> insert into temp values ('A',4);
> insert into temp values ('B',3);
> insert into temp values ('B',2);
> insert into temp values ('B',1);
>
> select a.gp,a.val
> from   temp a
> where  a.val in (
>                 select b.val
>                 from   temp b
>                 where  a.gp=b.gp
>                 order by b.val desc
>                 limit 2);
>
>  gp | val
> ----+-----
>  A  |  10
>  A  |   8
>  B  |   3
>  B  |   2
> (4 rows)
>
> I have found this link is useful for beginning to think about
> subqueries:
>
> http://www.postgresql.org/files/documentation/books/aw_pgsql/
> node81.html
>
> Sean
>


Re: select top N entries from several groups (Modified by David Orme)

От
"Greg Sabino Mullane"
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> How can I get the largest two values for each group in a single pass?
...
> Fortunately the real number of groups is fixed and
> small (8) so this is tractable.

Depends on your definition of "single pass", but if you know exactly
what the groups are, you can do something like this:

(SELECT DISTINCT gp, val FROM temp WHERE gp='A' ORDER BY val DESC LIMIT 2)
UNION ALL
(SELECT DISTINCT gp, val FROM temp WHERE gp='B' ORDER BY val DESC LIMIT 2)
UNION ALL
(SELECT DISTINCT gp, val FROM temp WHERE gp='C' ORDER BY val DESC LIMIT 2)
UNION ALL
(SELECT DISTINCT gp, val FROM temp WHERE gp='D' ORDER BY val DESC LIMIT 2)
UNION ALL
(SELECT DISTINCT gp, val FROM temp WHERE gp='E' ORDER BY val DESC LIMIT 2)
UNION ALL
(SELECT DISTINCT gp, val FROM temp WHERE gp='F' ORDER BY val DESC LIMIT 2)
UNION ALL
(SELECT DISTINCT gp, val FROM temp WHERE gp='G' ORDER BY val DESC LIMIT 2)
UNION ALL
(SELECT DISTINCT gp, val FROM temp WHERE gp='H' ORDER BY val DESC LIMIT 2)
ORDER BY 1,2 DESC;

Creating an index on gp will help out as well:

CREATE INDEX temp_gp ON temp(gp);
ANALYZE temp;

The above ran on my system with 80,000 rows in 1.3 seconds. (Should be even
faster if you don't need the DISTINCT; which can be removed if every
combination of gp and val is unique).

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200504122153
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFCXHxfvJuQZxSWSsgRAnXaAKDK3IGx+7fdZhahk3q3x6Pn+TENXgCgqbbR
Y7ybBfp5yfcA1z8ktgdGrU4=
=5ng9
-----END PGP SIGNATURE-----