Обсуждение: Re: select top N entries from several groups (Modified by David Orme)
[Forgot to send this to the list address rather than Sean's. Apologies.] Hi Sean, Many thanks for that. If I understand the syntax correctly, the subquery is executed for each row and checks to see if the current val is in the top N list for the gp of that row. My problem is that the real life example is on a table of 54,720 rows. A single run of the subquery takes 67ms - so the whole query takes something like an hour [(0.067*54720)/(60*60)]. At least, it certainly takes a long time and that is my interpretation. My suspicion is that there isn't any way round this - I can have elegant, short, slower SQL or write the subquery for each group independently - faster but not nearly so pleasing. Fortunately the real number of groups is fixed and small (8) so this is tractable. Is this right? Thanks, David On 12 Apr 2005, at 13:21, Sean Davis wrote: > > On Apr 12, 2005, at 7:21 AM, David Orme wrote: > >> Hi, >> >> Suppose I have a table (called temp) like this: >> >> gp val >> A 10 >> A 8 >> A 6 >> A 4 >> B 3 >> B 2 >> B 1 >> B 0 >> >> How can I get the largest two values for each group in a single pass? >> I want to end up with: >> >> gp val >> A 10 >> A 8 >> B 3 >> B 2 >> >> I can do this a group at a time using... > > How about: > > create table temp ( > gp char, > val int > ); > > insert into temp values ('A',10); > insert into temp values ('A',8); > insert into temp values ('A',6); > insert into temp values ('A',4); > insert into temp values ('B',3); > insert into temp values ('B',2); > insert into temp values ('B',1); > > select a.gp,a.val > from temp a > where a.val in ( > select b.val > from temp b > where a.gp=b.gp > order by b.val desc > limit 2); > > gp | val > ----+----- > A | 10 > A | 8 > B | 3 > B | 2 > (4 rows) > > I have found this link is useful for beginning to think about > subqueries: > > http://www.postgresql.org/files/documentation/books/aw_pgsql/ > node81.html > > Sean >
Re: select top N entries from several groups (Modified by David Orme)
От
"Greg Sabino Mullane"
Дата:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > How can I get the largest two values for each group in a single pass? ... > Fortunately the real number of groups is fixed and > small (8) so this is tractable. Depends on your definition of "single pass", but if you know exactly what the groups are, you can do something like this: (SELECT DISTINCT gp, val FROM temp WHERE gp='A' ORDER BY val DESC LIMIT 2) UNION ALL (SELECT DISTINCT gp, val FROM temp WHERE gp='B' ORDER BY val DESC LIMIT 2) UNION ALL (SELECT DISTINCT gp, val FROM temp WHERE gp='C' ORDER BY val DESC LIMIT 2) UNION ALL (SELECT DISTINCT gp, val FROM temp WHERE gp='D' ORDER BY val DESC LIMIT 2) UNION ALL (SELECT DISTINCT gp, val FROM temp WHERE gp='E' ORDER BY val DESC LIMIT 2) UNION ALL (SELECT DISTINCT gp, val FROM temp WHERE gp='F' ORDER BY val DESC LIMIT 2) UNION ALL (SELECT DISTINCT gp, val FROM temp WHERE gp='G' ORDER BY val DESC LIMIT 2) UNION ALL (SELECT DISTINCT gp, val FROM temp WHERE gp='H' ORDER BY val DESC LIMIT 2) ORDER BY 1,2 DESC; Creating an index on gp will help out as well: CREATE INDEX temp_gp ON temp(gp); ANALYZE temp; The above ran on my system with 80,000 rows in 1.3 seconds. (Should be even faster if you don't need the DISTINCT; which can be removed if every combination of gp and val is unique). - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200504122153 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFCXHxfvJuQZxSWSsgRAnXaAKDK3IGx+7fdZhahk3q3x6Pn+TENXgCgqbbR Y7ybBfp5yfcA1z8ktgdGrU4= =5ng9 -----END PGP SIGNATURE-----