Re: Nested query performance issue
От | Heikki Linnakangas |
---|---|
Тема | Re: Nested query performance issue |
Дата | |
Msg-id | 49DDDC07.7080608@enterprisedb.com обсуждение исходный текст |
Ответ на | Nested query performance issue (Glenn Maynard <glennfmaynard@gmail.com>) |
Ответы |
Re: Nested query performance issue
Re: Nested query performance issue |
Список | pgsql-performance |
Glenn Maynard wrote: > This rewrite allows getting the top N scores. Unfortunately, this one > takes 950ms for the same data. With 1000000 scores, it takes 14800ms. > > SELECT s.* FROM score s, game g > WHERE s.game_id = g.id AND > s.id IN ( > SELECT s2.id FROM score s2 WHERE s2.game_id=g.id ORDER BY s2.score > DESC LIMIT 1 > ); You don't really need the join with game here, simplifying this into: SELECT s.* FROM score s WHERE s.id IN ( SELECT s2.id FROM score s2 WHERE s2.game_id=s.game_id ORDER BY s2.score DESC LIMIT 1 ); I don't think it makes it any faster, though. You can also do this in a very nice and clean fashion using the upcoming PG 8.4 window functions: SELECT * FROM ( SELECT s.*, rank() OVER (PARTITION BY s.game_id ORDER BY score DESC) AS rank FROM score s ) AS sub WHERE rank <= 5; but I'm not sure how much faster it is. At least here on my laptop it does a full index scan on score, which may or may not be faster than just picking the top N values for each game using the index. > This seems simple: for each game, search for the highest score, and > then scan the tree to get the next N-1 highest scores. The first > version does just that, but the second one is doing a seq scan over > score. You can do that approach with a SQL function: CREATE FUNCTION topnscores(game_id int , n int) RETURNS SETOF score LANGUAGE SQL AS $$ SELECT * FROM score s WHERE s.game_id = $1 ORDER BY score DESC LIMIT $2 $$; SELECT (sub.ts).id, (sub.ts).score, (sub.ts).game_id FROM (SELECT topnscores(g.id, 5) ts FROM game g) sub; -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
В списке pgsql-performance по дате отправления: