Re: Stuck with a query...
От | Edmund Bacon |
---|---|
Тема | Re: Stuck with a query... |
Дата | |
Msg-id | m3vf813od0.fsf@elb_lx.onesystem.ca обсуждение исходный текст |
Ответ на | Postgres cluster (Morten <lists@kikobu.com>) |
Список | pgsql-general |
gsstark@mit.edu (Greg Stark) writes: > Geoff Caplan <geoff@variosoft.com> writes: > > > Hi folks, > > > > Sorry to ask a newbie SQL question but I'm struggling... > > There's no efficient way to write this in standard SQL. However Postgres has > an extension DISTINCT ON that would do it: > > select url,count(*) > from (select distinct on (session_id) > url > from clickstream > order by session_id,sequence_num desc > ) > group by url > > This isn't going to be a superfast query. It has to sort all the clickstream > records by session and sequence, take just the last one, then probably sort > those again. > As an experiment I tried a more 'standard SQL' approach to this problem: SELECT url, count(1) FROM clickstream WHERE (session_id, sequence_num) IN (SELECT session_id, max(sequence_num) FROM clickstream GROUP BY session_id) GROUP BY url; On a table with about 100,000 rows this runs in about 720ms on my system , compared to the ON DISTICNT version which runs in about 1000ms. Adding an index on (session_id, sequence_num) reduced the run time to about 690ms, but made no difference to the DISTINCT ON version. With only about 10,000 rows, there's no appreciable difference. This surprised me, because I expected the DISTINCT ON to be better. -- Remove -42 for email
В списке pgsql-general по дате отправления: