Re: Group by more efficient than distinct?
От | PFC |
---|---|
Тема | Re: Group by more efficient than distinct? |
Дата | |
Msg-id | op.t9sycqo6cigqcu@apollo13.peufeu.com обсуждение исходный текст |
Ответ на | Re: Group by more efficient than distinct? (Gregory Stark <stark@enterprisedb.com>) |
Список | pgsql-performance |
On Fri, 18 Apr 2008 11:36:02 +0200, Gregory Stark <stark@enterprisedb.com> wrote: > "Francisco Reyes" <lists@stringsutils.com> writes: > >> Is there any dissadvantage of using "group by" to obtain a unique list? >> >> On a small dataset the difference was about 20% percent. >> >> Group by >> HashAggregate (cost=369.61..381.12 rows=1151 width=8) (actual >> time=76.641..85.167 rows=2890 loops=1) Basically : - If you process up to some percentage of your RAM worth of data, hashing is going to be a lot faster - If the size of the hash grows larger than your RAM, hashing will fail miserably and sorting will be much faster since PG's disksort is really good - GROUP BY knows this and acts accordingly - DISTINCT doesn't know this, it only knows sorting, so it sorts - If you need DISTINCT x ORDER BY x, sorting may be faster too (depending on the % of distinct rows) - If you need DISTINCT ON, well, you're stuck with the Sort - So, for the time being, you can replace DISTINCT with GROUP BY...
В списке pgsql-performance по дате отправления: