Re: big distinct clause vs. group by

Поиск
Список
Период
Сортировка
От Uwe Bartels
Тема Re: big distinct clause vs. group by
Дата
Msg-id BANLkTimPgtYKLyEE8K0NKoE=DAvgm7p8+A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: big distinct clause vs. group by  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: big distinct clause vs. group by  (Robert Klemme <shortcutter@googlemail.com>)
Re: big distinct clause vs. group by  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-performance
Hi Robert,

thanks for your answer.
the aggregate function I was talking about is the function I need to use for the non-group by columns like min() in my example.
There are of course several function to choose from, and I wanted to know which causes as less as possible resources.

best regards,
Uwe


On 18 April 2011 18:19, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 16, 2011 at 4:45 AM, Uwe Bartels <uwe.bartels@gmail.com> wrote:
> I'm having trouble with some sql statements which use an expression with
> many columns and distinct in the column list of the select.
> select distinct col1,col2,.....col20,col21
> from table1 left join table2 on <join condition>,...
> where
>  <other expressions>;
>
> The negative result is a big sort with teporary files.
>               ->  Sort  (cost=5813649.93..5853067.63 rows=15767078 width=80)
> (actual time=79027.079..81556.059 rows=12076838 loops=1)
>                     Sort Method:  external sort  Disk: 1086096kB
> By the way - for this query I have a work_mem of 1 GB - so raising this
> further is not generally possible - also not for one special command, due to
> parallelism.
>
> How do I get around this?

Hmm.  It seems to me that there's no way to work out the distinct
values without either sorting or hashing the output, which will
necessarily be slow if you have a lot of data.

> I have one idea and like to know if there any other approaches or an even
> known better solution to that problem. By using group by I don't need the
> big sort for the distinct - I reduce it (theoreticly) to the key columns.
>
> select <list of key columns>,<non key column>
> from tables1left join table2 on <join condition>,...
> where
>  <other conditions>
> group by <list of key columns>

You might try SELECT DISTINCT ON (key columns) <key columns> <non-key
columns> FROM ...

> Another question would be what's the aggregate function which needs as less
> as possible resources (time).

Not sure I follow this part.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-performance по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Select in subselect vs select = any array
Следующее
От: Stefan Keller
Дата:
Сообщение: How to configure a read-only database server?