Re: unnecessary sort in the execution plan when doing group by

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: unnecessary sort in the execution plan when doing group by
Дата
Msg-id CAApHDvofgjgP_O97-LL-KoqQeAxxvuk7dzcG_Xc1SUUoBFeVJQ@mail.gmail.com
обсуждение исходный текст
Ответ на unnecessary sort in the execution plan when doing group by  ("Huang, Suya" <Suya.Huang@au.experian.com>)
Ответы Re: unnecessary sort in the execution plan when doing group by  ("Huang, Suya" <Suya.Huang@au.experian.com>)
Список pgsql-performance
On Tue, Oct 28, 2014 at 7:26 PM, Huang, Suya <Suya.Huang@au.experian.com> wrote:

Hi,

 

This is the Greenplum database 4.3.1.0.


Likely this is the wrong place to ask for help. The plan output that you've pasted below looks very different to PostgreSQL's EXPLAIN output.
 

                                                                                  QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Gather Motion 24:1  (slice2; segments: 24)  (cost=31286842.08..31287447.81 rows=1683 width=536)

   Rows out:  15380160 rows at destination with 14860 ms to first row, 23856 ms to end, start offset by 104 ms.

   ->  HashAggregate  (cost=31286842.08..31287447.81 rows=1683 width=536)



-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Gather Motion 24:1  (slice2; segments: 24)  (cost=152269717.33..157009763.41 rows=1196982 width=568)

   Rows out:  15380160 rows at destination with 35320 ms to first row, 70091 ms to end, start offset by 102 ms.

   ->  GroupAggregate  (cost=152269717.33..157009763.41 rows=1196982 width=568)



Most likely the reason you're getting the difference in plan is because the planner is probably decided that there will be too many hash entries for a hash table based on the 3 grouping columns... Look at the estimates, 1683 with 2 columns and 1196982 with the 3 columns. If those estimates turned out to be true, then the hash table for 3 columns will be massively bigger than it would be with 2 columns. With PostgreSQL you might see the plan changing if you increased the work_mem setting. For greenplum, I've no idea if that's the same.

Databases are often not very good at knowing with the number of distinct values would be over more than 1 column. Certain databases have solved this with multi column statistics, but PostgreSQL does not have these. Although I just noticed last night that someone is working on them.

Regards

David Rowley

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Huang, Suya"
Дата:
Сообщение: unnecessary sort in the execution plan when doing group by
Следующее
От: jmcdonagh
Дата:
Сообщение: Incredibly slow restore times after 9.0>9.2 upgrade