POC: GROUP BY optimization

Поиск

Список

Период

Сортировка

От	Teodor Sigaev
Тема	POC: GROUP BY optimization
Дата	30 мая 2018 г. 02:07:50
Msg-id	7c79e6a5-8597-74e8-0671-1c39d124c9d6@sigaev.ru обсуждение исходный текст
Ответы	Re: POC: GROUP BY optimization
Список	pgsql-hackers

Дерево обсуждения

Hi!

As far as I known, columns in GROUP BY could be reordered without loss of 
correctness. But reorder could give a benefits in performance. Suggested patch 
implements that in several ways:

1) Reorder GROUP BY columns by number of unique values in descent order. Simple 
example shows a performance difference by two times (see 
opt_group_by_demostrator.sql, on my notebook first two queries demonstrate 
that). The idea is: if first column is unique then sort comparator will never 
compute difference of following columns.

2) Reorder GROUP BY columns to match existing pathkeys which are result of index 
scan or ORDER BY clause. It prevents to add sort node - suppose, it's a clear win.

3) Patch makes suggestion to use index by GROUP BY clause, but unlike to ORDER 
BY or merge join case it doesn't pay attention to actual order of columns 
because of 2)

Patch doesn't change any cost estimation computation, although 1) could take an 
advantage of it.

Some issues/problems/notices:

1) I didn't play around GROUPING SETS at all. As I can see, current coding 
prevents any optimization around it and, suppose, it should be addressed  to 
another patch

2) I'm not completely satisfied with counting of number of groups per column, it 
looks ugly without refactoring estimate_num_groups(). At least, now it could be 
called multiple times for each column. May be, this number should be added to 
PathKey structure?

3) EquivalenceClass->ec_sortref. If planner faced with column first time in 
WHERE clause it doesn't fill target reference field because it is unknown yet. 
Patch looks for accordance of PathKey (group_pathkeys) and SortGroupClause 
(groupClause) and fails in this case. So, get_eclass_for_sort_expr() now updates 
ec_sortref field if it's not initialized yet.

4) Algorithms to reorder columns is proportional to N^2 where N is number of 
columns, but I hope it isn't a problem because number of GROUP BY columns isn't 
very big usually.




-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

POC: GROUP BY optimization

Вложения