POC: GROUP BY optimization
От | Teodor Sigaev |
---|---|
Тема | POC: GROUP BY optimization |
Дата | |
Msg-id | 7c79e6a5-8597-74e8-0671-1c39d124c9d6@sigaev.ru обсуждение исходный текст |
Ответы |
Re: POC: GROUP BY optimization
|
Список | pgsql-hackers |
Hi! As far as I known, columns in GROUP BY could be reordered without loss of correctness. But reorder could give a benefits in performance. Suggested patch implements that in several ways: 1) Reorder GROUP BY columns by number of unique values in descent order. Simple example shows a performance difference by two times (see opt_group_by_demostrator.sql, on my notebook first two queries demonstrate that). The idea is: if first column is unique then sort comparator will never compute difference of following columns. 2) Reorder GROUP BY columns to match existing pathkeys which are result of index scan or ORDER BY clause. It prevents to add sort node - suppose, it's a clear win. 3) Patch makes suggestion to use index by GROUP BY clause, but unlike to ORDER BY or merge join case it doesn't pay attention to actual order of columns because of 2) Patch doesn't change any cost estimation computation, although 1) could take an advantage of it. Some issues/problems/notices: 1) I didn't play around GROUPING SETS at all. As I can see, current coding prevents any optimization around it and, suppose, it should be addressed to another patch 2) I'm not completely satisfied with counting of number of groups per column, it looks ugly without refactoring estimate_num_groups(). At least, now it could be called multiple times for each column. May be, this number should be added to PathKey structure? 3) EquivalenceClass->ec_sortref. If planner faced with column first time in WHERE clause it doesn't fill target reference field because it is unknown yet. Patch looks for accordance of PathKey (group_pathkeys) and SortGroupClause (groupClause) and fails in this case. So, get_eclass_for_sort_expr() now updates ec_sortref field if it's not initialized yet. 4) Algorithms to reorder columns is proportional to N^2 where N is number of columns, but I hope it isn't a problem because number of GROUP BY columns isn't very big usually. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Вложения
В списке pgsql-hackers по дате отправления: