Re: POC: GROUP BY optimization

Поиск

Список

Период

Сортировка

От	Teodor Sigaev
Тема	Re: POC: GROUP BY optimization
Дата	7 июня 2018 г. 19:22:13
Msg-id	a02225c6-96d3-2e85-9907-9459105d1244@sigaev.ru обсуждение исходный текст
Ответ на	Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы	Re: POC: GROUP BY optimization
Список	pgsql-hackers

Дерево обсуждения

>> Yes. But again, this description is a bit short. First it works after
>> first patch and might get some preordered leading pathkeys. Second, it
>> tries to match ORDER BY clause order if there is no preordered leading
>> pathkeys from first patch (it was introduced in v7). And third, if there
>> is a tail of unmatched pathkeys on previous stages then it will reorder
>> that tail.
>>
> 
> OK, I haven't looked at v7 yet, but if I understand correctly it tries
> to maintain the ordering as much as possible? Does that actually help? I
> mean, the incremental sort patch allows the sorting to happen by pieces,
> but here we still need to sort all the data, right?
> 
> Can you give an example demonstrating the benefit?

See tst.sql. queries are marked with opt (optimization is on) and noopt.

Query 1: select count(*) from btg group by v, r;
Query 2: select count(*) from btg group by n, v, r order by n;

For both queries it's possible to reorder v and r column, n column has the 
single distinct value.

On my laptop:
Query 1opt vs 1noopt: 3177.500 ms vs 6604.493 ms
       2opt vs 2noopt: 5800.307 ms vs 7486.967 ms

So, what we see:
1) for query 1 optimization gives 2 times better performance, for query 2 only 
30%. if column 'n' will be unique then time for query 1 and 2 should be the 
same. We could add check for preordered pathkeys in 
get_cheapest_group_keys_order() and if estimate_num_groups(reordered pathkeys) 
is close to 1 then we could do not reordering of tail of pathkeys.

2) Planing cost is the same for all queries. So, cost_sort() doesn't take into 
account even number of columns.

> FWIW I think it would be useful to have "development GUC" that would
> allow us to enable/disable these options during development, because
> that makes experiments much easier. But then remove them before commit.
Added, v9, debug_enable_group_by_match_order_by and 
debug_enable_cheapest_group_by. I also checked compatibility with incremental 
sort patch, and all works except small merge conflict which could be resolved 
right before committing.

Next, I had a look on cost_incremental_sort() provided by incremental sort patch 
and, it's a pity, it doesn't solve our problem with the impact of the cost of 
per-column comparison function and number of its calls.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: POC: GROUP BY optimization

Вложения