Re: POC: GROUP BY optimization
От | Teodor Sigaev |
---|---|
Тема | Re: POC: GROUP BY optimization |
Дата | |
Msg-id | a02225c6-96d3-2e85-9907-9459105d1244@sigaev.ru обсуждение исходный текст |
Ответ на | Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Ответы |
Re: POC: GROUP BY optimization
|
Список | pgsql-hackers |
>> Yes. But again, this description is a bit short. First it works after >> first patch and might get some preordered leading pathkeys. Second, it >> tries to match ORDER BY clause order if there is no preordered leading >> pathkeys from first patch (it was introduced in v7). And third, if there >> is a tail of unmatched pathkeys on previous stages then it will reorder >> that tail. >> > > OK, I haven't looked at v7 yet, but if I understand correctly it tries > to maintain the ordering as much as possible? Does that actually help? I > mean, the incremental sort patch allows the sorting to happen by pieces, > but here we still need to sort all the data, right? > > Can you give an example demonstrating the benefit? See tst.sql. queries are marked with opt (optimization is on) and noopt. Query 1: select count(*) from btg group by v, r; Query 2: select count(*) from btg group by n, v, r order by n; For both queries it's possible to reorder v and r column, n column has the single distinct value. On my laptop: Query 1opt vs 1noopt: 3177.500 ms vs 6604.493 ms 2opt vs 2noopt: 5800.307 ms vs 7486.967 ms So, what we see: 1) for query 1 optimization gives 2 times better performance, for query 2 only 30%. if column 'n' will be unique then time for query 1 and 2 should be the same. We could add check for preordered pathkeys in get_cheapest_group_keys_order() and if estimate_num_groups(reordered pathkeys) is close to 1 then we could do not reordering of tail of pathkeys. 2) Planing cost is the same for all queries. So, cost_sort() doesn't take into account even number of columns. > FWIW I think it would be useful to have "development GUC" that would > allow us to enable/disable these options during development, because > that makes experiments much easier. But then remove them before commit. Added, v9, debug_enable_group_by_match_order_by and debug_enable_cheapest_group_by. I also checked compatibility with incremental sort patch, and all works except small merge conflict which could be resolved right before committing. Next, I had a look on cost_incremental_sort() provided by incremental sort patch and, it's a pity, it doesn't solve our problem with the impact of the cost of per-column comparison function and number of its calls. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Вложения
В списке pgsql-hackers по дате отправления: