Re: POC: GROUP BY optimization
От | Teodor Sigaev |
---|---|
Тема | Re: POC: GROUP BY optimization |
Дата | |
Msg-id | 78d94983-0fcd-78ea-f4c0-efd5022e1386@sigaev.ru обсуждение исходный текст |
Ответ на | Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Список | pgsql-hackers |
> OK, I haven't looked at v7 yet, but if I understand correctly it tries > to maintain the ordering as much as possible? Does that actually help? I > mean, the incremental sort patch allows the sorting to happen by pieces, > but here we still need to sort all the data, right? > > Can you give an example demonstrating the benefit? > > SELECT a, SUM(x) FROM > ( > SELECT a, b, COUNT(c) AS x FROM t1 GROUP BY a, b > UNION ALL > SELECT a, b, COUNT(c) AS x FROM t2 GROUP BY a, b > ) foo GROUP BY a; > > and indexes on (a,b) and (b,a) for both relations. The "deduplication" > by pathkeys I suggested would mean we might keep only index (a,b) on t1 > and (b,a) on t2, which means the grouping by "a" can't leverage index > scans on both relations. But if we keep paths for both indexes on each > relation, we can. yes, one of option > Isn't "estimation of cost of comparing function/number of unique values > in column could be not very accurate and so planner could make a wrong > choice" is more an argument against relying on it when doing these > optimizations? > > FWIW it's one of the arguments Tom made in the incremental sort patch, > which relies on it too when computing cost of the incremental sort. I'm > sure it's going to be an obstacle there too. > >> I saw 2 times difference in real-world application. Again, improving >> sort cost estimation is a separate task. > Sure. But we also need to ask the other question, i.e. how many people > would be negatively affected by the optimization. And I admit I don't > know the answer to that, the next example is entirely made up. Hm, seems, the best way here is a improving cost_sort estimation. Will try, but I think that is separated patch > FWIW I think it would be useful to have "development GUC" that would > allow us to enable/disable these options during development, because > that makes experiments much easier. But then remove them before commit. Will do -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
В списке pgsql-hackers по дате отправления: