Re: POC: GROUP BY optimization
From | Teodor Sigaev
Subject | Re: POC: GROUP BY optimization
Date |
Msg-id | 736c5ece-7ac4-241b-4a11-52797691238f@sigaev.ru
In reply to | Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses | Re: POC: GROUP BY optimization
List | pgsql-hackers
> So the costing was fairly trivial, we simply do something like
>
>     comparison_cost = 2.0 * cpu_operator_cost;
>
>     sort_cost = comparison_cost * tuples * LOG2(tuples);
>
> which essentially ignores that there might be multiple columns, or that
> the columns may have sort operators with different costs.

Agree. And the distribution of keys.

> The question is how reliable the heuristics can be. The current patch
> uses just plain ndistinct, but that seems rather unreliable, and I don't
> have a clear idea how to improve that - we may have MCVs for the columns
> and perhaps some extended statistics, but I'm not sure how far we can
> run with that.

v8 already uses another algorithm.

> Essentially what we need is to estimate the number of comparisons for
> each column, to compute a better comparison_cost.

Exactly.

>> Prioritization of the user-provided order can be as simple as giving
>> that comparison_cost a small handicap.
>
> I see no point in doing that, and I don't recall a single place in the
> planner where we do that. If the user specified ORDER BY, we'll slap an
> explicit Sort on top when needed (which acts as the handicap, but in a
> clear way). Otherwise we don't do such things - it'd be just plain
> confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with the same data
> types, ndistinct etc. but unexpectedly different costs). Also, what
> would be a good value for the handicap?

Again agree. If we have a fixed order of columns (ORDER BY), then we
should not try to reorder it. The current patch follows that, if I didn't
make a mistake.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/