Re: improving GROUP BY estimation
От | Mark Dilger |
---|---|
Тема | Re: improving GROUP BY estimation |
Дата | |
Msg-id | E00A6146-D40C-4E4E-92B2-D6F5017F9B19@gmail.com обсуждение исходный текст |
Ответ на | Re: improving GROUP BY estimation (Alexander Korotkov <a.korotkov@postgrespro.ru>) |
Ответы |
Re: improving GROUP BY estimation
|
Список | pgsql-hackers |
> On Mar 3, 2016, at 11:27 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > So yes, each estimator works great for exactly the opposite cases. But notice that typically, the results of the new formulais much higher than the old one, sometimes by two orders of magnitude (and it shouldn't be difficult to constructexamples of much higher differences). > > The table also includes the 'average' estimator you propose, but it's rather obvious that the result is always much closerto the new value, simply because > > (small number) + (huge number) > ------------------------------ > 2 > > is always much closer to the huge number. We're usually quite happy when the estimates are within the same order of magnitude,so whether it's K or K/2 makes pretty much no difference. > > I believe that Mark means geometrical average, i.e. sqrt((small number) * (huge number)). Yes, that is what I proposed upthread. I'm not wedded to that, though. In general, I am with Tomas on this one, believing that his estimate will be much better than the current estimate. But I believe the *best* estimate will be somewhere between his and the current one, and I'm fishing for any decent way of coming up with a weighted average that is closer to his than to the current, but not simply equal to his proposal. The reason I want the formula to be closer to Tomas's than to the current is that I think that on average, across all tables, across all databases, in practice it will be closer to the right estimate than the current formula. That's just my intuition, and so I can't defend it. But if my intuition is right, the best formula we can adopt would be one that is moderated from his by a little bit, and in the direction of the estimate that the current code generates. I could easily lose this debate merely for lack of a principled basis for saying how far toward the current estimate the new estimate should be adjusted. The geometric average is one suggestion, but I don't have a principled argument for it. Like I said above, I'm fishing for a decent formula here. Mark Dilger
В списке pgsql-hackers по дате отправления: