Re: improving GROUP BY estimation

From: Mark Dilger
Subject: Re: improving GROUP BY estimation
Date:
Msg-id: E00A6146-D40C-4E4E-92B2-D6F5017F9B19@gmail.com
In response to: Re: improving GROUP BY estimation  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses: Re: improving GROUP BY estimation  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
> On Mar 3, 2016, at 11:27 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
>
> On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> So yes, each estimator works great for exactly the opposite cases. But notice that typically, the results of the new formula is much higher than the old one, sometimes by two orders of magnitude (and it shouldn't be difficult to construct examples of much higher differences).
>
> The table also includes the 'average' estimator you propose, but it's rather obvious that the result is always much closer to the new value, simply because
>
>    (small number) + (huge number)
>    ------------------------------
>                   2
>
> is always much closer to the huge number. We're usually quite happy when the estimates are within the same order of magnitude, so whether it's K or K/2 makes pretty much no difference.
>
> I believe that Mark means geometrical average, i.e. sqrt((small number) * (huge number)).

Yes, that is what I proposed upthread.  I'm not wedded to that, though.
In general, I am with Tomas on this one, believing that his estimate
will be much better than the current estimate.  But I believe the *best*
estimate will be somewhere between his and the current one, and I'm
fishing for any decent way of coming up with a weighted average that
is closer to his than to the current, but not simply equal to his proposal.

The reason I want the formula to be closer to Tomas's than to the
current is that I think that on average, across all tables, across all
databases, in practice it will be closer to the right estimate than the
current formula.  That's just my intuition, and so I can't defend it.
But if my intuition is right, the best formula we can adopt would be one
that is moderated from his by a little bit, and in the direction of the
estimate that the current code generates.

I could easily lose this debate merely for lack of a principled basis
for saying how far toward the current estimate the new estimate should
be adjusted.  The geometric mean is one suggestion, but I don't have
a principled argument for it.
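One way to parameterize "how far toward the current estimate" is a weighted geometric mean, where a hypothetical weight w (not something proposed in the thread, just a way to make the knob explicit) interpolates between the two estimates in log space:

```python
# Hypothetical weighted geometric mean: w = 1.0 reproduces the new
# estimate, w = 0.0 the current one, and w = 0.5 the plain geometric
# mean.  The weight w is an illustrative knob, not a proposal.
def blended_estimate(new_est, old_est, w=0.75):
    return new_est ** w * old_est ** (1.0 - w)

print(blended_estimate(10_000.0, 100.0, 0.5))   # 1000.0
print(blended_estimate(10_000.0, 100.0, 0.75))  # ~3162.3
```

A w above 0.5 yields exactly the kind of blend described above: closer to the new estimate than to the current one, but moderated a little in the current estimate's direction.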

Like I said above, I'm fishing for a decent formula here.

Mark Dilger

