Re: a wrong index choose when statistics is out of date

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: a wrong index choose when statistics is out of date
Дата
Msg-id 87il1y9ut5.fsf@163.com
обсуждение исходный текст
Ответ на Re: a wrong index choose when statistics is out of date  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
Список pgsql-hackers
Andrei Lepikhov <a.lepikhov@postgrespro.ru> writes:

> On 5/3/2024 19:56, Andy Fan wrote:
>> I think it is OK for a design review, for the implementaion side, the
>> known issue includes:
>> 1. Support grap such infromation from its parent for partitioned table
>> if the child doesn't have such information.
>> 2. builtin document and testing.
>> Any feedback is welcome.
> Thanks for your efforts.
> I was confused when you showed the problem connected to clauses like
> "Var op Const" and "Var op Param".

hmm, then what is the soluation in your mind when you say the "ticky" in
[1]?  I am thinking we have some communication gap here.

> As far as I know, the estimation logic of such clauses uses MCV and
> number-distinct statistics. So, being out of MCV values, it becomes
> totally insensitive to any internal skew in data and any data outside
> the statistics boundaries.
> Having studied the example you provided with the patch, I think it is
> not a correct example:
> Difference between var_eq_const and var_eq_non_const quite obvious:

The response should be same as what I did in [2], let's see if we can
make the gap between us smaller.

> In the second routine, you don't have information about the const value
> and can't use MCV for estimation. Also, you can't exclude MCV values
> from the estimation. And it is just luck that you've got the right
> answer. I think if you increased the weight of the unknown part, you
> would get a bad result, too.

> I would like to ask David why the var_eq_const estimator doesn't have an
> option for estimation with a histogram. Having that would relieve a
> problem with skewed data. Detecting the situation with incoming const
> that is out of the covered area would allow us to fall back to ndistinct
> estimation or something else. At least, histogram usage can be
> restricted by the reltuples value and ratio between the total number of
> MCV values and the total number of distinct values in the table.

I think an example which show your algorithm is better would be pretty
helpful for communication. 

[1] https://www.postgresql.org/message-id/15381eea-cbc3-4087-9d90-ab752292bd54%40postgrespro.ru
[2] https://www.postgresql.org/message-id/87msra9vgo.fsf%40163.com
-- 
Best Regards
Andy Fan




В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: a wrong index choose when statistics is out of date
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: remaining sql/json patches