Re: a wrong index choose when statistics is out of date

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: a wrong index choose when statistics is out of date
Дата
Msg-id 87r0gqcjno.fsf@163.com
обсуждение исходный текст
Ответ на Re: a wrong index choose when statistics is out of date  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: a wrong index choose when statistics is out of date  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
David Rowley <dgrowleyml@gmail.com> writes:

> On Sun, 3 Mar 2024 at 20:08, Andy Fan <zhihuifan1213@163.com> wrote:
>> The issue can be reproduced with the following steps:
>>
>> create table x_events (.., created_at timestamp, a int, b int);
>>
>> create index idx_1 on t(created_at, a);
>> create index idx_2 on t(created_at, b);
>>
>> query:
>> select * from t where create_at = current_timestamp and b = 1;
>>
>> index (created_at, a) rather than (created_at, b) may be chosen for the
>> above query if the statistics think "create_at = current_timestamp" has
>> no rows, then both index are OK, actually it is true just because
>> statistics is out of date.
>
> I don't think there's really anything too special about the fact that
> the created_at column is always increasing. We commonly get 1-row
> estimates after multiplying the selectivities from individual stats.
> Your example just seems like yet another reason that this could
> happen.

You are right about there are more cases which lead this happen. However
this is the only case where the created_at = $1 trick can works, which
was the problem I wanted to resove when I was writing. 

> I've been periodically talking about introducing "risk" as a factor
> that the planner should consider.  I did provide some detail in [1]
> about the design that was in my head at that time.  I'd not previously
> thought that it could also solve this problem, but after reading your
> email, I think it can.

Haha, I remeber you were against "risk factor" before at [1], and at
that time we are talking about the exact same topic as here, and I
proposaled another risk factor. Without an agreement, I did it in my
own internal version and get hurted then, something like I didn't pay
enough attention to Bitmap Index Scan and Index scan. Then I forget the
"risk factor".

>
> I don't think it would be right to fudge the costs in any way, but I
> think the risk factor for IndexPaths could take into account the
> number of unmatched index clauses and increment the risk factor, or
> "certainty_factor" as it is currently in my brain-based design. That
> way add_path() would be more likely to prefer the index that matches
> the most conditions.

This is somehow similar with my proposal at [1]?  What do you think
about the treat 'col op const' as 'col op $1' for the marked column?
This could just resolve a subset of questions in your mind, but the
method looks have a solid reason.

Currently I treat the risk factor as what you did before, but this maybe
another time for me to switch my mind again.

[1] https://www.postgresql.org/message-id/CAApHDvovVWCbeR4v%2BA4Dkwb%3DYS_GuJG9OyCm8jZu%2B%2BcP2xsY_A%40mail.gmail.com
-- 
Best Regards
Andy Fan




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dilip Kumar
Дата:
Сообщение: Re: PostgreSQL Contributors Updates
Следующее
От: Bertrand Drouvot
Дата:
Сообщение: Re: Synchronizing slots from primary to standby