Re: a wrong index choose when statistics is out of date

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: a wrong index choose when statistics is out of date
Дата
Msg-id 87bk7sbzaj.fsf@163.com
обсуждение исходный текст
Ответ на Re: a wrong index choose when statistics is out of date  (Andy Fan <zhihuifan1213@163.com>)
Ответы Re: a wrong index choose when statistics is out of date  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
Re: a wrong index choose when statistics is out of date  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
Hi,

>
>> We should do anything like add column options in the meantime. Those
>> are hard to remove once added.
>
> I will try it very soon.

Attached is a PoC version. and here is the test case.

create table t(a int, b int, c int) with (autovacuum_enabled=off);
create index on t(a, b);
create index on t(a, c);

insert into t select floor(random() * 100 + 1)::int, i, i
from generate_series(1, 100000) i;

analyze t;

insert into t
select floor(random() * 10 + 1)::int + 100 , i, i
from generate_series(1, 1000) i;

-- one of the below queries would choose a wrong index.
-- here is the result from my test.
explain (costs off) select * from t where a = 109 and c = 8;
              QUERY PLAN               
---------------------------------------
 Index Scan using t_a_c_idx on t
   Index Cond: ((a = 109) AND (c = 8))
(2 rows)

explain (costs off) select * from t where a = 109 and b = 8;
           QUERY PLAN            
---------------------------------
 Index Scan using t_a_c_idx on t
   Index Cond: (a = 109)
   Filter: (b = 8)
(3 rows)

Wrong index is chosen for the second case!

-- After applying the new API.

alter table t alter column a set (force_generic=on);

explain (costs off) select * from t where a = 109 and c = 8;
              QUERY PLAN               
---------------------------------------
 Index Scan using t_a_c_idx on t
   Index Cond: ((a = 109) AND (c = 8))
(2 rows)

explain (costs off) select * from t where a = 109 and b = 8;
              QUERY PLAN               
---------------------------------------
 Index Scan using t_a_b_idx on t
   Index Cond: ((a = 109) AND (b = 8))
(2 rows)

Then both cases can choose a correct index.

commit f8cca472479c50ba73479ec387882db43d203522 (HEAD -> shared_detoast_value)
Author: yizhi.fzh <yizhi.fzh@alibaba-inc.com>
Date:   Tue Mar 5 18:27:48 2024 +0800

    Add a "force_generic" attoptions for selfunc.c
    
    Sometime user just care about the recent data and the optimizer
    statistics for such data is not gathered, then some bad decision may
    happen. Before this patch, we have to make the autoanalyze often and
    often, but it is not only expensive but also may be too late.
    
    This patch introduces a new attoptions like this:
    
    ALTER TABLE t ALTER COLUMN col set (force_generic=true);
    
    Then selfunc.c realizes this and ignore the special Const value, then
    average selectivity is chosen. This fall into the weakness of generic
    plan, but this patch doesn't introduce any new weakness and we leave the
    decision to user which could resolve some problem. Also this logic only
    apply to eqsel since the ineqsel have the get_actual_variable_range
    mechanism which is helpful for index choose case at least.

I think it is OK for a design review, for the implementaion side, the
known issue includes:

1. Support grap such infromation from its parent for partitioned table
if the child doesn't have such information.
2. builtin document and testing. 

Any feedback is welcome.

-- 
Best Regards
Andy Fan


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Laurenz Albe
Дата:
Сообщение: Reducing the log spam
Следующее
От: Jakub Wartak
Дата:
Сообщение: Re: index prefetching