Re: a wrong index choose when statistics is out of date

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: a wrong index choose when statistics is out of date
Дата
Msg-id 87a5mf837g.fsf@163.com
обсуждение исходный текст
Ответ на Re: a wrong index choose when statistics is out of date  (Andy Fan <zhihuifan1213@163.com>)
Ответы Re: a wrong index choose when statistics is out of date  (Andy Fan <zhihuifan1213@163.com>)
Список pgsql-hackers
Hello everyone,

> After some more thoughts about the diference of the two ideas, then I
> find we are resolving two different issues, just that in the wrong index
> choose cases, both of them should work generally. 

Here is the formal version for the attribute reloptions direction.

commit 0d842e39275710a544b11033f5eec476147daf06 (HEAD -> force_generic)
Author: yizhi.fzh <yizhi.fzh@alibaba-inc.com>
Date:   Sun Mar 31 11:51:28 2024 +0800

    Add a attopt to disable MCV when estimating for Var = Const
    
    As of current code, when calculating the selectivity for Var = Const,
    planner first checks if the Const is an most common value and if not, it
    takes out all the portions of MCV's selectivity and num of distinct
    value, and treat the selectivity for Const equal for the rest
    n_distinct.
    
    This logic works great when the optimizer statistic is up to date,
    however if the known most common value has taken up most of the
    selectivity at the last run of analyze, and the new most common value in
    reality has not been gathered, the estimation for the new MCV will be
    pretty bad. A common case for this would be created_at = {current_date};
    
    To overcome this issue, we provides a new syntax:
    
    ALTER TABLE tablename ALTER COLUMN created_at SET (force_generic=on);
    
    After this, planner ignores the value of MCV for this column when
    estimating for Var = Const and treating all the values equally.
    
    This would cause some badness if the values for a column are pretty not
    equal which is what MCV is designed for, however this patch just provide
    one more option to user and let user make the decision.

Here is an example about its user case.

create table t(a int, b int, c int) with (autovacuum_enabled=off);
create index on t(a, b);
create index on t(a, c);
create table t2 (id int primary key, a int);
insert into t2 select i , i from generate_series(1, 800)i;

insert into t select floor(random() * 100 + 1)::int, i, i
from generate_series(1, 100000) i;
analyze t,t2;

insert into t
select floor(random() * 10 + 1)::int + 100 , i, i
from generate_series(1, 10000) i;

explain (costs off) select * from t where a = 109 and b = 8;
explain (costs off, analyze)
select * from t join t2 on t.c = t2.id where t.a = 109;

ALTER TABLE t ALTER COLUMN a SET (force_generic=on);

-- We will see some good result now.
explain (costs off) select * from t where a = 109 and b = 8;
explain (costs off, analyze)
select * from t join t2 on t.c = t2.id where t.a = 109;

I will add this to our commitfest application, any feedback is welcome! 

-- 
Best Regards
Andy Fan

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dmitry Koval
Дата:
Сообщение: Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Следующее
От: John Naylor
Дата:
Сообщение: Re: Change GUC hashtable to use simplehash?