Re: planner chooses incremental but not the best one

From: Richard Guo
Subject: Re: planner chooses incremental but not the best one
Date:
Msg-id: CAMbWs4-ocromEKMtVDH3RBMuAJQaQDK0qi4k6zOuvpOnMWZauw@mail.gmail.com
In reply to: Re: planner chooses incremental but not the best one  (Richard Guo <guofenglinux@gmail.com>)
Responses: Re: planner chooses incremental but not the best one  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: planner chooses incremental but not the best one  (Sébastien Lardière <sebastien@lardiere.net>)
Re: planner chooses incremental but not the best one  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
List: pgsql-hackers

On Thu, Dec 14, 2023 at 6:02 PM Richard Guo <guofenglinux@gmail.com> wrote:
> It seems that we need to improve the estimate of distinct values in
> estimate_num_groups() when taking the selectivity of restrictions into
> account.

In commit 84f9a35e3 we switched to a new formula for this estimation.
But that does not apply here: for an appendrel, set_append_rel_size()
always sets the "raw tuples" count equal to "rows", which makes
estimate_num_groups() skip the adjustment that the new formula
performs.
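
Roughly, the relevant part of estimate_num_groups() looks like this (a
simplified paraphrase of src/backend/utils/adt/selfuncs.c, not the
exact source):

/*
 * reldistinct starts as the clamped ndistinct estimate for the rel
 * that the grouping expressions belong to.
 */
if (reldistinct > 0 && rel->rows < rel->tuples)
{
    /*
     * Formula from 84f9a35e3: expected number of distinct values
     * among rel->rows rows drawn from rel->tuples raw tuples.
     */
    reldistinct *= 1.0 - pow((rel->tuples - rel->rows) / rel->tuples,
                             rel->tuples / reldistinct);
}

/*
 * When set_append_rel_size() has set rel->tuples == rel->rows for an
 * appendrel, the guard above is false, so reldistinct is never scaled
 * down by the selectivity of the restrictions.
 */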

I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why don't we set it to the accumulated estimate of tuples from each live
child, like attached?  I believe this aligns more closely with reality.
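
A minimal sketch of the idea (my paraphrase, not the attached patch
text; the actual loop in set_append_rel_size() carries more
per-child bookkeeping):

/*
 * In set_append_rel_size(): accumulate the children's raw tuple
 * counts alongside their row counts, instead of reusing the row
 * total for rel->tuples.
 */
ListCell   *l;
double      parent_rows = 0;
double      parent_tuples = 0;      /* new accumulator */

foreach(l, root->append_rel_list)
{
    AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
    RelOptInfo *childrel = find_base_rel(root, appinfo->child_relid);

    if (IS_DUMMY_REL(childrel))
        continue;                   /* skip pruned/excluded children */

    parent_rows += childrel->rows;
    parent_tuples += childrel->tuples;      /* new */
}

rel->rows = parent_rows;
rel->tuples = parent_tuples;        /* master: rel->tuples = parent_rows */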

And this would also allow us to adjust the estimate of the number of
distinct values in estimate_num_groups() for appendrels, using the new
formula introduced in 84f9a35e3.  In my experiments, this improves the
estimate for appendrels.  For instance,

create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);

insert into t select i%2000, (100000 * random())::int, random() from generate_series(1,1000000) i;
analyze t;

explain analyze select b from t where c < 0.1 group by b;

-- on master
 HashAggregate  (cost=18659.28..19598.74 rows=93946 width=4)
                (actual time=220.760..234.439 rows=63224 loops=1)

-- on patched
 HashAggregate  (cost=18659.28..19294.25 rows=63497 width=4)
                (actual time=235.161..250.023 rows=63224 loops=1)

With the patch, the estimate of the number of distinct 'b' values
(63497, vs. the actual 63224) is much more accurate.
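
As a sanity check, plugging rough approximations of this example's
numbers into the new formula reproduces the patched estimate fairly
closely: take the raw ndistinct estimate for b to be ~93946 (master's
group estimate), with ~10% of the 1,000,000 tuples passing "c < 0.1".
A standalone arithmetic check (all inputs are illustrative
approximations, not values read out of the planner):

#include <math.h>
#include <stdio.h>

/* The 84f9a35e3 adjustment; it fires only when rows < tuples. */
static double
adjust(double reldistinct, double rows, double tuples)
{
    if (reldistinct > 0 && rows < tuples)
        reldistinct *= 1.0 - pow((tuples - rows) / tuples,
                                 tuples / reldistinct);
    return reldistinct;
}

int
main(void)
{
    /* master: appendrel has tuples == rows, so no adjustment */
    printf("master:  %.0f\n", adjust(93946, 100000, 100000));
    /* patched: tuples is the sum of the children's raw tuples */
    printf("patched: %.0f\n", adjust(93946, 100000, 1000000));
    return 0;
}

This prints 93946 for the master case and about 63300 for the patched
case, in line with the 93946 vs. 63497 estimates in the two plans
above; the small gap comes from the planner's actual selectivity
estimate not being exactly 10%.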

BTW, this patch does not change any existing regression test results.  I
attempted to devise a regression test that shows how this change can
improve query plans, but failed.  Should I try harder to find such a
test case?

Thanks
Richard
Attachments
