Re: planner chooses incremental but not the best one

Поиск

Список

Период

Сортировка

От	Richard Guo
Тема	Re: planner chooses incremental but not the best one
Дата	15 декабря 2023 г. 08:58:22
Msg-id	CAMbWs4-ocromEKMtVDH3RBMuAJQaQDK0qi4k6zOuvpOnMWZauw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: planner chooses incremental but not the best one (Richard Guo <guofenglinux@gmail.com>)
Ответы	Re: planner chooses incremental but not the best one Re: planner chooses incremental but not the best one Re: planner chooses incremental but not the best one
Список	pgsql-hackers

Дерево обсуждения

On Thu, Dec 14, 2023 at 6:02 PM Richard Guo <guofenglinux@gmail.com> wrote:

It seems that we need to improve estimate of distinct values in
estimate_num_groups() when taking the selectivity of restrictions into
account.

In 84f9a35e3 we changed to a new formula to perform such estimation.
But that does not apply to the case here, because for an appendrel,
set_append_rel_size() always sets "raw tuples" count equal to "rows",
and that would make estimate_num_groups() skip the adjustment of the
estimate using the new formula.

I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why don't we set it to the accumulated estimate of tuples from each live
child, like attached? I believe this aligns more closely with reality.

And this would also allow us to adjust the estimate for the number of
distinct values in estimate_num_groups() for appendrels using the new
formula introduced in 84f9a35e3. As I experimented, this can improve
the estimate for appendrels. For instance,

create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);

insert into t select i%2000, (100000 * random())::int, random() from generate_series(1,1000000) i;
analyze t;

explain analyze select b from t where c < 0.1 group by b;

-- on master
HashAggregate (cost=18659.28..19598.74 rows=93946 width=4)
(actual time=220.760..234.439 rows=63224 loops=1)

-- on patched
HashAggregate (cost=18659.28..19294.25 rows=63497 width=4)
(actual time=235.161..250.023 rows=63224 loops=1)

With the patch the estimate for the number of distinct 'b' values is
more accurate.

BTW, this patch does not change any existing regression test results. I
attempted to devise a regression test that shows how this change can
improve query plans, but failed. Should I try harder to find such a
test case?

Thanks
Richard

Вложения

v1-0001-Adjust-tuples-estimate-for-appendrel.patch

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: planner chooses incremental but not the best one

Вложения