Re: An improvement on parallel DISTINCT
| From | Richard Guo |
|---|---|
| Subject | Re: An improvement on parallel DISTINCT |
| Date | |
| Msg-id | CAMbWs4_SuhS7VGz47xcnLf9LbsC7-+JfS0w70OD5f2CVkn1g5A@mail.gmail.com |
| In reply to | Re: An improvement on parallel DISTINCT (David Rowley <dgrowleyml@gmail.com>) |
| Responses | Re: An improvement on parallel DISTINCT |
| List | pgsql-hackers |
On Fri, Feb 2, 2024 at 11:26 AM David Rowley <dgrowleyml@gmail.com> wrote:
> In light of this, do you still think it's worthwhile making this change?
> For me, I think all it's going to result in is extra planner work
> without any performance gains.
Hmm, with the query below, I can see that the new plan is cheaper than
the old plan, and the cost difference exceeds STD_FUZZ_FACTOR.
create table t (a int, b int);
insert into t select i%100000, i from generate_series(1,10000000)i;
analyze t;
-- on master
explain (costs on) select distinct a from t order by a limit 1;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Limit  (cost=120188.50..120188.51 rows=1 width=4)
   ->  Sort  (cost=120188.50..120436.95 rows=99379 width=4)
         Sort Key: a
         ->  HashAggregate  (cost=118697.82..119691.61 rows=99379 width=4)
               Group Key: a
               ->  Gather  (cost=97331.33..118200.92 rows=198758 width=4)
                     Workers Planned: 2
                     ->  HashAggregate  (cost=96331.33..97325.12 rows=99379 width=4)
                           Group Key: a
                           ->  Parallel Seq Scan on t  (cost=0.00..85914.67 rows=4166667 width=4)
(10 rows)
-- on patched
explain (costs on) select distinct a from t order by a limit 1;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Limit  (cost=106573.93..106574.17 rows=1 width=4)
   ->  Unique  (cost=106573.93..130260.88 rows=99379 width=4)
         ->  Gather Merge  (cost=106573.93..129763.98 rows=198758 width=4)
               Workers Planned: 2
               ->  Sort  (cost=105573.91..105822.35 rows=99379 width=4)
                     Sort Key: a
                     ->  HashAggregate  (cost=96331.33..97325.12 rows=99379 width=4)
                           Group Key: a
                           ->  Parallel Seq Scan on t  (cost=0.00..85914.67 rows=4166667 width=4)
(9 rows)
It seems that including a LIMIT clause can potentially favor the new
plan.
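For reference, the fuzz-factor claim can be checked by hand from the two Limit nodes' total costs above. STD_FUZZ_FACTOR is 1.01 in the planner, and paths whose costs differ by less than that factor are treated as fuzzily equal; here the ratio is well above it (a quick sanity check, not part of the original patch):

```sql
-- Old plan's total cost vs. new plan's, compared against
-- STD_FUZZ_FACTOR (1.01): 120188.51 > 106574.17 * 1.01,
-- so the plans are not fuzzily equal and the cheaper
-- (patched) plan wins on cost.
select 120188.51 / 106574.17 as cost_ratio;  -- roughly 1.13, well above 1.01
```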
Thanks
Richard