Optimization idea
От | Vlad Arkhipov |
---|---|
Тема | Optimization idea |
Дата | |
Msg-id | 4BD0161A.4010908@dc.baikal.ru обсуждение исходный текст |
Ответы |
Re: Optimization idea
|
Список | pgsql-performance |
Please do this small optimization if it is possible. It seem that the optimizer have the all information to create a fast plan but it does not do that. create temp table t1 (id bigint, t bigint); insert into t1 values (1, 1); insert into t1 values (2, 2); insert into t1 values (2, 3); insert into t1 values (2, 4); insert into t1 values (3, 5); create temp table t2 (id bigint, t bigint); insert into t2 (id, t) select g, 2 from generate_series(1, 200) g; insert into t2 (id, t) select g, 3 from generate_series(201, 300) g; insert into t2 (id, t) select g, 4 from generate_series(301, 400) g; insert into t2 (id, t) select g, 1 from generate_series(401, 100000) g; insert into t2 (id, t) select g, 5 from generate_series(100001, 100100) g; create index t_idx on t2(t); analyze t1; analyze t2; explain analyze select * from t2 join t1 on t1.t = t2.t where t1.t = 2 explain analyze select * from t2 join t1 on t1.t = t2.t where t1.id = 3 explain analyze select * from t2 where t2.t in (2, 3, 4) These two queries are completely equal and optimizator should know it as I see from the plans: "Hash Join (cost=1.09..2667.09 rows=75000 width=32) (actual time=0.026..100.207 rows=400 loops=1)" " Hash Cond: (t2.t = t1.t)" " -> Seq Scan on t2 (cost=0.00..1541.00 rows=100000 width=16) (actual time=0.007..47.083 rows=100000 loops=1)" " -> Hash (cost=1.05..1.05 rows=3 width=16) (actual time=0.011..0.011 rows=3 loops=1)" " -> Seq Scan on t1 (cost=0.00..1.05 rows=3 width=16) (actual time=0.005..0.008 rows=3 loops=1)" <-- HERE IS THE PROBLEM. IF THE ESTIMATED COUNT = 1 OPTIMIZER BUILDS THE CORRECT FAST PLAN, BUT IF THE ESTIMATION IS GREATER THAN 1 WE HAVE A PROBLEM " Filter: (id = 2)" "Total runtime: 100.417 ms" "Nested Loop (cost=0.00..1024.46 rows=20020 width=32) (actual time=0.030..0.222 rows=100 loops=1)" " -> Seq Scan on t1 (cost=0.00..1.05 rows=1 width=16) (actual time=0.008..0.009 rows=1 loops=1)" " Filter: (id = 3)" " -> Index Scan using t_idx on t2 (cost=0.00..773.16 rows=20020 width=16) (actual time=0.016..0.078 rows=100 loops=1)" " Index Cond: (t2.t = t1.t)" "Total runtime: 0.296 ms" "Bitmap Heap Scan on t2 (cost=16.09..556.80 rows=429 width=16) (actual time=0.067..0.256 rows=400 loops=1)" " Recheck Cond: (t = ANY ('{2,3,4}'::bigint[]))" " -> Bitmap Index Scan on t_idx (cost=0.00..15.98 rows=429 width=0) (actual time=0.056..0.056 rows=400 loops=1)" " Index Cond: (t = ANY ('{2,3,4}'::bigint[]))" "Total runtime: 0.458 ms" An ugly workaround is to add the column t1(t) in the table t2.
В списке pgsql-performance по дате отправления: