Re: BUG #16759: Estimation of the planner is wrong for hash join
От | Tomas Vondra |
---|---|
Тема | Re: BUG #16759: Estimation of the planner is wrong for hash join |
Дата | |
Msg-id | 18e3e4e6-2816-77fa-84c2-b85448fbe6ce@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: BUG #16759: Estimation of the planner is wrong for hash join (Bertrand Guillaumin <bertrand.guillaumin@gmail.com>) |
Список | pgsql-bugs |
On 12/17/20 6:36 PM, Bertrand Guillaumin wrote: > Hello, > I think I just made a test that shows that even with MCV on both sides > the estimated selectivity can be pretty wrong. > test=# create table test_bug_hash2 as SELECT mod(id,500) as id , case > when id<=500 then 1 else 2 end parent_id, null::text as attrib from > (select generate_series(1,1000) as id) alias0; > SELECT 1000 > test=# update test_bug_hash2 set attrib='BEL' where id=2; > UPDATE 2 > test=# analyze test_bug_hash2; > ANALYZE > test=# explain select * from test_bug_hash2 a, test_bug_hash2 b where > a.parent_id=b.id <http://b.id> and b.attrib='BEL'; > QUERY PLAN > ------------------------------------------------------------------------------ > Hash Join (cost=17.52..37.56 rows=4 width=24) > Hash Cond: (a.parent_id = b.id <http://b.id>) > -> Seq Scan on test_bug_hash2 a (cost=0.00..15.00 rows=1000 width=12) > -> Hash (cost=17.50..17.50 rows=2 width=12) > -> Seq Scan on test_bug_hash2 b (cost=0.00..17.50 rows=2 > width=12) > Filter: (attrib = 'BEL'::text) > > test=# select count(*) from test_bug_hash2 a, test_bug_hash2 b where > a.parent_id=b.id <http://b.id> and b.attrib='BEL'; > count > ------- > 1000 > > I won't copy paste the pg_stats lines but most_common_vals and > most_common_freq have values for all three columns. > > I'm not a programmer but I've looked into the code of the planner a > little bit and it seems you try to estimate the selectivity of a join in > itself without any regards to the filters that can be applied on any > side of the join ( if I understood correctly). > I think that it's ultimately where the problem lies, if the filter is > not too important the selectivity stays more or less the same but with > filters like the one in this query the selectivity of the join can > change a lot so in the end you get estimations which can be totally wrong. > > Right. The problem is that the two columns are correlated, thanks to how you set the attrib value only for id=2, but the join estimation code is oblivious to that. Perhaps the multi-column/extended stats might allow us to improve this, but the code has not been written yet. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-bugs по дате отправления: