Re: [HACKERS] <> join selectivity estimate question
От | Tom Lane |
---|---|
Тема | Re: [HACKERS] <> join selectivity estimate question |
Дата | |
Msg-id | 14543.1489769980@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: [HACKERS] <> join selectivity estimate question (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: [HACKERS] <> join selectivity estimate question
|
Список | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > The relevant code is in neqsel(). It estimates the fraction of rows > that will be equal, and then does 1 - that number. Evidently, the > query planner thinks that l1.l_suppkey = l2.l_suppkey would almost > always be true, and therefore l1.l_suppkey <> l2.l_suppkey will almost > always be false. I think the presumed selectivity of l1.l_suppkey = > l2.l_suppkey is being computed by var_eq_non_const(), but I'm a little > puzzled by that function is managing to produce a selectivity estimate > of, essentially, 1. No, I believe it's going through neqjoinsel and thence to eqjoinsel_semi. This query will have been flattened into a semijoin. I can reproduce a similarly bad estimate in the regression database: regression=# explain select * from tenk1 a where exists(select * from tenk1 b where a.thousand = b.thousand and a.twothousand<> b.twothousand); QUERY PLAN -------------------------------------------------------------------------Hash Semi Join (cost=583.00..1067.25 rows=1 width=244) Hash Cond: (a.thousand = b.thousand) Join Filter: (a.twothousand <> b.twothousand) -> Seq Scan on tenk1 a (cost=0.00..458.00 rows=10000 width=244) -> Hash (cost=458.00..458.00 rows=10000 width=8) -> Seq Scan on tenk1b (cost=0.00..458.00 rows=10000 width=8) (6 rows) The problem here appears to be that we don't have any MCV list for the "twothousand" column (because it has a perfectly flat distribution), and the heuristic that eqjoinsel_semi is using for the no-MCVs case is falling down badly. regards, tom lane
В списке pgsql-hackers по дате отправления: