BUG #17545: Incorrect selectivity for IS NOT DISTINCT FROM and NULLs
От | PG Bug reporting form |
---|---|
Тема | BUG #17545: Incorrect selectivity for IS NOT DISTINCT FROM and NULLs |
Дата | |
Msg-id | 17545-a0ca4de888953169@postgresql.org обсуждение исходный текст |
Ответы |
Re: BUG #17545: Incorrect selectivity for IS NOT DISTINCT FROM and NULLs
|
Список | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 17545 Logged by: Roope Salmi Email address: rpsalmi@gmail.com PostgreSQL version: 14.4 Operating system: Ubuntu 22.04 Description: Hi, It appears that selectivity for "IS NOT DISTINCT FROM" clauses regard the proportion of NULLs in the same way as "=", leading to polar opposite row count estimates. In practice this causes nested loop joins to be used when that is not appropriate. Not sure if this is a known issue, but I was unable to find any previous mentions of it. Here I create a table with one column and 1000 rows of NULL. A cartesian product with "WHERE x.a = y.a" correctly estimates that there are around zero matching rows. "x.a IS NOT DISTINCT FROM y.a" incorrectly gives the same estimate, whereas "x.a = y.a OR (x.a IS NULL AND y.a IS NULL)", which should be equivalent, gives the correct 1000000. postgres=# CREATE TABLE test(a INTEGER); CREATE TABLE postgres=# INSERT INTO test(a) SELECT NULL FROM generate_series(1, 1000); INSERT 0 1000 postgres=# ANALYZE test; ANALYZE postgres=# EXPLAIN SELECT FROM test x, test y WHERE x.a = y.a; QUERY PLAN ---------------------------------------------------------------------- Merge Join (cost=127.66..137.67 rows=1 width=0) Merge Cond: (x.a = y.a) -> Sort (cost=63.83..66.33 rows=1000 width=4) Sort Key: x.a -> Seq Scan on test x (cost=0.00..14.00 rows=1000 width=4) -> Sort (cost=63.83..66.33 rows=1000 width=4) Sort Key: y.a -> Seq Scan on test y (cost=0.00..14.00 rows=1000 width=4) (8 rows) postgres=# EXPLAIN SELECT FROM test x, test y postgres=# WHERE x.a IS NOT DISTINCT FROM y.a; QUERY PLAN ---------------------------------------------------------------------- Nested Loop (cost=0.00..15030.50 rows=1 width=0) Join Filter: (NOT (x.a IS DISTINCT FROM y.a)) -> Seq Scan on test x (cost=0.00..14.00 rows=1000 width=4) -> Materialize (cost=0.00..19.00 rows=1000 width=4) -> Seq Scan on test y (cost=0.00..14.00 rows=1000 width=4) (5 rows) postgres=# EXPLAIN SELECT FROM test x, test y postgres-# WHERE x.a = y.a OR (x.a IS NULL AND y.a IS NULL); QUERY PLAN ---------------------------------------------------------------------- Nested Loop (cost=0.00..15030.50 rows=1000000 width=0) Join Filter: ((x.a = y.a) OR ((x.a IS NULL) AND (y.a IS NULL))) -> Seq Scan on test x (cost=0.00..14.00 rows=1000 width=4) -> Materialize (cost=0.00..19.00 rows=1000 width=4) -> Seq Scan on test y (cost=0.00..14.00 rows=1000 width=4) (5 rows)
В списке pgsql-bugs по дате отправления: