BUG #18466: Wrong row estimate for nested loop
От | PG Bug reporting form |
---|---|
Тема | BUG #18466: Wrong row estimate for nested loop |
Дата | |
Msg-id | 18466-1d296028273322e2@postgresql.org обсуждение исходный текст |
Ответы |
Re: BUG #18466: Wrong row estimate for nested loop
|
Список | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 18466 Logged by: Yan Wu Email address: 4wuyan@gmail.com PostgreSQL version: 16.3 Operating system: Debian Linux Description: Hello everyone, I notice the row estimate for a nested loop is wrong if a cte is involved. ## How to reproduce You can reproduce it in a docker container. I originally saw it on AWS Aurora. ```bash docker run --name test-postgres -e POSTGRES_PASSWORD=mysecretpassword -d -p 5432:5432 postgres:16.3 ``` Simple table setup ```sql create table t1(a int); create table t2(b int); create index my_index on t1 using btree (a); insert into t1 select generate_series(1, 100000) from generate_series(1, 3); insert into t2 select generate_series(1, 100) from generate_series(1, 10); analyze t1; analyze t2; /* Optionally make sure nested loop is used /* -- set enable_mergejoin = off; -- set enable_hashjoin = off; ``` The row estimate for the following plan is wrong: 2980 / 200 is not close to 3. ``` postgres=# explain with my_cte as materialized (select b from t2) select * from t1 where t1.a in (select b from my_cte); QUERY PLAN ------------------------------------------------------------------------------- Nested Loop (cost=37.92..2674.77 rows=2980 width=4) CTE my_cte -> Seq Scan on t2 (cost=0.00..15.00 rows=1000 width=4) -> HashAggregate (cost=22.50..24.50 rows=200 width=4) Group Key: my_cte.b -> CTE Scan on my_cte (cost=0.00..20.00 rows=1000 width=4) -> Index Only Scan using my_index on t1 (cost=0.42..13.15 rows=3 width=4) Index Cond: (a = my_cte.b) (8 rows) ``` ## Expected output A simple `distinct` will give the same plan, but with the correct row estimate: ``` postgres=# explain with my_cte as materialized (select b from t2) select * from t1 where t1.a in (select distinct b from my_cte); QUERY PLAN ------------------------------------------------------------------------------- Nested Loop (cost=37.92..2674.77 rows=596 width=4) CTE my_cte -> Seq Scan on t2 (cost=0.00..15.00 rows=1000 width=4) -> HashAggregate (cost=22.50..24.50 rows=200 width=4) Group Key: my_cte.b -> CTE Scan on my_cte (cost=0.00..20.00 rows=1000 width=4) -> Index Only Scan using my_index on t1 (cost=0.42..13.15 rows=3 width=4) Index Cond: (a = my_cte.b) (8 rows) ``` This is my expected output: 596 / 200 is approximately 3. If you compare the two plans, you can see they are exactly the same plan with the same cost estimate too. The only difference is `rows=2980` and `rows=596` for the Nested Loop node. It seems `rows=1000` instead of `rows=200` is used for the HashAggregate node in the wrong output. This query is simple and trivial. But it's for demo purpose only. The query may be a part of a bigger query, and the wrong row estimate can lead to a bad execution plan for the big query. ## Postgres version 16.3 ## Platform information Latest postgres docker image, which is based on Debian Linux. Originally found the issue on AWS Aurora.
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Tom LaneДата:
Сообщение: Re: BUG #18463: Possible bug in stored procedures with polymorphic OUT parameters
Следующее
От: Bowen ShiДата:
Сообщение: Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae