On Wed, 15 May 2024 at 21:07, PG Bug reporting form
<noreply@postgresql.org> wrote:
> Hello everyone, I notice the row estimate for a nested loop is wrong if a
> cte is involved.
The yet-to-be-released PostgreSQL 17 should have some code which
improves this [1].
With your test case on that version, I see:
postgres=# explain analyze with my_cte as materialized (select b from
t2) select *
postgres-# from t1 where t1.a in (select b from my_cte);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=37.92..469.00 rows=296 width=4) (actual
time=0.574..0.893 rows=300 loops=1)
CTE my_cte
-> Seq Scan on t2 (cost=0.00..15.00 rows=1000 width=4) (actual
time=0.015..0.079 rows=1000 loops=1)
-> HashAggregate (cost=22.50..23.50 rows=100 width=4) (actual
time=0.394..0.406 rows=100 loops=1)
Group Key: my_cte.b
Batches: 1 Memory Usage: 24kB
-> CTE Scan on my_cte (cost=0.00..20.00 rows=1000 width=4)
(actual time=0.025..0.229 rows=1000 loops=1)
-> Index Only Scan using my_index on t1 (cost=0.42..4.27 rows=3
width=4) (actual time=0.004..0.004 rows=3 loops=100)
Index Cond: (a = my_cte.b)
Heap Fetches: 0
There are no plans to backpatched this improvement into PostgreSQL 16.
It's not really classed as a bug, just something that could have been
improved... which is now is.
David
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f7816aec2