BUG #17207: Bad cost estimate of Merge Join despite correct row estimate
От | PG Bug reporting form |
---|---|
Тема | BUG #17207: Bad cost estimate of Merge Join despite correct row estimate |
Дата | |
Msg-id | 17207-5265aefa79e333b4@postgresql.org обсуждение исходный текст |
Ответы |
Re: BUG #17207: Bad cost estimate of Merge Join despite correct row estimate
|
Список | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 17207 Logged by: Simon Perepelitsa Email address: sema@sema.in PostgreSQL version: 13.4 Operating system: macOS 10.15.7 Description: Setup create table test_users (id serial primary key); create table test_user_sessions (id serial primary key, user_id int not null references test_users(id)); insert into test_users (id) select generate_series(1, 500000) as id; insert into test_user_sessions (user_id) values (1), (1), (500000); vacuum verbose analyze test_users, test_user_sessions; Query explain analyze select test_user_sessions.id from test_user_sessions join test_users on user_id = test_users.id; Merge Join (cost=1.49..1.55 rows=3 width=4) (actual time=0.015..71.034 rows=3 loops=1) Merge Cond: (test_users.id = test_user_sessions.user_id) -> Index Only Scan using test_users_pkey on test_users (cost=0.42..12996.42 rows=500000 width=4) (actual time=0.005..40.040 rows=500000 loops=1) Heap Fetches: 0 -> Sort (cost=1.05..1.06 rows=3 width=8) (actual time=0.009..0.010 rows=3 loops=1) Sort Key: test_user_sessions.user_id Sort Method: quicksort Memory: 25kB -> Seq Scan on test_user_sessions (cost=0.00..1.03 rows=3 width=8) (actual time=0.004..0.005 rows=3 loops=1) Planning Time: 0.106 ms Execution Time: 71.061 ms After set enable_mergejoin = false Nested Loop (cost=0.42..14.35 rows=3 width=4) (actual time=0.019..0.025 rows=3 loops=1) -> Seq Scan on test_user_sessions (cost=0.00..1.03 rows=3 width=8) (actual time=0.006..0.006 rows=3 loops=1) -> Index Only Scan using test_users_pkey on test_users (cost=0.42..4.44 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=3) Index Cond: (id = test_user_sessions.user_id) Heap Fetches: 0 Planning Time: 0.078 ms Execution Time: 0.040 ms As you can see, Merge Join adds just 0.5 cost on top of Seq Scan ignoring the high cost of full index scan (0.42..12996.42). If explicitly disabled, Nested Loop is obviously a much better join plan for such a small table (3 rows). While it is possible for Merge Join to also finish quickly - if user_id are all low numbers - I'm not sure if that's a realistic expectation for the default plan. I also tried rewriting it as a semi-join with exists/in, but the query plans were exactly the same. Not sure why, because in some of my other queries this makes the planner use more optimized "Semi-Join" instructions (e.g. Nested Loop Semi-Join). explain analyze select test_user_sessions.id from test_user_sessions where exists (select 1 from test_users where user_id = test_users.id);
В списке pgsql-bugs по дате отправления: