[HACKERS] convert EXSITS to inner join gotcha and bug
От | Teodor Sigaev |
---|---|
Тема | [HACKERS] convert EXSITS to inner join gotcha and bug |
Дата | |
Msg-id | f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru обсуждение исходный текст |
Ответы |
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
(Teodor Sigaev <teodor@sigaev.ru>)
|
Список | pgsql-hackers |
Hi! Seems, there two issues: 1) Sometime conditions which for a first glance could be pushed down to scan are leaved as join quals. And it could be a ~30 times performance loss. 2) Number of query result depend on enabe_seqscan variable. The query explain analyze SELECT * FROM t1 INNER JOIN t2 ON ( EXISTS ( SELECT true FROM t3 WHERE t3.id1 = t1.id AND t3.id2 = t2.id ) ) WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd' ; produces following plan: Nested Loop (cost=6.42..1928.71 rows=1 width=99) (actual time=71.415..148.922 rows=162 loops=1) Join Filter: (t3.id1 = t1.id) Rows Removed by Join Filter: 70368 -> Index Only Scan using t1i2 on t1 (cost=0.28..8.30 rows=1 width=66) (actual time=0.100..0.103 rows=1 loops=1) Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text) Heap Fetches: 1 -> Hash Join (cost=6.14..1918.37 rows=163 width=66) (actual time=0.370..120.971 rows=70530 loops=1) (1) Hash Cond: (t3.id2 = t2.id) (2) -> Seq Scan on t3 (cost=0.00..1576.30 rows=70530 width=66) (actual time=0.017..27.424 rows=70530 loops=1) -> Hash (cost=3.84..3.84 rows=184 width=33) (actual time=0.273..0.273 rows=184 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 20kB -> Seq Scan on t2 (cost=0.00..3.84 rows=184 width=33) (actual time=0.017..0.105 rows=184 loops=1) Planning time: 7.326 ms Execution time: 149.115 ms Condition (1) is not pushed to scan (2) which seemsly could be safely moved. With seqscan = off condition is not pushed too but query returns only one row instead of 162. Scan on t3 returns ~70000 rows but only ~150 rows are really needed. I didn't found a combination of GUCs enable_* to push down that and it seems to me there is reason for that which I don't see or support is somehow missed. If pair of (t3.id1, t3.id2) is unique (see dump, there is a unique index on them) the query could be directly rewrited to inner join and its plan is: Nested Loop (cost=9.70..299.96 rows=25 width=66) (actual time=0.376..5.232 rows=162 loops=1) -> Nested Loop (cost=9.43..292.77 rows=25 width=99) (actual time=0.316..0.645 rows=162 loops=1) -> Index Only Scan using t1i2 on t1 (cost=0.28..8.30 rows=1 width=66) (actual time=0.047..0.050 rows=1 loops=1) Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text) Heap Fetches: 1 -> Bitmap Heap Scan on t3 (cost=9.15..283.53 rows=94 width=66) (actual time=0.257..0.426 rows=162 loops=1) Recheck Cond: (id1 = t1.id) Heap Blocks: exact=3 -> Bitmap Index Scan on t3i1 (cost=0.00..9.12 rows=94 width=0) (actual time=0.186..0.186 rows=162 loops=1) Index Cond: (id1 = t1.id) -> Index Only Scan using t2i1 on t2 (cost=0.27..0.29 rows=1 width=33) (actual time=0.024..0.024 rows=1 loops=162) Index Cond: (id = t3.id2) Heap Fetches: 162 Planning time: 5.532 ms Execution time: 5.457 ms Second plan is ~30 times faster. But with turned off sequentual scan the first query is not work correctly, which points to some bug in planner, I suppose. Both 9.6 and 10devel are affected to addiction of query result on seqscan variable. Dump to reproduce (subset of real data but obfucated), queries are in attachment http://sigaev.ru/misc/exists_to_nested.sql.gz -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Kang YuzheДата:
Сообщение: Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...