Re: Hash Join cost estimates
From: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: Hash Join cost estimates
Msg-id: 14141.1364834786@sss.pgh.pa.us
In reply to: Re: Hash Join cost estimates (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Hash Join cost estimates
List: pgsql-hackers
I wrote:
> Perhaps what we should do is charge the hash_qual_cost only for some
> small multiple of the number of tuples that we expect will *pass* the
> hash quals, which is a number we have to compute anyway.  The multiple
> would represent the rate of hash-code collisions we expect.

I tried the attached quick-hack patch on Stephen's example.  With
work_mem set to 16MB I get these results:

regression=# explain analyze select * from small_table join big_table using (id_short);
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1229.46..74154.49 rows=41176 width=24) (actual time=47.723..1845.869 rows=13731 loops=1)
   Hash Cond: (big_table.id_short = small_table.id_short)
   ->  Seq Scan on big_table  (cost=0.00..61626.71 rows=4272271 width=4) (actual time=0.045..506.212 rows=4272271 loops=1)
   ->  Hash  (cost=714.76..714.76 rows=41176 width=24) (actual time=24.944..24.944 rows=41176 loops=1)
         Buckets: 8192  Batches: 1  Memory Usage: 2574kB
         ->  Seq Scan on small_table  (cost=0.00..714.76 rows=41176 width=24) (actual time=0.007..11.608 rows=41176 loops=1)
 Total runtime: 1847.697 ms
(7 rows)

Forcing the other plan to be chosen, I get

regression=# explain analyze select * from small_table join big_table using (id_short);
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=131719.10..150327.44 rows=41176 width=24) (actual time=1922.942..2810.095 rows=13731 loops=1)
   Hash Cond: (small_table.id_short = big_table.id_short)
   ->  Seq Scan on small_table  (cost=0.00..714.76 rows=41176 width=24) (actual time=0.012..10.058 rows=41176 loops=1)
   ->  Hash  (cost=61626.71..61626.71 rows=4272271 width=4) (actual time=1921.962..1921.962 rows=4272271 loops=1)
         Buckets: 65536  Batches: 16  Memory Usage: 9412kB
         ->  Seq Scan on big_table  (cost=0.00..61626.71 rows=4272271 width=4) (actual time=0.043..702.898 rows=4272271 loops=1)
 Total runtime: 2820.633 ms
(7 rows)

So that's at least going in the right direction.  I have not thought
about how the calculation should be adjusted in the semi/anti join
case, nor about how we ought to repurpose the bucket-size-variance
calculations for checking whether work_mem will be exceeded.  So this
is a long way from being committable, but it seems promising.

			regards, tom lane
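
P.S. To make the shape of the proposed change concrete, here is a toy
standalone C program contrasting the two charging schemes, plugging in
the row counts from the plans above.  It is only an illustrative
sketch, not the attached patch: the per-tuple qual cost (PostgreSQL's
default cpu_operator_cost of 0.0025), the per-bucket fraction, and the
1.05 collision multiple are all assumed values.

#include <stdio.h>

/*
 * Illustrative sketch only -- not the attached patch.  Row counts are
 * taken from the EXPLAIN ANALYZE output above; the cost constant, the
 * bucket fraction, and the collision multiple are assumptions.
 */
int
main(void)
{
	double	hash_qual_cost = 0.0025;	/* assumed per-tuple qual cost */
	double	outer_rows = 4272271.0;		/* probe side (big_table) */
	double	inner_rows = 41176.0;		/* hashed side (small_table) */
	double	bucket_frac = 1.0 / 8192;	/* assumed per-bucket fraction */
	double	match_rows = 41176.0;		/* planner's estimated join rows */
	double	collision_mult = 1.05;		/* assumed collision multiple */

	/* Current style: every probe pays for its expected bucket contents. */
	double	old_charge = hash_qual_cost * outer_rows *
		(inner_rows * bucket_frac);

	/* Proposed style: only expected matches pay, times a small multiple. */
	double	new_charge = hash_qual_cost * match_rows * collision_mult;

	printf("old qual charge: %.0f\n", old_charge);
	printf("new qual charge: %.0f\n", new_charge);
	return 0;
}

The point of the sketch is that the proposed charge no longer scales
with the product of probe rows and estimated bucket size, so a
pessimistic bucket-size estimate can't swamp the rest of the join cost
the way it does today.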