Re: WIP: bloom filter in Hash Joins with batches
От | Peter Geoghegan |
---|---|
Тема | Re: WIP: bloom filter in Hash Joins with batches |
Дата | |
Msg-id | CAM3SWZQkrQZTKvXkcGsBDginH+KODsuPXAhqhOW5zCKZBTtCTQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: WIP: bloom filter in Hash Joins with batches (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Ответы |
Re: WIP: bloom filter in Hash Joins with batches
Re: WIP: bloom filter in Hash Joins with batches |
Список | pgsql-hackers |
On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > So, this seems to bring reasonable speedup, as long as the selectivity is > below 50%, and the data set is sufficiently large. What about semijoins? Apparently they can use bloom filters particularly effectively. Have you considered them as a special case? Also, have you considered Hash join conditions with multiple attributes as a special case? I'm thinking of cases like this: regression=# set enable_mergejoin = off; SET regression=# explain analyze select * from tenk1 o join tenk2 t on o.twenty = t.twenty and t.hundred = o.hundred; QUERY PLAN ──────────────────────────────────────────────────────────────────────Hash Join (cost=595.00..4103.00 rows=50000 width=488)(actual time=12.086..1026.194 rows=1000000 loops=1) Hash Cond: ((o.twenty = t.twenty) AND (o.hundred = t.hundred)) -> Seq Scanon tenk1 o (cost=0.00..458.00 rows=10000 width=244) (actual time=0.017..4.212 rows=10000 loops=1) -> Hash (cost=445.00..445.00 rows=10000 width=244) (actual time=12.023..12.023 rows=10000 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 2824kB -> Seq Scan on tenk2t (cost=0.00..445.00 rows=10000 width=244) (actual time=0.006..3.453 rows=10000 loops=1)Planning time: 0.567 msExecution time: 1116.094 ms (8 rows) (Note that while the optimizer has a slight preference for a merge join in this case, the plan I show here is a bit faster on my machine). -- Peter Geoghegan
В списке pgsql-hackers по дате отправления: