Re: DBT-3 with SF=20 got failed
From: Tomas Vondra
Subject: Re: DBT-3 with SF=20 got failed
Date:
Msg-id: 55D53673.7000106@2ndquadrant.com
In reply to: Re: DBT-3 with SF=20 got failed (Kohei KaiGai <kaigai@kaigai.gr.jp>)
List: pgsql-hackers
Hi,

On 08/19/2015 01:55 PM, Kohei KaiGai wrote:
>  Merge Join  (cost=25374644.08..1160509591.61 rows=60521928028
>  width=24) (actual time=138347.979..491889.343 rows=776157676 loops=1)
>    Merge Cond: (ws1.ws_order_number = ws2.ws_order_number)
>    Join Filter: (ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>    Rows Removed by Join Filter: 127853313
>    ->  Sort  (cost=12687322.04..12867325.16 rows=72001248 width=16)
>        (actual time=73252.300..79017.420 rows=72001237 loops=1)
>          Sort Key: ws1.ws_order_number
>          Sort Method: quicksort  Memory: 7083296kB
>          ->  Seq Scan on web_sales ws1  (cost=0.00..3290612.48
>              rows=72001248 width=16) (actual time=0.023..39951.201
>              rows=72001237 loops=1)
>    ->  Sort  (cost=12687322.04..12867325.16 rows=72001248 width=16)
>        (actual time=65095.655..128885.811 rows=904010978 loops=1)
>          Sort Key: ws2.ws_order_number
>          Sort Method: quicksort  Memory: 7083296kB
>          ->  Seq Scan on web_sales ws2  (cost=0.00..3290612.48
>              rows=72001248 width=16) (actual time=0.014..31046.888
>              rows=72001237 loops=1)
>  Planning time: 0.232 ms
>  Execution time: 530176.521 ms
> (14 rows)
>
>
> So, even if we allows nodeHash.c to allocate hash buckets larger than
> 1GB, its initial size may be determined carefully.
> Probably, 1GB is a good starting point even if expanded later.

I'm not sure I understand what the problem is here. Could you elaborate?

The initial size of the hash table is determined using the estimate, and if we overestimate, it will create more buckets (i.e. consume more memory) and/or start batching (which might be unnecessary).

But I don't really see any "more careful" way to do this without penalizing the cases where the estimate is actually correct - e.g. by starting with much smaller buckets (and then resizing the hash table, which is not free), or by starting without batching, betting that we won't actually need it. I think it'll be very difficult to get those working without causing real trouble for the cases where we actually do have good estimates (and those are the vast majority of queries).

But both of those are features, and we're dealing with a bug fix here.

kind regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
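To make the bucket-sizing point above concrete, here is a minimal C sketch of how an initial bucket count might be derived from the planner's row estimate, rounded to a power of two and clamped by a memory budget. This is not the actual ExecChooseHashTableSize() logic from nodeHash.c; the names and constants below are illustrative assumptions only.

#include <stddef.h>

#define TUPLES_PER_BUCKET  1            /* assumed load-factor target */

/* Round n up to the next power of two (illustrative helper). */
static size_t
next_power_of_two(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Pick an initial bucket count from the estimated row count, without
 * letting the bucket-pointer array alone exceed the memory budget.
 * With an over-estimate, we simply allocate (and pay for) buckets we
 * never use -- the over-estimation cost discussed in the message above.
 */
static size_t
choose_initial_nbuckets(double estimated_rows, size_t mem_budget_bytes)
{
    size_t target = (size_t) (estimated_rows / TUPLES_PER_BUCKET) + 1;
    size_t nbuckets = next_power_of_two(target);
    size_t max_buckets = mem_budget_bytes / sizeof(void *);

    if (max_buckets < 2)
        max_buckets = 2;
    if (nbuckets > max_buckets)
        nbuckets = next_power_of_two(max_buckets) / 2;  /* stay under budget */

    return nbuckets;
}

Starting deliberately smaller than the estimate would instead force the table to be resized (or to start batching) at run time, which is the "not free" cost mentioned above.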