PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching
From: Tomas Vondra
Subject: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching
Date:
Msg-id: 5483C4AC.5060107@fuzzy.cz
Responses: Re: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching
           Re: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching
List: pgsql-hackers
Hi,

back when we were discussing the hashjoin patches (now committed), Robert proposed that it might sometimes be a good idea to increase the number of tuples per bucket instead of batching. That is, while initially sizing the hash table - if a hash table with enough buckets to satisfy the NTUP_PER_BUCKET load factor does not fit into work_mem, try a somewhat higher load factor before starting to batch.

The attached patch is an initial attempt to implement this - it's a bit rough around the edges, but hopefully good enough to judge the benefit. The default load factor is 1. The patch tries to degrade this to 2, 4 or 8 in an attempt to fit the hash table into work_mem. If that doesn't work, it starts batching with the default load factor. If batching becomes necessary while the hashjoin is running, the load factor is switched back to the default (if we're batching, there's no point in keeping the slower hash table). The patch also modifies the EXPLAIN output to show the load factor. (A standalone sketch of the sizing logic is included at the end of this message.)

The results of the testing I've done so far are rather disappointing, though.

    create table a as select i from generate_series(1,1000000) s(i);
    create table b as select i from generate_series(1,1000000) s(i);
    analyze a;
    analyze b;

    select a.i, b.i from a join b on (a.i = b.i);

    work_mem    batches    tuples per bucket    duration
    -----------------------------------------------------
     64 MB         1              1               585 ms
     46 MB         1              2               639 ms
     43 MB         1              4               794 ms
     40 MB         1              8              1075 ms
     39 MB         2              1               623 ms

So even increasing the load factor to 2 is slower than batching. Of course, with other examples the results may differ. For example, with a much larger outer table:

    create table a as select mod(i,1000000) i
                        from generate_series(1,10000000) s(i);
    analyze a;

    work_mem    batches    tuples per bucket    duration
    -----------------------------------------------------
     64 MB         1              1              3904 ms
     46 MB         1              2              4692 ms
     43 MB         1              4              6499 ms
     40 MB         1              8              9264 ms
     39 MB         2              1              4054 ms

Same results. Of course, for "huge" outer tables it will make a difference. For example, on a ~8GB outer table (on a machine with 8GB of RAM), batching causes enough I/O to make it slower than ntup=2 (50s vs. 80s, for example).

I'm not sure what the best way forward is. Clearly, for cases that fit into RAM (temp files in the page cache), batching is faster. For tables large enough to cause a lot of I/O, the higher load factor may make a difference - but I'm not sure how to identify these cases. So ISTM implementing this requires a reliable way to identify which case we're dealing with - whether the outer table is large enough (prefer increasing the load factor) or not (prefer batching). Until then, keeping the current simple/predictable approach is probably better. Of course, this also depends on additional variables (e.g. is the system memory-stressed?).
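For anyone who wants to experiment with the idea without digging into the patch, here is a minimal standalone sketch of the sizing logic described above. To be clear, this is not the actual patch code - choose_load_factor() and hash_table_bytes() are hypothetical names, and the memory accounting is a gross simplification of what ExecChooseHashTableSize() in nodeHash.c really does:

    /*
     * Simplified sketch of "degrade the load factor before batching".
     * Hypothetical names and simplified accounting, for illustration only.
     */
    #include <stdio.h>
    #include <stddef.h>

    #define NTUP_PER_BUCKET      1  /* default load factor */
    #define MAX_NTUP_PER_BUCKET  8  /* highest degraded load factor to try */

    /* crude estimate: tuples themselves + one pointer per bucket */
    static size_t
    hash_table_bytes(double ntuples, size_t tupwidth, int ntup_per_bucket)
    {
        double  nbuckets = ntuples / ntup_per_bucket;

        return (size_t) (ntuples * tupwidth + nbuckets * sizeof(void *));
    }

    /*
     * Try load factors 1, 2, 4, 8 and return the first one that makes
     * the hash table fit into work_mem.  Returns -1 if even ntup=8 does
     * not fit, in which case the caller batches with the default factor.
     */
    static int
    choose_load_factor(double ntuples, size_t tupwidth, size_t work_mem_bytes)
    {
        int     ntup;

        for (ntup = NTUP_PER_BUCKET; ntup <= MAX_NTUP_PER_BUCKET; ntup *= 2)
        {
            if (hash_table_bytes(ntuples, tupwidth, ntup) <= work_mem_bytes)
                return ntup;
        }
        return -1;              /* doesn't fit - batch with ntup = 1 */
    }

    int
    main(void)
    {
        double  ntuples = 1000000;
        size_t  tupwidth = 40;  /* bytes per tuple, made up for the demo */
        size_t  work_mems[] = {64, 46, 43, 40, 39};

        for (int i = 0; i < 5; i++)
        {
            size_t  wm = work_mems[i] * 1024 * 1024;
            int     ntup = choose_load_factor(ntuples, tupwidth, wm);

            if (ntup > 0)
                printf("work_mem = %2zu MB -> 1 batch, %d tuple(s) per bucket\n",
                       work_mems[i], ntup);
            else
                printf("work_mem = %2zu MB -> batching, %d tuple per bucket\n",
                       work_mems[i], NTUP_PER_BUCKET);
        }
        return 0;
    }

The real sizing code accounts for quite a bit more (skew buckets, per-batch overhead, etc.), but the doubling loop above is the gist of the degradation strategy.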
regards
Tomas

Attachments