Re: bad estimation together with large work_mem generates terrible slow hash joins
From: Tomas Vondra
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Date:
Msg-id: 53B0A03C.3080805@fuzzy.cz
In reply to: Re: bad estimation together with large work_mem generates terrible slow hash joins (Tomas Vondra <tv@fuzzy.cz>)
Responses: Re: bad estimation together with large work_mem generates terrible slow hash joins
List: pgsql-hackers
On 26.6.2014 23:48, Tomas Vondra wrote:
> On 26.6.2014 20:43, Tomas Vondra wrote:
>> Attached is v2 of the patch, with some cleanups / minor improvements:
>>
>> * there's a single FIXME, related to counting tuples in the
>
> Meh, I couldn't resist resolving this FIXME, so attached is v3 of the
> patch. This just adds a proper 'batch tuples' counter to the hash table.
>
> All comments, measurements on different queries etc. welcome. We'll
> certainly do a lot of testing, because this was a big issue for us.

Attached is v4 of the patch, with a few minor improvements. The only
thing worth mentioning is overflow protection, similar to what's done in
the ExecChooseHashTableSize() function. Otherwise it's mostly about
improving comments.

Also attached is a v4 with a GUC, making it easier to compare the effect
of the patch by simply setting "enable_hashjoin_bucket" to "off"
(original behaviour) or "on" (new behaviour).

And finally there's an SQL script demonstrating the effect of the patch
with various work_mem settings. For example, what I see on my desktop is
this (averages from 3 runs):

===== SMALL WORK MEM (2MB) =====

            no dynamic buckets    dynamic buckets
query A          5945 ms              5969 ms
query B          6080 ms              5943 ms
query C          6531 ms              6822 ms
query D          6962 ms              6618 ms

===== MEDIUM WORK MEM (16MB) =====

            no dynamic buckets    dynamic buckets
query A          7955 ms              7944 ms
query B          9970 ms              7569 ms
query C          8643 ms              8560 ms
query D         33908 ms              7700 ms

===== LARGE WORK MEM (64MB) =====

            no dynamic buckets    dynamic buckets
query A         10235 ms             10233 ms
query B         32229 ms              9318 ms
query C         14500 ms             10554 ms
query D        213344 ms              9145 ms

where query "A" is estimated exactly and the other queries suffer from
underestimates of varying severity.

My observations from this are:

(1) For small work_mem values it does not really matter, thanks to
    caching effects (the whole hash table fits into the L2 CPU cache).

(2) For medium work_mem values (not really huge, but exceeding the CPU
    caches), the differences are negligible, except for the last query
    with the most severe underestimate. In that case the new behaviour
    is much faster.

(3) For large work_mem values, the speedup is pretty obvious and depends
    on how severe the underestimate is.

The obvious question is why one would choose a large work_mem value when
the smaller values actually perform better here. Well, the example
tables are not perfectly representative. When the outer table is much
larger and does not fit into RAM that easily (which is the case with
large fact tables or joins involving them), the rescans caused by the
higher number of batches are more expensive and outweigh the caching
benefits. Also, work_mem is shared with other nodes, e.g. aggregates,
and decreasing it just because of hash joins would hurt them.

regards
Tomas
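To give an idea of the kind of comparison the attached SQL script performs, here is a minimal sketch, assuming the v4-with-GUC build of the patch is installed. The table definitions, sizes, and the correlated-column trick used to provoke the underestimate are made up for illustration; they are not taken from the attached script.

```sql
-- Illustrative setup only (hypothetical tables, not from the attached
-- script). Columns a and b are perfectly correlated, so the planner
-- multiplies their selectivities and underestimates "a = 1 AND b = 1"
-- by roughly 100x (estimated ~100 rows, actual ~10000), which makes the
-- initial number of hash buckets far too small.
CREATE TABLE dim AS
SELECT i AS id, i % 100 AS a, i % 100 AS b
FROM generate_series(1, 1000000) AS s(i);

CREATE TABLE fact AS
SELECT (random() * 999999)::int + 1 AS dim_id
FROM generate_series(1, 5000000);

ANALYZE dim;
ANALYZE fact;

SET work_mem = '64MB';

-- original behaviour: nbuckets stays at the value chosen from the
-- (underestimated) row count when the hash table is built
SET enable_hashjoin_bucket = off;
EXPLAIN ANALYZE
SELECT count(*)
FROM fact f
JOIN dim d ON d.id = f.dim_id
WHERE d.a = 1 AND d.b = 1;

-- patched behaviour: the number of buckets grows dynamically as the
-- hash table fills, keeping bucket chains short
SET enable_hashjoin_bucket = on;
EXPLAIN ANALYZE
SELECT count(*)
FROM fact f
JOIN dim d ON d.id = f.dim_id
WHERE d.a = 1 AND d.b = 1;
```

Repeating the pair of EXPLAIN ANALYZE runs under different work_mem settings (e.g. 2MB, 16MB, 64MB) reproduces the shape of the comparison in the tables above, though the actual numbers depend on the data and hardware.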
Attachments