Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
От | Thomas Munro |
---|---|
Тема | Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash |
Дата | |
Msg-id | CA+hUKGJVgKknBG69fRyj_NRTjVk5mJqcZ7--uNS2C373Ry7KRg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash (Thomas Munro <thomas.munro@gmail.com>) |
Ответы |
Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
|
Список | pgsql-bugs |
On Wed, Nov 20, 2019 at 1:05 PM Thomas Munro <thomas.munro@gmail.com> wrote: > I'm planning to push something like this late this week (though I > think the constant might be better as 0 since it doesn't matter much > given the definition, and I need a back-patchable version that is open > coded or I need to back-patch hash_combine), unless someone has a > better idea. Scratch that plan, it turned out to be a terrible idea. Although it solves the batch problem, it's much worse than I expected at bucket distribution, so the performance is bad. I think something like that might work given better bit mixing, but that sounds expensive (or maybe I'm just being really stupid here). Anyway, back to the drawing board. I'm now thinking the bit stealing scheme (ie take bits from the other end of the hash, and just let them start to overlap with bucket bits if you have to when nbatch increases) is a better plan, and it is so much easier to reason about, and still in the realm of a handful of instructions. Here are some numbers[1] showing the number of non-empty buckets ("pop") and chain length distribution for non-empty buckets, given 7 billion bigints as input. method | nbatch | nbucket | pop | min | max | avg | stddev -------+--------+---------+--------+------+------+------+-------- rehash | 4096 | 1048576 | 256 | 6212 | 7001 | 6674 | 110 steal | 4096 | 1048576 | 659574 | 1 | 17 | 2.5 | 1.5 steal | 4096 | 4194304 | 659574 | 1 | 17 | 2.5 | 1.5 steal | 8192 | 4194304 | 329685 | 1 | 17 | 2.5 | 1.5 The rehash numbers are obviously catastrophic in this case, and that's before we've even run out of hash bits. The steal numbers show that you just waste memory on empty buckets when hash bits get thin on the ground (the last two lines in the table). We could perhaps consider automatically lowering nbucket in this case too, since you can't really have more than 2^32 virtual buckets, so pretending you can just wastes array memory. So here's a draft steal-the-bucket-bits patch. Yeah, reverse_bits_u32() may be in the wrong header. But is it too slow? On my desktop I can call ExecGetBucketAndBatch() 353 million times per second (~2.8ns), and unpatched gets me 656 million/sec (~1.5ns) (though that includes a function call, and when Hash does it it's inlined), but it's only slower when nbatch > 1 due to the condition. To put that number into persepective, I can hash 10 million single-int tuples from a prewarmed seq scan in 2.5s without batching or parallelism, so that's 250ns per tuple. This'd be +0.4% of that, and I do see it in a few more samples with my profiler, but it's still basically nothing, and lost in the noise with other noisy partitioning overheads like IO. Thoughts? [1] Working: https://gist.github.com/macdice/b7eb7015846b8384972c2c7cfd6d2c22
Вложения
В списке pgsql-bugs по дате отправления: