Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash

Поиск

Список

Период

Сортировка

От	Thomas Munro
Тема	Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Дата	21 ноября 2019 г. 04:45:03
Msg-id	CA+hUKGJVgKknBG69fRyj_NRTjVk5mJqcZ7--uNS2C373Ry7KRg@mail.gmail.com обсуждение исходный текст
Ответ на	Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash (Thomas Munro <thomas.munro@gmail.com>)
Ответы	Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Список	pgsql-bugs

Дерево обсуждения

On Wed, Nov 20, 2019 at 1:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I'm planning to push something like this late this week (though I
> think the constant might be better as 0 since it doesn't matter much
> given the definition, and I need a back-patchable version that is open
> coded or I need to back-patch hash_combine), unless someone has a
> better idea.

Scratch that plan, it turned out to be a terrible idea.  Although it
solves the batch problem, it's much worse than I expected at bucket
distribution, so the performance is bad.  I think something like
that might work given better bit mixing, but that sounds expensive (or
maybe I'm just being really stupid here).  Anyway, back to the drawing
board.  I'm now thinking the bit stealing scheme (ie take bits from
the other end of the hash, and just let them start to overlap with
bucket bits if you have to when nbatch increases) is a better plan,
and it is so much easier to reason about, and still in the realm of a
handful of instructions.  Here are some numbers[1] showing the number
of non-empty buckets ("pop") and chain length distribution for
non-empty buckets, given 7 billion bigints as input.

method | nbatch | nbucket |    pop |  min |  max |  avg | stddev
-------+--------+---------+--------+------+------+------+--------
rehash |   4096 | 1048576 |    256 | 6212 | 7001 | 6674 |    110
steal  |   4096 | 1048576 | 659574 |    1 |   17 |  2.5 |    1.5
steal  |   4096 | 4194304 | 659574 |    1 |   17 |  2.5 |    1.5
steal  |   8192 | 4194304 | 329685 |    1 |   17 |  2.5 |    1.5

The rehash numbers are obviously catastrophic in this case, and that's
before we've even run out of hash bits.  The steal numbers show that
you just waste memory on empty buckets when hash bits get thin on the
ground (the last two lines in the table).  We could perhaps consider
automatically lowering nbucket in this case too, since you can't
really have more than 2^32 virtual buckets, so pretending you can just
wastes array memory.

So here's a draft steal-the-bucket-bits patch.  Yeah,
reverse_bits_u32() may be in the wrong header.  But is it too slow?
On my desktop I can call ExecGetBucketAndBatch() 353 million times per
second (~2.8ns), and unpatched gets me 656 million/sec (~1.5ns)
(though that includes a function call, and when Hash does it it's
inlined), but it's only slower when nbatch > 1 due to the condition.
To put that number into persepective, I can hash 10 million single-int
tuples from a prewarmed seq scan in 2.5s without batching or
parallelism, so that's 250ns per tuple.  This'd be +0.4% of that, and
I do see it in a few more samples with my profiler, but it's still
basically nothing, and lost in the noise with other noisy partitioning
overheads like IO.  Thoughts?

[1] Working: https://gist.github.com/macdice/b7eb7015846b8384972c2c7cfd6d2c22

Вложения

0001-Improve-batch-number-algorithm-for-large-hash-joins.patch

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash

Вложения