Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash

From: Andres Freund
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Date:
Msg-id: 20191113184759.gacxzbxk523vupfc@alap3.anarazel.de
In reply to: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Replies: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
List: pgsql-bugs
Hi,

On 2019-11-11 11:46:05 +0100, Tomas Vondra wrote:
> On Mon, Nov 11, 2019 at 01:33:41PM +1300, Thomas Munro wrote:
> Hmmm, interesting. Using a hard-coded constant seems a bit weird,
> although I don't have any precise argument why it's wrong yet.
>
> Essentially, the patch does this:
>
>   batchno = hash_combine(0x9e3779b9, hashvalue) & (nbatch - 1);
>   bucketno = hashvalue & (nbuckets - 1);
>
> but adding a constant does not 'propagate' any bits from the original
> value to the hash. So it seems pretty much equivalent to
>
>   batchno = hashvalue & (nbatch - 1);
>   bucketno = hashvalue & (nbuckets - 1);
>
> so we still operate with the same number of bits. It might solve this
> particular case, but I doubt it's a good idea in general ...

I was arguing over IM that we ought to also use a different mixing
function when computing the hash.

> I think we will have to actually compute two hashes, to get more than
> 32 bits. But yes, that'll be more invasive, and unlikely
> backpatchable.

I think it might also just generally not be acceptable, from a
performance POV. Computing the hashes is already a major bottleneck for
HJ. Penalizing everybody for these extreme cases doesn't strike me as
great. It might be OK to compute a 64-bit hash, but I don't see how
computing two hashes in parallel wouldn't have significant overhead.

I don't really see what the problem is with using just a differently
mixed hash. There's not really a requirement for there to be a
tremendous amount of independence between the bits used for the bucket
and the bits used for the batch. We cannot just reuse exactly the same
bits for the bucket that we used for the batch, as otherwise we'd just
raise the number of bucket conflicts (as the whole hashtable would only
contain tuples with the reduced number of bits). But I don't see why
that implies a need for full bit independence. As long as the mixing
guarantees that we use the bits for bucketing differently than the ones
for batches, I think we're good.
All that's needed is that each batch is roughly equally sized, and
contains tuples with hashes that are roughly equally distributed -- you
could bit-reverse the hash for batch selection, and get that, no?

Greetings,

Andres Freund