Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
From | Thomas Munro |
---|---|
Subject | Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash |
Date | |
Msg-id | CA+hUKG+2K6aZwjMNfq6i10_1jQmmcPokdYgBvFhJ1CjKYn2Ovw@mail.gmail.com |
In reply to | Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses | Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash |
List | pgsql-bugs |
On Mon, Nov 11, 2019 at 12:44 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On Sun, Nov 10, 2019 at 02:46:31PM -0800, Andres Freund wrote:
> >On 2019-11-10 22:50:17 +0100, Tomas Vondra wrote:
> >> On Sun, Nov 10, 2019 at 10:23:52PM +0100, Tomas Vondra wrote:
> >> > On Mon, Nov 11, 2019 at 10:08:58AM +1300, Thomas Munro wrote:
> >> > Can't we simply compute two hash values, using different seeds - one for
> >> > bucket and the other for batch? Of course, that'll be more expensive.
> >>
> >> Meh, I realized that's pretty much just a different way to get 64-bit
> >> hashes (which is what you mentioned).
> >
> >I'm not sure it's really the same, given practical realities in
> >postgres. Right now the "extended" hash function supporting 64 bit hash
> >functions is optional. So we couldn't unconditionally rely on it being
> >present, even in master, unless we're prepared to declare it as
> >required from now on.
> >
> >So computing two different hash values at the same time, by using a
> >different IV and a different combine function, doesn't seem like an
> >unreasonable approach.
>
> True. I was commenting on the theoretical fact that computing two 32-bit
> hashes is close to computing a 64-bit hash, but you're right there are
> implementation details that may make it more usable in our case.

Here is a quick sketch of something like that, for discussion only. I
figured that simply mixing the hash value we have with some arbitrary
bits afterwards would be just as good as having started with a
different IV, which leads to a very simple change without refactoring.
From quick experiments with unique keys (generate_series) I seem to
get approximately even sized partitions, and correct answers, but I
make no claim to strong hash-math-fu and haven't tested on very large
inputs.

Thoughts?
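For readers without the attachment, the idea of remixing the existing hash can be illustrated roughly as follows. This is a minimal standalone sketch, not the attached patch: the names `remix_hash` and `choose_bucket_and_batch` are hypothetical, the XOR constant is arbitrary, the mixing steps are borrowed from the MurmurHash3 32-bit finalizer as one plausible choice, and both `nbuckets` and `nbatch` are assumed to be powers of two.

```c
#include <stdint.h>

/*
 * Scramble an existing 32-bit hash so that its bits are decorrelated from
 * the original value. Hypothetical helper: XOR in an arbitrary constant,
 * then apply the MurmurHash3 32-bit finalizer steps.
 */
static uint32_t
remix_hash(uint32_t h)
{
    h ^= 0x9e3779b9u;       /* arbitrary bits, an assumption of this sketch */
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/*
 * Pick the bucket from the original hash and the batch from the remixed
 * hash. Both nbuckets and nbatch are assumed to be powers of two, so a
 * bit mask selects the slot.
 */
static void
choose_bucket_and_batch(uint32_t hash, uint32_t nbuckets, uint32_t nbatch,
                        uint32_t *bucketno, uint32_t *batchno)
{
    *bucketno = hash & (nbuckets - 1);
    *batchno = (nbatch > 1) ? (remix_hash(hash) & (nbatch - 1)) : 0;
}
```

With 1024 buckets and 16 batches, for example, the bucket comes from the low 10 bits of the original hash and the batch from the low 4 bits of the remixed value, so neither choice consumes bits that the other one needs.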
Attachments