Re: hashagg slowdown due to spill changes
From | Andres Freund |
---|---|
Subject | Re: hashagg slowdown due to spill changes |
Date | |
Msg-id | 20200608210819.edcfdqykv3ciwqxq@alap3.anarazel.de |
In reply to | Re: hashagg slowdown due to spill changes (Jeff Davis <pgsql@j-davis.com>) |
List | pgsql-hackers |

Hi,

On 2020-06-08 13:41:29 -0700, Jeff Davis wrote:
> On Fri, 2020-06-05 at 21:11 -0700, Andres Freund wrote:
> > Before there was basically one call from nodeAgg.c to execGrouping.c for
> > each tuple and hash table. Now it's a lot more complicated:
> > 1) nodeAgg.c: prepare_hash_slot()
> > 2) execGrouping.c: TupleHashTableHash()
> > 3) nodeAgg.c: lookup_hash_entry()
> > 4) execGrouping.c: LookupTupleHashEntryHash()
>
> The reason that I did it that way was to be able to store the hash
> along with the saved tuple (similar to what HashJoin does), which
> avoids recalculation.

That makes sense. But then you can just use a separate call into
execGrouping for that purpose.

> > Why isn't the flow more like this:
> > 1) prepare_hash_slot()
> > 2) if (aggstate->hash_spill_mode) goto 3; else goto 4
> > 3) entry = LookupTupleHashEntry(&hash); if (!entry) hashagg_spill_tuple();
> > 4) InsertTupleHashEntry(&hash, &isnew); if (isnew) initialize(entry)
>
> I'll work up a patch to refactor this. I'd still like to see if we can
> preserve the calculate-hash-once behavior somehow.

Cool!

Greetings,

Andres Freund
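
For readers following the thread, the flow discussed above (hash once, then either a lookup-only probe in spill mode or an inserting probe otherwise) can be sketched with a toy, self-contained hash table. This is only an illustration under stated assumptions: every name here (ToyAgg, toy_hash, toy_probe, toy_advance, toy_count, mem_limit) is hypothetical and none of it is PostgreSQL's nodeAgg.c or execGrouping.c code. The "aggregate" is a simple per-key count, and the "spill" is an in-memory array that keeps the hash next to the tuple so it would not need to be recomputed later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS  8     /* toy capacity; must be a power of two */
#define MAX_SPILL 64    /* toy spill capacity */

typedef struct Entry { bool used; uint32_t hash; int key; long count; } Entry;
typedef struct SpilledTuple { uint32_t hash; int key; } SpilledTuple;

typedef struct ToyAgg {
    Entry        buckets[NBUCKETS];
    int          nentries;
    int          mem_limit;   /* groups allowed before spill mode; <= NBUCKETS */
    bool         spill_mode;
    SpilledTuple spill[MAX_SPILL];
    int          nspilled;
} ToyAgg;

/* Hypothetical integer mixer standing in for the real hash function. */
static uint32_t toy_hash(int key)
{
    uint32_t h = (uint32_t) key;
    h ^= h >> 16;
    h *= 0x45d9f3bU;
    h ^= h >> 16;
    return h;
}

/*
 * Linear probe using a precomputed hash. With insert = false this is a pure
 * lookup returning NULL on a miss; with insert = true a missing key gets a
 * new entry and *isnew is set.
 */
static Entry *toy_probe(ToyAgg *agg, int key, uint32_t hash,
                        bool insert, bool *isnew)
{
    for (uint32_t i = 0; i < NBUCKETS; i++) {
        Entry *e = &agg->buckets[(hash + i) & (NBUCKETS - 1)];
        if (e->used) {
            if (e->hash == hash && e->key == key) {
                *isnew = false;
                return e;
            }
            continue;               /* occupied by another key, keep probing */
        }
        if (!insert)
            return NULL;            /* empty slot reached: key is absent */
        e->used = true;
        e->hash = hash;
        e->key = key;
        agg->nentries++;
        *isnew = true;
        return e;
    }
    return NULL;                    /* table full and key not present */
}

/* Process one input tuple: hash once, then branch on spill mode. */
static void toy_advance(ToyAgg *agg, int key)
{
    uint32_t hash = toy_hash(key);  /* step 1: computed exactly once */
    bool     isnew = false;
    Entry   *entry;

    if (agg->spill_mode) {
        /* step 3: lookup only; unknown groups are spilled with their hash */
        entry = toy_probe(agg, key, hash, false, &isnew);
        if (entry == NULL) {
            assert(agg->nspilled < MAX_SPILL);
            agg->spill[agg->nspilled++] = (SpilledTuple){ hash, key };
            return;
        }
    } else {
        /* step 4: insert if missing, initialize new groups */
        entry = toy_probe(agg, key, hash, true, &isnew);
        assert(entry != NULL);
        if (isnew)
            entry->count = 0;
        if (agg->nentries >= agg->mem_limit)
            agg->spill_mode = true;
    }
    entry->count++;                 /* the toy "transition function" */
}

/* Return the count for key, or -1 if the key has no in-memory group. */
static long toy_count(ToyAgg *agg, int key)
{
    bool   isnew = false;
    Entry *e = toy_probe(agg, key, toy_hash(key), false, &isnew);
    return e ? e->count : -1;
}
```

With mem_limit set to 2 and the inputs 1, 2, 1, 3, 2, 3, the groups for keys 1 and 2 stay in memory, while both tuples with key 3 end up in the spill array along with their precomputed hashes.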