Re: hashagg slowdown due to spill changes
From: Jeff Davis
Subject: Re: hashagg slowdown due to spill changes
Date:
Msg-id: 07d6cae97a18c603bed9cf609d16448eb2b438a5.camel@j-davis.com
In reply to: hashagg slowdown due to spill changes (Andres Freund <andres@anarazel.de>)
Responses: Re: hashagg slowdown due to spill changes
List: pgsql-hackers
On Fri, 2020-06-05 at 21:11 -0700, Andres Freund wrote:
> Before there was basically one call from nodeAgg.c to execGrouping.c
> for each tuple and hash table. Now it's a lot more complicated:
> 1) nodeAgg.c: prepare_hash_slot()
> 2) execGrouping.c: TupleHashTableHash()
> 3) nodeAgg.c: lookup_hash_entry()
> 4) execGrouping.c: LookupTupleHashEntryHash()

The reason that I did it that way was to be able to store the hash
along with the saved tuple (similar to what HashJoin does), which
avoids recalculation. That could be a nice savings for some cases, like
when work_mem is small but the data still fits in system memory, which
I expect to be fairly common. But based on your numbers, it might be a
bad trade-off overall.

> Why isn't the flow more like this:
> 1) prepare_hash_slot()
> 2) if (aggstate->hash_spill_mode) goto 3; else goto 4
> 3) entry = LookupTupleHashEntry(&hash); if (!entry)
>    hashagg_spill_tuple();
> 4) InsertTupleHashEntry(&hash, &isnew); if (isnew) initialize(entry)

I'll work up a patch to refactor this. I'd still like to see if we can
preserve the calculate-hash-once behavior somehow.

Regards,
	Jeff Davis
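For readers following the thread, here is a rough C sketch of the control flow Andres describes, with the hash handed back from the single execGrouping.c call so it can still be stored alongside a spilled tuple (the calculate-hash-once behavior Jeff wants to preserve). The function names are taken from the messages above; the signatures, struct fields, and helpers such as advance_aggregates_for() and initialize_hash_entry() are simplified placeholders, not the actual PostgreSQL definitions.

```c
/*
 * Rough sketch only: the flow proposed in the thread, written as if it
 * lived in nodeAgg.c.  Signatures and helper names are illustrative
 * placeholders, not the real PostgreSQL APIs.
 */
static void
lookup_hash_entries_sketch(AggState *aggstate, TupleTableSlot *inputslot)
{
	for (int setno = 0; setno < aggstate->num_hashes; setno++)
	{
		AggStatePerHash perhash = &aggstate->perhash[setno];

		/* 1) project the grouping columns into the per-hash slot */
		TupleTableSlot *hashslot = prepare_hash_slot(perhash, inputslot);

		if (aggstate->hash_spill_mode)
		{
			/*
			 * Table is full: only look up groups that already exist, and
			 * spill tuples for groups we have not seen.  The hash computed
			 * during the lookup is returned so it can be stored with the
			 * spilled tuple instead of being recomputed later.
			 */
			uint32		hash;
			TupleHashEntry entry;

			entry = LookupTupleHashEntry(perhash->hashtable, hashslot, &hash);
			if (entry == NULL)
				hashagg_spill_tuple(aggstate, inputslot, hash);
			else
				advance_aggregates_for(aggstate, entry);
		}
		else
		{
			/* Still building the table: insert, initializing new groups. */
			bool		isnew;
			uint32		hash;
			TupleHashEntry entry;

			entry = InsertTupleHashEntry(perhash->hashtable, hashslot,
										 &isnew, &hash);
			if (isnew)
				initialize_hash_entry(aggstate, entry);
			advance_aggregates_for(aggstate, entry);
		}
	}
}
```

The point of the sketch is that each input tuple makes exactly one execGrouping.c call per hash table, and that call returns the hash for reuse when spilling, rather than splitting the work across TupleHashTableHash() and LookupTupleHashEntryHash().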