Re: hashagg slowdown due to spill changes
От | Andres Freund |
---|---|
Тема | Re: hashagg slowdown due to spill changes |
Дата | |
Msg-id | 20200612213715.op4ye4q7gktqvpuo@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: hashagg slowdown due to spill changes (Jeff Davis <pgsql@j-davis.com>) |
Ответы |
Re: hashagg slowdown due to spill changes
Re: hashagg slowdown due to spill changes |
Список | pgsql-hackers |
Hi, On 2020-06-11 11:14:02 -0700, Jeff Davis wrote: > On Thu, 2020-06-11 at 10:45 -0700, Andres Freund wrote: > > Did you run any performance tests? > > Yes, I reproduced your ~12% regression from V12, and this patch nearly > eliminated it for me. I spent a fair bit of time looking at the difference. Jeff had let me know on chat that he was still seeing some difference, but couldn't quite figure out where that was. Trying it out myself, I observed that the patch helped, but not that much. After a bit I found one major reason for why: LookupTupleHashEntryHash() assigned the hash to pointer provided by the caller's before doing the insertion. That ended up causing a pipeline stall (I assume it's store forwarding, but not sure). Moving the assignment to the caller variable to after the insertion got rid of that. It got within 3-4% after that change. I did a number of small microoptimizations that each helped, but didn't get quite get to the level of 12. Finally I figured out that that's due to an issue outside of nodeAgg.c itself: commit 4cad2534da6d17067d98cf04be2dfc1bda8f2cd0 Author: Tomas Vondra <tomas.vondra@postgresql.org> Date: 2020-05-31 14:43:13 +0200 Use CP_SMALL_TLIST for hash aggregate Due to this change we end up with an additional projection in queries like this: postgres[212666][1]=# \d fewgroups_many_rows Table "public.fewgroups_many_rows" ┌────────┬─────────┬───────────┬──────────┬─────────┐ │ Column │ Type │ Collation │ Nullable │ Default │ ├────────┼─────────┼───────────┼──────────┼─────────┤ │ cat │ integer │ │ not null │ │ │ val │ integer │ │ not null │ │ └────────┴─────────┴───────────┴──────────┴─────────┘ postgres[212666][1]=# explain SELECT cat, count(*) FROM fewgroups_many_rows GROUP BY 1; ┌───────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├───────────────────────────────────────────────────────────────────────────────────────┤ │ HashAggregate (cost=1942478.48..1942478.53 rows=5 width=12) │ │ Group Key: cat │ │ -> Seq Scan on fewgroups_many_rows (cost=0.00..1442478.32 rows=100000032 width=4) │ └───────────────────────────────────────────────────────────────────────────────────────┘ (3 rows) as 'val' is "projected away".. After neutering the tlist change, Jeff's patch and my changes to it yield performance *above* v12. I don't see why it's ok to force an additional projection in the very common case of hashaggs over a few rows. So I think we need to rethink 4cad2534da6. Greetings, Andres Freund
Вложения
В списке pgsql-hackers по дате отправления: