Re: Disk-based hash aggregate's cost model
From: Jeff Davis
Subject: Re: Disk-based hash aggregate's cost model
Date:
Msg-id: 5278bb673fde719abc5b1079c0c316172a2ca77c.camel@j-davis.com
In reply to: Re: Disk-based hash aggregate's cost model (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: Disk-based hash aggregate's cost model
List: pgsql-hackers
On Tue, 2020-09-01 at 11:19 +0200, Tomas Vondra wrote:
> Why? I don't think we need to change costing of in-memory HashAgg. My
> assumption was we'd only tweak startup_cost for cases with spilling by
> adding something like (cpu_operator_cost * npartitions * ntuples).

The code above (the in-memory case) has a clause:

    startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples;

which seems to account only for the hash calculation, because it's
multiplying by the number of grouping columns. Your calculation would
also use cpu_operator_cost, but just for the lookup. I'm OK with that,
but it's a little inconsistent to count it only for the tuples that
spill to disk.

But why multiply by the number of partitions? Wouldn't it be the depth?
A wide fanout will not increase the number of lookups.

> FWIW I suspect some of this difference may be due to logical vs.
> physical I/O. iosnoop only tracks physical I/O sent to the device, but
> maybe we do much more logical I/O and it simply does not expire from
> page cache for the sort. It might behave differently for larger data
> set, longer query, ...

That would suggest something like a penalty for HashAgg for being a
worse I/O pattern. Or do you have another suggestion?

> I don't know. I certainly understand the desire not to change things
> this late. OTOH I'm worried that we'll end up receiving a lot of poor
> plans post release.

I was reacting mostly to changing the cost of Sort. Do you think
changes to Sort are required, or did I misunderstand?

Regards,
	Jeff Davis
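For concreteness, a small compilable sketch of the two charges being
compared above. Only cpu_operator_cost (PostgreSQL's GUC default),
numGroupCols and input_tuples correspond to names used in cost_agg();
spill_tuples and npartitions are made-up illustrative values, not the
committed cost model:

    #include <stdio.h>

    int main(void)
    {
        double cpu_operator_cost = 0.0025; /* PostgreSQL default GUC */

        int    numGroupCols = 2;    /* grouping columns */
        double input_tuples = 1e6;  /* tuples fed into the HashAgg */
        double spill_tuples = 4e5;  /* tuples assumed to spill (illustrative) */
        double npartitions  = 32;   /* spill partitions (illustrative) */

        /* In-memory clause quoted in the message: one hash-function
         * evaluation per grouping column for every input tuple. */
        double hash_cost = (cpu_operator_cost * numGroupCols) * input_tuples;

        /* Proposed spill term: one lookup charge per partition for
         * every spilled tuple. */
        double spill_cost = cpu_operator_cost * npartitions * spill_tuples;

        printf("in-memory hash cost: %g\n", hash_cost);  /* 5000 */
        printf("proposed spill cost: %g\n", spill_cost); /* 32000 */
        return 0;
    }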
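And a sketch of the depth argument, assuming (hypothetically) that
recursive spilling opens a fixed fanout of partitions per pass: a
spilled tuple is re-read and re-hashed once per recursion level, so
charging per partition rather than per level overstates the lookups by
the ratio npartitions/depth:

    #include <stdio.h>

    int main(void)
    {
        double cpu_operator_cost = 0.0025;
        double spill_tuples = 4e5;   /* spilled tuples, as above */
        double fanout       = 32;    /* partitions opened per spill pass */
        double npartitions  = 1024;  /* total partitions after recursion */

        /* depth = ceil(log_fanout(npartitions)), computed exactly. */
        int depth = 0;
        for (double p = 1; p < npartitions; p *= fanout)
            depth++;                 /* 2 levels for 1024 over fanout 32 */

        printf("charged per partition: %g\n",
               cpu_operator_cost * npartitions * spill_tuples); /* 1.024e+06 */
        printf("charged per level:     %g\n",
               cpu_operator_cost * depth * spill_tuples);       /* 2000 */
        return 0;
    }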