Re: POC: GROUP BY optimization
От | Andrei Lepikhov |
---|---|
Тема | Re: POC: GROUP BY optimization |
Дата | |
Msg-id | d026580a-15a0-4734-9e69-cc6ebb70da1d@postgrespro.ru обсуждение исходный текст |
Ответ на | Re: POC: GROUP BY optimization (Richard Guo <guofenglinux@gmail.com>) |
Ответы |
Re: POC: GROUP BY optimization
|
Список | pgsql-hackers |
On 22/2/2024 13:35, Richard Guo wrote: > The avg() function on integer argument is commonly used in > aggregates.sql. I don't think this is an issue. See the first test > query in aggregates.sql. Make sense > > it should be parallel to the test cases for utilize the ordering of > > index scan and subquery scan. > > Also, I'm unsure about removing the disabling of the > max_parallel_workers_per_gather parameter. Have you discovered the > domination of the current plan over the partial one? Do the cost > fluctuations across platforms not trigger a parallel plan? > > > The table used for testing contains only 100 tuples, which is the size > of only one page. I don't believe it would trigger any parallel plans, > unless we manually change min_parallel_table_scan_size. I don't intend to argue it, but just for the information, I frequently reduce it to zero, allowing PostgreSQL to make a decision based on costs. It sometimes works much better, because one small table in multi join can disallow an effective parallel plan. > > What's more, I suggest to address here the complaint from [1]. As I > see, > cost difference between Sort and IncrementalSort strategies in that > case > is around 0.5. To make the test more stable I propose to change it a > bit > and add a limit: > SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10; > It makes efficacy of IncrementalSort more obvious difference around 10 > cost points. > > > I don't think that's necessary. With Incremental Sort the final cost > is: > > GroupAggregate (cost=1.66..19.00 rows=100 width=25) > > while with full Sort it is: > > GroupAggregate (cost=16.96..19.46 rows=100 width=25) > > With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path > is cheaper on total cost. Not to say that even if somehow we decide the > two paths are fuzzily the same on total cost, the first path still > dominates because its startup cost is much cheaper. As before, I won't protest here - it needs some computations about how much cost can be added by bulk extension of the relation blocks. If Maxim will answer that it's enough to resolve his issue, why not? -- regards, Andrei Lepikhov Postgres Professional
В списке pgsql-hackers по дате отправления: