Re: POC: GROUP BY optimization
| From | Andrey Lepikhov |
|---|---|
| Subject | Re: POC: GROUP BY optimization |
| Date | |
| Msg-id | 3d30ae10-5c35-63c4-577c-d2e2479e15fd@postgrespro.ru |
| In response to | Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
| Responses | Re: POC: GROUP BY optimization |
| List | pgsql-hackers |
On 20/7/2023 18:46, Tomas Vondra wrote:
> On 7/20/23 08:37, Andrey Lepikhov wrote:
>> On 3/10/2022 21:56, Tom Lane wrote:
>>> Revert "Optimize order of GROUP BY keys".
>>>
>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>>> several follow-on fixes.
>>> ...
>>> Since we're hard up against the release deadline for v15, let's
>>> revert these changes for now. We can always try again later.
>>
>> It may be time to restart the project. As a first step, I rebased the
>> patch on the current master. It wasn't trivial because of some recent
>> optimizations (a29eab, 1349d27 and 8d83a5d).
>> Now, let's repeat the review and rewrite the current patch according to
>> the reasons given in the revert commit.
>
> I think the fundamental task is to make the costing more reliable, and
> the commit message 443df6e2db points out a couple of challenges in this
> area. Not sure how feasible it is to address enough of them ...
>
> 1) procost = 1.0 - I guess we could make this more realistic by doing
> some microbenchmarks and tuning the costs for the most expensive cases.
>
> 2) estimating quicksort comparisons - This relies on ndistinct
> estimates, and I'm not sure how much more reliable we can make those.
> Probably not much :-( Not sure what to do about this, the only thing I
> can think of is to track "reliability" of the estimates and only do the
> reordering if we have high confidence in the estimates. That means we'll
> miss some optimization opportunities, but it should limit the risk.

I have read up on the history of this thread. As I see it, all the problems mentioned above can be avoided by dropping the new cost model entirely: we can simply sort the GROUP BY columns according to their 'ndistinct' values (see the sketch below).

I see the reason for introducing the cost model in [1]. The main argument there is that with this patch users could no longer optimize a query themselves by arranging the columns in a better order. But now we at least have the GUC to switch off the behaviour introduced here, and some extensions, such as the well-known pg_hint_plan, can help with automation.

So, how about committing the uncontroversial part of the feature and working on the cost model in a new thread?

[1] https://www.postgresql.org/message-id/6d1e0cdb-dde3-f62a-43e2-e90bbd9b0f42%402ndquadrant.com

--
regards, Andrey Lepikhov
Postgres Professional
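To make the ndistinct-based ordering idea concrete, here is a minimal SQL sketch. The table and column names (`sales`, `region`, `order_id`) are hypothetical and not from the thread; the point is only to show why putting the grouping key with the larger ndistinct first tends to resolve most sort comparisons on the first column.

```sql
-- Hypothetical table: "sales" with "region" (few distinct values)
-- and "order_id" (many distinct values).

-- Inspect the ndistinct estimates collected by ANALYZE:
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'sales'
  AND attname IN ('region', 'order_id');

-- With the textual key order, a sort feeding this grouping compares on
-- "region" first; those comparisons tie often, forcing a second
-- comparison on "order_id" for most row pairs:
EXPLAIN ANALYZE
SELECT region, order_id, count(*)
FROM sales
GROUP BY region, order_id;

-- Putting the high-ndistinct column first lets most comparisons be
-- decided on the first key; the patch aims to perform this reordering
-- automatically based on the ndistinct statistics:
EXPLAIN ANALYZE
SELECT region, order_id, count(*)
FROM sales
GROUP BY order_id, region;
```

Note that in pg_stats a negative n_distinct is a fraction of the row count rather than an absolute value, so any ordering heuristic would have to normalize the two forms before comparing columns.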