Re: POC: GROUP BY optimization
От | Gavin Flower |
---|---|
Тема | Re: POC: GROUP BY optimization |
Дата | |
Msg-id | 0b3ee377-2f67-f58d-6177-bc2ed4626499@archidevsys.co.nz обсуждение исходный текст |
Ответ на | Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Ответы |
Re: POC: GROUP BY optimization
|
Список | pgsql-hackers |
On 30/06/18 03:03, Tomas Vondra wrote: > On 06/29/2018 04:51 PM, Teodor Sigaev wrote: >> >>>> I tried to attack the cost_sort() issues and hope on that basis we >>>> can solve problems with 0002 patch and improve incremental sort patch. >>>> >>> >>> OK, will do. Thanks for working on this! >> >> I hope, now we have a better cost_sort(). The obvious way is a try >> all combination of pathkeys in get_cheapest_group_keys_order() and >> choose cheapest one by cost_sort(). > >> But it requires N! operations and potentially could be very >> expensive in case of large number of pathkeys and doesn't solve the >> issue with user-knows-what-he-does pathkeys. > > Not sure. There are N! combinations, but this seems like a good > candidate for backtracking [1]. You don't have to enumerate and > evaluate all N! combinations, just construct one and then abandon > whole classes of combinations as soon as they get more expensive than > the currently best one. That's thanks to additive nature of the > comparison costing, because appending a column to the sort key can > only make it more expensive. My guess is this will make this a non-issue. > > [1] https://en.wikipedia.org/wiki/Backtracking > >> >> We could suggest an order of pathkeys as patch suggests now and if >> cost_sort() estimates cost is less than 80% (arbitrary chosen) cost >> of user-suggested pathkeys then it use our else user pathkeys. >> > > I really despise such arbitrary thresholds. I'd much rather use a more > reliable heuristics by default, even if it gets it wrong in some cases > (which it will, but that's natural). > > regards > Additionally put an upper limit threshold on the number of combinations to check, fairly large by default? If first threshold is exceeded, could consider checking out a few more selected at random from paths not yet checked, to avoid any bias caused by stopping a systematic search. This might prove important when N! is fairly large. Cheers, Gavin
В списке pgsql-hackers по дате отправления: