Re: POC: GROUP BY optimization
От | Andrey V. Lepikhov |
---|---|
Тема | Re: POC: GROUP BY optimization |
Дата | |
Msg-id | dc933c66-41b9-8907-327e-f6361a690654@postgrespro.ru обсуждение исходный текст |
Ответ на | Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Ответы |
Re: POC: GROUP BY optimization
Re: POC: GROUP BY optimization |
Список | pgsql-hackers |
I keep work on this patch. Here is intermediate results. On 7/22/21 3:58 AM, Tomas Vondra wrote: > in the first loop. Which seems pretty bogus - why would there be just > two groups? When processing the first expression, it's as if there was > one big "prev group" with all the tuples, so why not to just use nGroups > as it is? I think, heapsort code seems very strange. Look into fallback case. It based on an output_tuples value. Maybe we should use nGroups value here, but based on a number of output_tuples? > 1) I looked at the resources mentioned as sources the formulas came > from, but I've been unable to really match the algorithm to them. The > quicksort paper is particularly "dense", the notation seems to be very > different, and none of the theorems seem like an obvious fit. Would be > good to make the relationship clearer in comments etc. Fixed (See attachment). > 3) I'm getting a bit skeptical about the various magic coefficients that > are meant to model higher costs with non-uniform distribution. But > consider that we do this, for example: > > tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups); > > but then in the next loop we call estimate_num_groups_incremental and > pass this "tweaked" tuplesPerPrevGroup value to it. I'm pretty sure this > may have various strange consequences - we'll calculate the nGroups > based on the inflated value, and we'll calculate tuplesPerPrevGroup from > that again - that seems susceptible to "amplification". > > We inflate tuplesPerPrevGroup by 50%, which means we'll get a higher > nGroups estimate in the next loop - but not linearly. An then we'll > calculate the inflated tuplesPerPrevGroup and estimated nGroup ... Weighting coefficient '1.5' shows our desire to minimize the number of comparison operations on each next attribute of a pathkeys list. Increasing this coef we increase a chance, that planner will order pathkeys by decreasing of uniqueness. I think, it's ok. -- regards, Andrey Lepikhov Postgres Professional
Вложения
В списке pgsql-hackers по дате отправления: