Re: POC: GROUP BY optimization

Поиск

Список

Период

Сортировка

От	Andrey V. Lepikhov
Тема	Re: POC: GROUP BY optimization
Дата	20 января 2022 г. 05:05:57
Msg-id	dc933c66-41b9-8907-327e-f6361a690654@postgrespro.ru обсуждение исходный текст
Ответ на	Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы	Re: POC: GROUP BY optimization Re: POC: GROUP BY optimization
Список	pgsql-hackers

Дерево обсуждения

I keep work on this patch. Here is intermediate results.

On 7/22/21 3:58 AM, Tomas Vondra wrote:
> in the first loop. Which seems pretty bogus - why would there be just
> two groups? When processing the first expression, it's as if there was
> one big "prev group" with all the tuples, so why not to just use nGroups
> as it is?

I think, heapsort code seems very strange. Look into fallback case. It 
based on an output_tuples value. Maybe we should use nGroups value here, 
but based on a number of output_tuples?

 > 1) I looked at the resources mentioned as sources the formulas came
 > from, but I've been unable to really match the algorithm to them. The
 > quicksort paper is particularly "dense", the notation seems to be very
 > different, and none of the theorems seem like an obvious fit. Would be
 > good to make the relationship clearer in comments etc.

Fixed (See attachment).

> 3) I'm getting a bit skeptical about the various magic coefficients that
> are meant to model higher costs with non-uniform distribution. But
> consider that we do this, for example:
> 
>     tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups);
> 
> but then in the next loop we call estimate_num_groups_incremental and
> pass this "tweaked" tuplesPerPrevGroup value to it. I'm pretty sure this
> may have various strange consequences - we'll calculate the nGroups
> based on the inflated value, and we'll calculate tuplesPerPrevGroup from
> that again - that seems susceptible to "amplification".
> 
> We inflate tuplesPerPrevGroup by 50%, which means we'll get a higher
> nGroups estimate in the next loop - but not linearly. An then we'll
> calculate the inflated tuplesPerPrevGroup and estimated nGroup ...

Weighting coefficient '1.5' shows our desire to minimize the number of 
comparison operations on each next attribute of a pathkeys list.
Increasing this coef we increase a chance, that planner will order 
pathkeys by decreasing of uniqueness.
I think, it's ok.

-- 
regards,
Andrey Lepikhov
Postgres Professional

Вложения

fixes.txt

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: POC: GROUP BY optimization

Вложения