Re: POC: GROUP BY optimization
От | Andrey Lepikhov |
---|---|
Тема | Re: POC: GROUP BY optimization |
Дата | |
Msg-id | 4ee5b15c-1ce0-59c7-8d4a-a1fd68ee0d12@postgrespro.ru обсуждение исходный текст |
Ответ на | Re: POC: GROUP BY optimization ("Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>) |
Ответы |
Re: POC: GROUP BY optimization
|
Список | pgsql-hackers |
On 7/22/21 3:58 AM, Tomas Vondra wrote: > 4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to > estimate_num_groups_incremental. In principle yes, if we use "group > size" from the previous step, then the returned value is the number of > new groups after adding the "new" pathkey. > But even if we ignore the issues with amplification mentioned in (3), > there's an issue with non-linear behavior in estimate_num_groups, > because at some point it's calculating > > D(N,n,p) = n * (1 - ((N-p)/N)^(N/n)) > > where N - total rows, p - sample size, n - number of distinct values. > And if we have (N1,n1) and (N2,n2) then the ratio of calculated > estimated (which is pretty much what calculating group size does) > > D(N2,n2,p2) / D(N1,n1,p1) > > which will differ depending on p1 and p2. And if we're tweaking the > tuplesPerPrevGroup all the time, that's really annoying, as it may make > the groups smaller or larger, which is unpredictable and annoying, and I > wonder if it might go against the idea of penalizing tuplesPerPrevGroup > to some extent. > We could simply use the input "tuples" value here, and then divide the > current and previous estimate to calculate the number of new groups. tuplesPerPrevGroup is only a top boundary for the estimation. I think, we should use result of previous estimation as a limit for the next during incremental estimation. Maybe we simply limit the tuplesPerPrevGroup value to ensure of monotonic non-growth of this value? - see in attachment patch to previous fixes. -- regards, Andrey Lepikhov Postgres Professional
Вложения
В списке pgsql-hackers по дате отправления: