Re: Improving N-Distinct estimation by ANALYZE
From | Josh Berkus |
---|---|
Subject | Re: Improving N-Distinct estimation by ANALYZE |
Date | |
Msg-id | 200601041525.55084.josh@agliodbs.com |
In response to | Re: Improving N-Distinct estimation by ANALYZE (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Improving N-Distinct estimation by ANALYZE
|
List | pgsql-hackers |

Tom,

> In general, estimating n-distinct from a sample is just plain a hard
> problem, and it's probably foolish to suppose we'll ever be able to
> do it robustly. What we need is to minimize the impact when we get
> it wrong.

Well, I think it's pretty well proven that to be accurate at all you need to be able to sample at least 5%, even if some users choose to sample less. Also I don't think anyone on this list disputes that the current algorithm is very inaccurate for large tables. Or do they?

While I don't think that we can estimate N-distinct completely accurately, I do think that we can get within +/- 5x for 80-90% of all cases, instead of 40-50% of cases like now. We can't be perfectly accurate, but we can be *more* accurate.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
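
To make the "within +/- 5x" criterion concrete, here is a minimal sketch of how one might measure it on synthetic data. It assumes the Haas-Stokes "Duj1" estimator, n*d / (n - f1 + f1*n/N), as a stand-in for a sample-based n-distinct estimate; the message does not name an estimator, so that choice is an assumption. The function names, the uniform test column, and the sample fractions are all hypothetical, and nothing here reproduces the percentages quoted in the message.

```python
import random
from collections import Counter


def duj1_estimate(sample, total_rows):
    """Duj1 estimate of the number of distinct values in the whole table.

    n = sample size, d = distinct values in the sample,
    f1 = values seen exactly once in the sample, N = total_rows.
    """
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    return n * d / (n - f1 + f1 * n / total_rows)


def ratio_error(estimate, actual):
    """Multiplicative error: 1.0 is exact, 5.0 means off by a factor of 5."""
    return max(estimate / actual, actual / estimate)


def demo(total_rows=200_000, true_distinct=50_000,
         fractions=(0.01, 0.05, 0.20), seed=0):
    """Estimate n-distinct at several sample fractions (hypothetical data)."""
    rng = random.Random(seed)
    # Assumption: a uniformly distributed column; real columns are often skewed.
    column = [rng.randrange(true_distinct) for _ in range(total_rows)]
    actual = len(set(column))
    results = {}
    for frac in fractions:
        sample = rng.sample(column, int(total_rows * frac))
        est = duj1_estimate(sample, total_rows)
        results[frac] = (est, ratio_error(est, actual))
    return actual, results


if __name__ == "__main__":
    actual, results = demo()
    print(f"actual distinct: {actual}")
    for frac, (est, err) in results.items():
        print(f"sample {frac:.0%}: estimate {est:.0f}, ratio error {err:.2f}x")
```

An estimate counts as "within 5x" when `ratio_error` returns 5.0 or less. Repeating `demo` over many seeds and data distributions, and counting how often that holds, would give the kind of "N% of all cases" figure the message talks about.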