Re: Estimating number of distinct values.
От | Tom Lane |
---|---|
Тема | Re: Estimating number of distinct values. |
Дата | |
Msg-id | 19857.1540391239@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Estimating number of distinct values. (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
Список | pgsql-hackers |
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes: > I will be pleased if somebody (first of all Robert) can comment me > "strange" results of distinct values estimation. Estimating the number of distinct values from a small sample is a hard problem; every algorithm is going to blow it in some cases. > In my case there are no null values: > totalrows = 48980014 > samplerows = 30000 > ndistinct = 29135 > nmultiple = 800 You seem to be using the default statistics target. Possibly raising that would give better answers for this column. > May be instead of sampling-based estimation use streaming > based estimation, for example HyperLogLog algorithm already present in > Postgres? Maybe, but I'd be really surprised if HLL were any sort of magic bullet. I think it's intended to make an ndistinct estimate with just a small amount of state preserved as rows are scanned, which is an admirable goal but not one that ANALYZE particularly needs. Adopting such a constraint when we don't need it does not sound like a recipe for getting a better final answer. regards, tom lane
В списке pgsql-hackers по дате отправления: