Re: estimating # of distinct values
From | Heikki Linnakangas |
---|---|
Subject | Re: estimating # of distinct values |
Date | |
Msg-id | 4D37EDEE.9070906@enterprisedb.com |
In response to | Re: estimating # of distinct values (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: estimating # of distinct values |
List | pgsql-hackers |
On 20.01.2011 04:36, Robert Haas wrote:
> ... Even better, the
> code changes would be confined to ANALYZE rather than spread out all
> over the system, which has positive implications for robustness and
> likelihood of commit.

Keep in mind that the administrator can already override the ndistinct estimate with ALTER TABLE. If he needs to manually run a special ANALYZE command to make it scan the whole table, he might as well just use ALTER TABLE to tell the system what the real (or good enough) value is. A DBA should have a pretty good feeling of what the distribution of his data is like.

And how good does the estimate need to be? For a single column, it's usually not that critical, because if the column has only a few distinct values then we'll already estimate that pretty well, and OTOH if ndistinct is large, it doesn't usually affect the plans much if it's 10% of the number of rows or 90%.

It seems that the suggested multi-column selectivity estimator would be more sensitive to ndistinct of the individual columns. Is that correct? How is it biased? If we routinely under-estimate ndistinct of individual columns, for example, does the bias accumulate or cancel itself in the multi-column estimate?

I'd like to see some testing of the suggested selectivity estimator with the ndistinct estimates we have. Who knows, maybe it works fine in practice.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
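
As a starting point for the kind of testing asked for above, here is a minimal Python sketch of how the per-column sampled estimate could be compared with the true ndistinct. It assumes the Haas-Stokes estimator that ANALYZE applies to a row sample, `n*d / (n - f1 + f1*n/N)`, where `n` is the sample size, `N` the table size, `d` the number of distinct values in the sample and `f1` the number of values seen exactly once. The synthetic table (row `i` holds value `i % true_ndistinct`) and the function names are made up for illustration, and the sketch says nothing about the proposed multi-column estimator itself.

```python
import random
from collections import Counter


def sampled_ndistinct(sample, total_rows):
    """Estimate ndistinct from a row sample with n*d / (n - f1 + f1*n/N)."""
    n = len(sample)
    counts = Counter(sample)
    d = len(counts)                                   # distinct values in sample
    f1 = sum(1 for c in counts.values() if c == 1)    # values seen exactly once
    if f1 == 0:
        # Every sampled value repeats: assume the sample saw all of them.
        return float(d)
    estimate = n * d / (n - f1 + f1 * n / total_rows)
    # Clamp to the plausible range [d, N].
    return float(min(max(estimate, d), total_rows))


def estimate_for_synthetic_table(true_ndistinct, total_rows, sample_size, seed=0):
    """Hypothetical test table: row i holds the value i % true_ndistinct."""
    rng = random.Random(seed)
    rows = rng.sample(range(total_rows), sample_size)  # sample without replacement
    sample = [row % true_ndistinct for row in rows]
    return sampled_ndistinct(sample, total_rows)


if __name__ == "__main__":
    total_rows, sample_size = 100_000, 3_000
    for true_nd in (10, 1_000, 50_000, 100_000):
        est = estimate_for_synthetic_table(true_nd, total_rows, sample_size)
        print(f"true={true_nd:>7}  estimated={est:>10.0f}  ratio={est / true_nd:.2f}")
```

Feeding the estimated values, rather than the true ones, into a candidate multi-column formula would then show whether the per-column errors accumulate or cancel for a given data distribution.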