Re: Top N queries and disbursion
От | Tom Lane |
---|---|
Тема | Re: Top N queries and disbursion |
Дата | |
Msg-id | 1078.939338218@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Top N queries and disbursion (Roberto Cornacchia <rcorna@tin.it>) |
Ответы |
Re: [HACKERS] Re: Top N queries and disbursion
|
Список | pgsql-hackers |
Roberto Cornacchia <rcorna@tin.it> writes: > ... since we are working on snapshots of the 6.6 > release (now we are using snapshot dated 9/13/99) we are afraid of > instability problems to affect the results. Could you give us any > suggestion about this? We are quite close to the degree day, so we have > to optimize time usage... If you don't want to spend time tracking development changes then you probably ought to stick with the snapshot you have. I don't see any reason that you should try to track changes right now... > We need to estimate the number of distinct values of an attribute. We > thought 1/disbursion was the right solution, but the results were quite > wrong: No, it's certainly not the right thing. To my understanding, disbursion is a measure of the frequency of the most common value of an attribute; but that tells you very little about how many other values there are. 1/disbursion is a lower bound on the number of values, but it wouldn't be a good estimate unless you had reason to think that the values were pretty evenly distributed. There could be a *lot* of very-infrequent values. > with 100 distinct values of an attribute uniformly distribuited in a > relation of 10000 tuples, disbursion was estimated as 0.002275, giving > us 440 distinct values. This is an illustration of the fact that Postgres' disbursion-estimator is pretty bad :-(. It usually underestimates the frequency of the most common value, unless the most common value is really frequent (probability > 0.2 or so). I've been trying to think of a more accurate way of figuring the statistic that wouldn't be unreasonably slow. Or, perhaps, we should forget all about disbursion and adopt some other statistic(s). regards, tom lane
В списке pgsql-hackers по дате отправления: