Re: [HACKERS] Re: Top N queries and disbursion
От | Roberto Cornacchia |
---|---|
Тема | Re: [HACKERS] Re: Top N queries and disbursion |
Дата | |
Msg-id | 37FDEF32.DDE759AC@tin.it обсуждение исходный текст |
Ответ на | Re: [HACKERS] Re: Top N queries and disbursion (Bruce Momjian <maillist@candle.pha.pa.us>) |
Ответы |
Re: [HACKERS] Re: Top N queries and disbursion
Re: [HACKERS] Re: Top N queries and disbursion |
Список | pgsql-hackers |
Bruce Momjian wrote: > > > No, it's certainly not the right thing. To my understanding, disbursion > > is a measure of the frequency of the most common value of an attribute; > > but that tells you very little about how many other values there are. > > 1/disbursion is a lower bound on the number of values, but it wouldn't > > be a good estimate unless you had reason to think that the values were > > pretty evenly distributed. There could be a *lot* of very-infrequent > > values. > > > > > with 100 distinct values of an attribute uniformly distribuited in a > > > relation of 10000 tuples, disbursion was estimated as 0.002275, giving > > > us 440 distinct values. > > > > This is an illustration of the fact that Postgres' disbursion-estimator > > is pretty bad :-(. It usually underestimates the frequency of the most > > common value, unless the most common value is really frequent > > (probability > 0.2 or so). I've been trying to think of a more accurate > > way of figuring the statistic that wouldn't be unreasonably slow. > > Or, perhaps, we should forget all about disbursion and adopt some other > > statistic(s). > > Yes, you have the crux of the issue. I wrote it because it was the best > thing I could think of, but it is non-optimimal. Because all the > optimal solutions seemed too slow to me, I couldn't think of a better > one. Thank you, Tom and Bruce. This is not a good news for us :-(. In any case, is 1/disbursion the best estimate we can have by now, even if not optimal? Roberto Cornacchia Andrea Ghidini
В списке pgsql-hackers по дате отправления: