Re: Odd statistics behaviour in 7.2
From | Tom Lane |
---|---|
Subject | Re: Odd statistics behaviour in 7.2 |
Date | |
Msg-id | 21967.1013968953@sss.pgh.pa.us |
In reply to | Odd statistics behaviour in 7.2 ("Gordon A. Runkle" <gar@integrated-dynamics.com>) |
List | pgsql-hackers |
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> It would seem that if you could determine if the number of distinct
> values is _increasing_ as you scan more rows, that an increase in table
> size would also cause an increase, e.g. if you have X distinct values
> looking at N rows, and 2X distinct values looking at 2N rows, that
> clearly would show a scale.

[ thinks for awhile... ]

I don't think that'll help. You could not expect an exact 2:1 increase, except in the case of a simple unique column, which isn't the problem anyway. So the above would really have to be coded as "count the number of distinct values in the sample (d1) and the number in half of the sample (d2); then if d1/d2 >= X assume the number of distinct values scales". X is a constant somewhere between 1 and 2, but where? I think you've only managed to trade one arbitrary threshold for another one.

A more serious problem is that the above could easily be fooled by a distribution that contains a few very-popular values and a larger number of seldom-seen ones. Consider for example a column "number of children" over a database of families. In a sample of a thousand or so, you might well see only values 0..4 (or so); if you double the size of the sample, and find a few rows with 5 to 10 kids, are you then correct to label the column as scaling with the size of the database?

regards, tom lane
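For readers who want to see the objection concretely, here is a minimal Python sketch of the d1/d2 test as described in the message. It is not PostgreSQL code: the names `distinct_ratio` and `looks_scaling`, the choice of the first half of the sample as "half of the sample", and the threshold of 1.5 are all assumptions made for illustration (the threshold is exactly the arbitrary X the message complains about). The "number of children" data is hand-built so the result is reproducible.

```python
def distinct_ratio(sample):
    """Return d1/d2: distinct values in the whole sample (d1)
    divided by distinct values in its first half (d2)."""
    if len(sample) < 2:
        raise ValueError("need at least two rows to split the sample")
    half = sample[: len(sample) // 2]
    d1 = len(set(sample))
    d2 = len(set(half))
    return d1 / d2


def looks_scaling(sample, threshold=1.5):
    """Hypothetical rule: assume the distinct count scales with table
    size when d1/d2 >= threshold (an arbitrary X between 1 and 2)."""
    return distinct_ratio(sample) >= threshold


# A simple unique column: every row is a new value, so the ratio is 2.
unique_column = list(range(2000))

# "Number of children": the first 1000 rows only show 0..4; the second
# 1000 rows add a handful of rare values 5..10.
first_half = [i % 5 for i in range(1000)]
second_half = [i % 5 for i in range(994)] + [5, 6, 7, 8, 9, 10]
children = first_half + second_half

print(distinct_ratio(unique_column))  # 2.0
print(distinct_ratio(children))       # 2.2
print(looks_scaling(children))        # True
```

Under these assumptions the bounded "children" column scores a higher ratio than the unique column, so any X between 1 and 2 would label it as scaling with table size, which is the misclassification the message warns about.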