Re: Problem with n_distinct being consistently inaccurate.
От | Nick Fankhauser |
---|---|
Тема | Re: Problem with n_distinct being consistently inaccurate. |
Дата | |
Msg-id | NEBBLAAHGLEEPCGOBHDGOEIOIJAA.nickf@ontko.com обсуждение исходный текст |
Ответ на | Re: Problem with n_distinct being consistently inaccurate. (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Problem with n_distinct being consistently inaccurate.
|
Список | pgsql-admin |
> AFAIK, estimating number of distinct values from a small sample is > inherently an ill-conditioned problem. If I had been getting estimates all over the map, I'd have been a bit more unconcerned, but what I'm seeing is a very consistent number that also increases and tends to be more consistent in proportion to the stats "target" number. This makes me think that there is more at work here than the inaccuracy likely to occur from small samples. It's as if the algorithm and sample size (even at default) are pretty reasonable for returning consistent results in this case, but a multiplier needs to be changed. For instance, with the various values for statistics, if I do an analyze on the table and then look at n_distinct 6 times, these are the results I get: (actual number is 92,000) set statistics = -1 (default): 13549 14268 14772 14518 13863 13526 mean = 14083 std dev = 518 or 3.7% of mean set statistics = 100 22457 22598 22566 22580 22767 22490 mean = 22576 std dev = 108 or .5% of mean set statistics = 500 39878 39984 40018 39977 39885 40070 mean = 39968 std dev = 75 or .2% of mean set statistics = 1000 51428 51503 51486 51658 51625 51589 mean = 51548 std dev = 74 or .1% of mean > You could try sticking the correct n_distinct into pg_statistic by hand > just to see if it really does change the plan OK... but I'm a bit confused on how to get to the right row in pg_statistic. when I do a \d on pg_stats, it appears that pg_statistic.starelid matches up with pg_class.oid, but apparently this is not the case. Is there a place I can look to find which keys correspond among the pg_catalog tables? > but I'd like to think > that getting within a factor of 2 is good enough. Probably so... but with the default stats, it is more like a factor of 6, which seems significant to me, and if my conjecture is correct, it might be an easy fix. (Easy for me to say, since I'm not a developer. <grin>) -Nick
В списке pgsql-admin по дате отправления: