Re: More thoughts about planner's cost estimates
From | Josh Berkus |
---|---|
Subject | Re: More thoughts about planner's cost estimates |
Date | |
Msg-id | 200606021223.35168.josh@agliodbs.com |
In response to | Re: More thoughts about planner's cost estimates (Greg Stark <gsstark@mit.edu>) |
Responses | Re: More thoughts about planner's cost estimates |
List | pgsql-hackers |
Greg,

> Using a variety of synthetic and real-world data sets, we show that
> distinct sampling gives estimates for distinct values queries that
> are within 0%-10%, whereas previous methods were typically 50%-250% off,
> across the spectrum of data sets and queries studied.

Aha. It's a question of the level of error permissible. For our estimates, being 100% off is actually OK. That's why I was looking at 5% block sampling; it stays within the range of +/- 50% n-distinct in 95% of cases.

> Doing a bit of basic searching around I think the tool we're looking for
> here is called a "chi-squared test for independence".

Augh. I wrote a program (in Pascal) to do this back in 1988. Now I can't remember the math. For a two-column test it's relatively computation-light, though, as I recall ... but I don't remember whether standard chi square works with a random sample.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
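
For reference, the two-column "chi-squared test for independence" mentioned above can be sketched roughly as follows. This is only an illustration of the textbook Pearson statistic, not anything from the PostgreSQL source; the function name `chi_squared_independence` and the input shape (a list of `(a, b)` value pairs drawn from two columns) are assumptions made for the example. It does not settle whether the statistic stays valid on a block sample, which is the open question raised in the message.

```python
from collections import Counter


def chi_squared_independence(rows):
    """Return (statistic, degrees_of_freedom) for a list of (a, b) pairs.

    Hypothetical helper for illustration: computes the Pearson chi-squared
    statistic over the contingency table of the two columns.
    """
    rows = list(rows)
    n = len(rows)
    if n == 0:
        raise ValueError("need at least one row")

    cell = Counter(rows)                    # observed count per (a, b) cell
    a_tot = Counter(a for a, _ in rows)     # marginal counts, first column
    b_tot = Counter(b for _, b in rows)     # marginal counts, second column

    stat = 0.0
    for a, ca in a_tot.items():
        for b, cb in b_tot.items():
            expected = ca * cb / n          # count expected under independence
            observed = cell.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected

    dof = (len(a_tot) - 1) * (len(b_tot) - 1)
    return stat, dof
```

A statistic near zero is consistent with independent columns; a large value relative to the chi-squared distribution with `dof` degrees of freedom suggests correlation. The cost is one pass over the sample plus one pass over the contingency table, which matches the recollection that the two-column case is computation-light.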