Re: Improving N-Distinct estimation by ANALYZE
| From | Greg Stark |
|---|---|
| Subject | Re: Improving N-Distinct estimation by ANALYZE |
| Date | |
| Msg-id | 87psn52ajv.fsf@stark.xeocode.com |
| In reply to | Re: Improving N-Distinct estimation by ANALYZE (Josh Berkus <josh@agliodbs.com>) |
| Replies | Re: Improving N-Distinct estimation by ANALYZE |
| List | pgsql-hackers |
Josh Berkus <josh@agliodbs.com> writes:

> > These numbers don't make much sense to me. It seems like 5% is about as
> > slow as reading the whole file which is even worse than I expected. I
> > thought I was being a bit pessimistic to think reading 5% would be as
> > slow as reading 20% of the table.
>
> It's about what *I* expected. Disk seeking is the bane of many access
> methods.

Sure, but that bad? That means realistic random_page_cost values should be something more like 20 rather than 4. And that's with seeks only going to subsequent blocks in a single file, which one would expect to cost less than the half rotation a truly random seek averages. That seems worse than anyone expects.

> Anyway, since the proof is in the pudding, Simon and I will be working on
> some demo code for different sampling methods so that we can debate
> results rather than theory.

Note that if these numbers are realistic, then there's no I/O benefit to any sampling method that requires reading anything like 5% of the entire table and is still unreliable. Instead it makes more sense to implement an algorithm that requires a full table scan and can produce good results more reliably.

--
greg
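The "more like 20 rather than 4" figure follows directly from the benchmark result quoted above. A minimal back-of-envelope sketch (assuming, as the thread does, that randomly reading 5% of a table's blocks takes about as long as one sequential scan of the whole table):

```python
# If a random read of 5% of the blocks costs about the same as a
# sequential read of 100% of them, then each random page fetch costs
# roughly 1/0.05 = 20x a sequential page fetch. The 5% figure is the
# benchmark result discussed upthread, not a measured constant here.
sampled_fraction = 0.05        # fraction of blocks touched by the sample scan
seq_cost_full_scan = 1.0       # cost of one full sequential scan (normalized)

implied_random_page_cost = seq_cost_full_scan / sampled_fraction
print(implied_random_page_cost)  # → 20.0
```

This is why the observation undercuts block sampling: at an effective random_page_cost near 20, touching even a small fraction of the table's pages at random already costs as much as scanning all of them sequentially.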