Re: Improving N-Distinct estimation by ANALYZE
From | Jim C. Nasby
---|---
Subject | Re: Improving N-Distinct estimation by ANALYZE
Date |
Msg-id | 20060106160906.GH3902@pervasive.com
In response to | Re: Improving N-Distinct estimation by ANALYZE (Greg Stark <gsstark@mit.edu>)
Responses | Re: Improving N-Distinct estimation by ANALYZE, Re: Improving N-Distinct estimation by ANALYZE
List | pgsql-hackers
On Fri, Jan 06, 2006 at 01:24:41AM -0500, Greg Stark wrote:
> > 5% based on block-based sampling is reasonable; that means a straight 5% of
> > the on-disk size of the table, so 5gb for a 100gb table. With random-row
> > sampling, that would require as much as 25% of the table, making it easier
> > to just scan the whole thing.
>
> Postgres's current sample sizes are clearly geared towards the histograms
> where they are entirely realistic. All of the distinct estimates are clearly
> just ad hoc attempts based on the existing sampling.
>
> Is a mechanism that is only 5x faster than reading the whole table (assuming
> random_page_cost of 4) and is off by more than a factor of three 10% of the
> time really worth it?

Before we start debating merits of proposals based on random reads, can
someone confirm that the sampling code actually does read randomly? I looked
at it yesterday; there is a comment that states that blocks to be scanned are
passed to the analyze function in physical order, and AFAICT the function
that chooses blocks does so based strictly on applying a probability function
to block numbers as it increments a counter. It seems that any reading is
actually sequential and not random, which makes all the random_page_cost
hand-waving null and void.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
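For readers following the cost arithmetic in the quoted text: with random_page_cost = 4, randomly fetching 5% of a table's pages costs roughly 0.05 x 4 = 0.2 of a full sequential scan, which is where the "only 5x faster" figure comes from. Jim's reading of the block-selection code is consistent with selection sampling in the style of Knuth's Algorithm S: candidate block numbers are walked in increasing order and each is accepted with probability (blocks still needed) / (blocks still left), so the chosen blocks come out already sorted in physical order. Below is a minimal, self-contained C sketch of that technique; the function and variable names are hypothetical, and this is not the actual analyze.c code.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Selection sampling (Knuth, TAOCP Vol. 2, Algorithm S): choose
 * targblocks of totalblocks without replacement, visiting block
 * numbers in strictly increasing order. The resulting I/O pattern
 * is a single forward pass with skips, not random seeks.
 */
static void
sample_blocks(unsigned totalblocks, unsigned targblocks)
{
    unsigned seen = 0;      /* blocks examined so far */
    unsigned chosen = 0;    /* blocks selected so far */

    while (chosen < targblocks && seen < totalblocks)
    {
        /* Accept with probability (still needed) / (still left). */
        double p = (double) (targblocks - chosen) / (totalblocks - seen);

        if ((double) rand() / ((double) RAND_MAX + 1) < p)
        {
            printf("read block %u\n", seen);    /* always >= previous pick */
            chosen++;
        }
        seen++;
    }
}

int
main(void)
{
    srand(42);
    sample_blocks(1000, 30);    /* sample 30 of 1000 blocks */
    return 0;
}
```

Because `seen` only ever increases, each selected block number is larger than the last, so the reads happen in physical order; OS read-ahead handles such a forward pass far better than true random seeks, which supports the point that random_page_cost is the wrong lens for this cost estimate.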