Re: ANALYZE sampling is too good
From: Claudio Freire
Subject: Re: ANALYZE sampling is too good
Date:
Msg-id: CAGTBQpbFhAMnJcA8qOj1-0AgjVM6+L2d1nUjZRyYqMv5Sjrjtw@mail.gmail.com
In reply to: Re: ANALYZE sampling is too good (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: ANALYZE sampling is too good
List: pgsql-hackers
On Thu, Dec 12, 2013 at 3:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> It would be relatively easy to fix this if we trusted the number of
>> visible rows in each block to be fairly constant. But without that
>> assumption, I don't see a way to fix the sample selection process
>> without reading the entire table.
>
> Yeah, varying tuple density is the weak spot in every algorithm we've
> looked at. The current code is better than what was there before, but as
> you say, not perfect. You might be entertained to look at the threads
> referenced by the patch that created the current sampling method:
> http://www.postgresql.org/message-id/1tkva0h547jhomsasujt2qs7gcgg0gtvrp@email.aon.at
>
> particularly
> http://www.postgresql.org/message-id/flat/ri5u70du80gnnt326k2hhuei5nlnimonbs@email.aon.at#ri5u70du80gnnt326k2hhuei5nlnimonbs@email.aon.at
>
> However ... where this thread started was not about trying to reduce
> the remaining statistical imperfections in our existing sampling method.
> It was about whether we could reduce the number of pages read for an
> acceptable cost in increased statistical imperfection.

Well, why not take a supersample containing all visible tuples from N
selected blocks, and do bootstrapping over it, with subsamples of M
independent rows each?
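To make the proposal concrete, here is a minimal sketch (not from the original mail): `blocks` stands in for a table as a list of per-block visible tuples, and the estimated statistic (the mean of a column value) is chosen only for illustration. A supersample is built from all visible tuples of N randomly chosen blocks, then M-row subsamples are drawn from it with replacement, bootstrap-style, so the spread of the per-subsample estimates indicates how stable the statistic is under the block-limited sample:

```python
import random
import statistics

def bootstrap_over_supersample(blocks, n_blocks, m_rows, n_resamples,
                               stat=statistics.mean):
    """Supersample all visible tuples from N random blocks, then
    bootstrap: draw n_resamples subsamples of M rows each (sampled
    with replacement) and return the statistic of every subsample."""
    chosen = random.sample(blocks, n_blocks)       # read N blocks fully
    supersample = [row for block in chosen for row in block]
    return [stat(random.choices(supersample, k=m_rows))  # M rows, i.i.d.
            for _ in range(n_resamples)]

# Toy table: 1000 blocks with deliberately varying tuple density.
table = [[random.gauss(0.0, 1.0) for _ in range(random.randint(10, 100))]
         for _ in range(1000)]
estimates = bootstrap_over_supersample(table, n_blocks=50, m_rows=200,
                                       n_resamples=500)
print(statistics.mean(estimates), statistics.stdev(estimates))
```

Presumably the appeal is that the resampling step costs no extra I/O: the N blocks are read once, and every subsequent subsample is drawn from memory, so the estimate of sampling variability comes essentially for free compared with reading more pages.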