Re: Gsoc2012 idea, tablesample
From | Tom Lane |
---|---|
Subject | Re: Gsoc2012 idea, tablesample |
Date | |
Msg-id | 3891.1336746464@sss.pgh.pa.us |
In reply to | Re: Gsoc2012 idea, tablesample ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Responses | Re: Gsoc2012 idea, tablesample |
List | pgsql-hackers |
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Florian Pflug <fgp@phlo.org> wrote:
>> Maybe one can get rid of these sorts of problems by factoring in
>> the expected density of the table beforehand and simply accepting
>> that the results will be inaccurate if the statistics are
>> outdated?

> Unless I'm missing something, I think that works for percentage
> selection, which is what the standard talks about, without any need
> to iterate through additional samples.  Good idea!  We don't need to
> do any second pass to pare down initial results, either.  This
> greatly simplifies coding while providing exactly what the standard
> requires.

>> I'm not totally sure whether this approach is sensitive to
>> non-uniformity in the tuple to line-pointer assignment, though.

If you're willing to accept that the quality of the results depends on
having up-to-date stats, then I'd suggest (1) use the planner's existing
technology to estimate the number of rows in the table; (2) multiply
by the sampling factor you want to get a desired number of sample rows;
(3) use ANALYZE's existing technology to acquire that many sample rows.
While the ANALYZE code isn't perfect with respect to the problem of
nonuniform TID density, it certainly will be a lot better than
pretending that that problem doesn't exist.

			regards, tom lane
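
The three steps suggested in the message can be illustrated with a small, self-contained sketch. This is not PostgreSQL code: the function names (`target_sample_size`, `reservoir_sample`, `tablesample`) are hypothetical, the row estimate is passed in as a plain number rather than taken from the planner, and a simple one-pass reservoir sampler stands in for whatever ANALYZE actually does internally. It only shows how a percentage plus a row estimate turns into a fixed-size sample without a second pass.

```python
import random


def target_sample_size(estimated_rows, percent):
    """Steps 1-2: convert a row estimate and a percentage into a row count.

    `estimated_rows` is assumed to come from table statistics; here it is
    just an argument supplied by the caller.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(estimated_rows * percent / 100.0)


def reservoir_sample(rows, k, rng=None):
    """Step 3 stand-in: uniform sample of up to k rows in a single pass.

    Classic reservoir sampling, used here only as an illustration of
    "acquire that many sample rows" without knowing the true row count.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < k:
                reservoir[j] = row
    return reservoir


def tablesample(rows, estimated_rows, percent, rng=None):
    """Sample roughly `percent` of the table, trusting the row estimate."""
    k = target_sample_size(estimated_rows, percent)
    return reservoir_sample(rows, k, rng)
```

If the estimate is stale, the sample size is off by the same proportion: a table that really holds 1000 rows but is estimated at 500 yields 50 rows for a 10% request instead of 100. That is the trade-off the thread proposes to accept.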