Re: ANALYZE sampling is too good
From: Josh Berkus
Subject: Re: ANALYZE sampling is too good
Date:
Msg-id: 52A60606.9030409@agliodbs.com
In reply to: ANALYZE sampling is too good (Greg Stark <stark@mit.edu>)
Responses:
  Re: ANALYZE sampling is too good
  Re: ANALYZE sampling is too good
  Re: ANALYZE sampling is too good
List: pgsql-hackers
Greg,

> I really don't believe the 5% thing. It's not enough for n_distinct
> and it's *far* too high a value for linear properties like histograms
> or nullfrac etc.

Actually, it is enough for n_distinct, or more properly, 5% is as good as
you can get for n_distinct unless you're going to jump to scanning 50% or
more. It's also applicable to the other stats: histogram buckets
constructed from a 5% sample are more likely to be accurate than those
constructed from a 0.1% sample. Same with nullfrac. The degree of improved
accuracy would, of course, require some math to determine.

> From a computer point of view it's too high to be
> worth bothering. If we have to read 5% of the table we might as well
> do a full scan anyways, it'll be marginally slower but much better
> quality results.

Reading 5% of a 200GB table is going to be considerably faster than
reading the whole thing, if that 5% is being scanned in a way that the FS
understands. Also, we can optimize this significantly by using the VM, as
Robert (I think) suggested. In the advanced approaches section, there's
also the idea of collecting analyze data from table pages while they're
in memory anyway for other reasons.

You do seem kind of hostile to the idea of full-page sampling, going
pretty far beyond "I'd need to see the math". Why?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
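For illustration, here is a minimal Python simulation of the accuracy claim
above: equi-depth histogram boundaries computed from a 5% sample land much
closer to the true quantiles than boundaries computed from a 0.1% sample.
This is a sketch, not ANALYZE's actual code; the Pareto-skewed data, the
1M-row table, and the 100-bucket histogram are all illustrative assumptions.

# Minimal sketch (illustrative assumptions: Pareto-skewed data, 1M rows,
# 100 equi-depth buckets). Not PostgreSQL code; it only demonstrates the
# statistical point that a 5% sample pins histogram boundaries down far
# more tightly than a 0.1% sample does.
import bisect
import random

random.seed(42)

N = 1_000_000
BUCKETS = 100

# Skewed population: heavy duplication plus a long tail of rare values.
population = sorted(int(random.paretovariate(1.5) * 100) for _ in range(N))

def boundaries(values, nbuckets):
    """Equi-depth histogram boundaries (the nbuckets-1 interior quantiles)."""
    values = sorted(values)
    step = len(values) / nbuckets
    return [values[int(i * step)] for i in range(1, nbuckets)]

def mean_rank_error(bounds):
    """Average displacement, in rows, of each estimated boundary from the
    position it should occupy in the full sorted table."""
    errors = []
    for i, b in enumerate(bounds, start=1):
        target = i * N // BUCKETS
        actual = bisect.bisect_left(population, b)
        errors.append(abs(actual - target))
    return sum(errors) / len(errors)

for frac in (0.001, 0.05):  # 0.1% vs 5% samples
    sample = random.sample(population, int(N * frac))
    err = mean_rank_error(boundaries(sample, BUCKETS))
    print(f"{frac:.1%} sample: mean boundary error ~{err:,.0f} rows "
          f"({err / N:.3%} of the table)")

Since the standard error of a sample quantile shrinks like 1/sqrt(n), going
from 0.1% to 5% (50x the rows) should tighten each boundary by roughly a
factor of seven; the printout makes the gap concrete for this distribution.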
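And a second sketch, for the "scanned in a way that the FS understands"
point: if the sampled blocks are read in ascending order, the access
pattern becomes a single forward sweep over ~5% of the file, which kernel
readahead handles far better than the same blocks fetched in random order.
Again hypothetical, not the server's sampling code; BLOCK_SIZE mirrors
PostgreSQL's default 8KB page, but the plain-file access below is a
stand-in for the real relation/page machinery.

# Hypothetical sketch of FS-friendly block sampling over a heap-like file.
# Not PostgreSQL's implementation; BLOCK_SIZE matches the default 8KB page
# purely for flavor.
import os
import random

BLOCK_SIZE = 8192

def sample_blocks(path, fraction=0.05):
    """Yield (block_number, raw_bytes) for a random `fraction` of blocks.

    Sorting the chosen block numbers turns the reads into one forward
    sweep, so sequential readahead does most of the work; the same
    sample read in arbitrary order would pay a seek per block.
    """
    nblocks = os.path.getsize(path) // BLOCK_SIZE
    chosen = sorted(random.sample(range(nblocks),
                                  max(1, int(nblocks * fraction))))
    with open(path, "rb") as f:
        for blkno in chosen:
            f.seek(blkno * BLOCK_SIZE)
            yield blkno, f.read(BLOCK_SIZE)

Block sampling also means every row on a sampled page gets examined, which
is what makes 5% of the pages cheap per row read; the intra-page correlation
it introduces is exactly the part that "needs the math".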