Re: Analyze and default_statistics_target
От | Heikki Linnakangas |
---|---|
Тема | Re: Analyze and default_statistics_target |
Дата | |
Msg-id | 50FD62FD.2080802@vmware.com обсуждение исходный текст |
Ответ на | Analyze and default_statistics_target (AJ Weber <aweber@comcast.net>) |
Список | pgsql-performance |
On 21.01.2013 17:29, AJ Weber wrote: > I was under the impression that the default_statistics_target was a > percentage of rows to analyze. Maybe this is not the case? Nope. > I ran an analyze during a "quiet point" last night and for a few of my > large tables, I didn't get what I consider a reasonable sampling of > rows. When running with "verbose" enabled, it appeared that a maximum of > 240000 rows were being analyzed, including on tables exceeding 4-8mm > rows. My default_statistics_target = 80. > > Shouldn't I be analyzing a larger percentage of these big tables? Analyze only needs a fairly small random sample of the rows in the table to get a picture of what the data looks like. Compare with e.g opinion polls; you only need to sample a few thousand people to get a result with reasonable error bound. That's for estimating the histogram. Estimating ndistinct is a different story, and it's well-known that the estimates of ndistinct are sometimes wildly wrong. > What is the unit-of-measure used for default_statistics_target? It's the number of entries stored in the histogram and most-common-values list in pg_statistics. See also http://www.postgresql.org/docs/devel/static/planner-stats.html: "The amount of information stored in pg_statistic by ANALYZE, in particular the maximum number of entries in the most_common_vals and histogram_bounds arrays for each column, can be set on a column-by-column basis using the ALTER TABLE SET STATISTICS command, or globally by setting the default_statistics_target configuration variable. The default limit is presently 100 entries." - Heikki
В списке pgsql-performance по дате отправления: