Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
| From | Josh Berkus |
|---|---|
| Subject | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
| Date | |
| Msg-id | 200504251213.18565.josh@agliodbs.com |
| In response to | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? (Simon Riggs <simon@2ndquadrant.com>) |
| Responses | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
| | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
| List | pgsql-hackers |
Simon, Tom:

While it's not possible to get accurate estimates from a fixed-size sample, I think it would be possible from a small but scalable sample: say, 0.1% of all data pages on large tables, up to the limit of maintenance_work_mem. Setting up these samples as a % of data pages, rather than a pure random sort, makes this more feasible; for example, a 70GB table would only need to sample about 9000 data pages (or 70MB). Of course, larger samples would lead to better accuracy, and this could be set through a revised GUC (i.e., maximum_sample_size, minimum_sample_size).

I just need a little help doing the math ... please?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
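[A back-of-the-envelope sketch of the arithmetic being proposed, not PostgreSQL code: it assumes the default 8 KB block size, the 0.1%-of-pages rule from the mail, and a cap at maintenance_work_mem. The function name and the 256 MB cap in the example are hypothetical.]

```python
# Hypothetical sketch of the proposed sample-size calculation.
# Assumptions (not from the original mail): 8 KB data pages,
# maintenance_work_mem expressed in bytes, 0.1% sampling ratio.

BLOCK_SIZE = 8 * 1024          # default PostgreSQL page size (8 KB)
SAMPLE_FRACTION = 0.001        # 0.1% of data pages, as proposed

def proposed_sample_pages(table_bytes: int, maintenance_work_mem_bytes: int) -> int:
    """Return the number of data pages to sample for a table of the given size."""
    total_pages = table_bytes // BLOCK_SIZE
    sample_pages = int(total_pages * SAMPLE_FRACTION)
    # Cap the sample so it fits within maintenance_work_mem.
    max_pages = maintenance_work_mem_bytes // BLOCK_SIZE
    return min(sample_pages, max_pages)

# Worked example from the mail: a 70 GB table at 0.1% comes out to
# roughly 9,000 pages, i.e. about 70 MB of sampled data.
if __name__ == "__main__":
    gb = 1024 ** 3
    mb = 1024 ** 2
    pages = proposed_sample_pages(70 * gb, maintenance_work_mem_bytes=256 * mb)
    print(pages, "pages =", pages * BLOCK_SIZE / mb, "MB")
```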