Re: serious under-estimation of n_distinct for clustered distributions
От | Peter Geoghegan |
---|---|
Тема | Re: serious under-estimation of n_distinct for clustered distributions |
Дата | |
Msg-id | CAEYLb_VDRSye1zieFcuKH2ZnCaht11O5DZX8-cWRs9CKs0M-cg@mail.gmail.com обсуждение исходный текст |
Ответ на | serious under-estimation of n_distinct for clustered distributions (Stefan Andreatta <s.andreatta@synedra.com>) |
Ответы |
Re: serious under-estimation of n_distinct for clustered
distributions
Re: serious under-estimation of n_distinct for clustered distributions |
Список | pgsql-performance |
On 29 December 2012 20:57, Stefan Andreatta <s.andreatta@synedra.com> wrote: > Now, the 2005 discussion goes into great detail on the advantages and > disadvantages of this algorithm, particularly when using small sample sizes, > and several alternatives are discussed. I do not know whether anything has > been changed after that, but I know that the very distinct problem, which I > will focus on here, still persists. It's a really hard problem to solve satisfactorily. It's a problem that has been studied in much detail. Yes, the algorithm used is still the same. See the comments within src/backend/commands/analyze.c (IBM Research Report RJ 10025 is referenced there). The general advice here is: 1) Increase default_statistics_target for the column. 2) If that doesn't help, consider using the following DDL: alter table foo alter column bar set ( n_distinct = 5.0); -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
В списке pgsql-performance по дате отправления: