Re: Subquery flattening causing sequential scan
From | Ondrej Ivanič |
---|---|
Subject | Re: Subquery flattening causing sequential scan |
Date | |
Msg-id | CAM6mieL3XY25gGQacD7EYnWg9z-P2=kAEN_15xAQvic=LQTa7w@mail.gmail.com |
In reply to | Re: Subquery flattening causing sequential scan (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Subquery flattening causing sequential scan |
List | pgsql-performance |

Hi,

On 28 December 2011 05:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Possibly raising the stats target on emsg_messages would help.

In the function std_typanalyze() is this comment:

    /*--------------------
     * The following choice of minrows is based on the paper
     * "Random sampling for histogram construction: how much is enough?"
     * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in
     * Proceedings of ACM SIGMOD International Conference on Management
     * of Data, 1998, Pages 436-447.  Their Corollary 1 to Theorem 5
     * says that for table size n, histogram size k, maximum relative
     * error in bin size f, and error probability gamma, the minimum
     * random sample size is
     *      r = 4 * k * ln(2*n/gamma) / f^2
     * Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain
     *      r = 305.82 * k
     * Note that because of the log function, the dependence on n is
     * quite weak; even at n = 10^12, a 300*k sample gives <= 0.66
     * bin size error with probability 0.99.  So there's no real need to
     * scale for n, which is a good thing because we don't necessarily
     * know it at this point.
     *--------------------
     */

The question is: why is the parameter f not exposed as a GUC? Sometimes it could make sense to have fewer bins with better estimation (for the same r).

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
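
To make the trade-off in that comment concrete, here is a minimal Python sketch that only evaluates the quoted formula r = 4 * k * ln(2*n/gamma) / f^2 and its rearrangements. The function names (`min_sample_size`, `max_relative_error`, `bins_for_sample`) are hypothetical helpers made up for illustration; none of this is PostgreSQL code, and the 30000-row sample in the last example is just an assumed value.

```python
import math


def min_sample_size(k, n=10**6, f=0.5, gamma=0.01):
    """Minimum sample size r for k bins, per the formula quoted in the comment."""
    return 4.0 * k * math.log(2.0 * n / gamma) / f**2


def max_relative_error(r, k, n, gamma=0.01):
    """Same formula solved for f: bin size error for a given sample size r."""
    return math.sqrt(4.0 * k * math.log(2.0 * n / gamma) / r)


def bins_for_sample(r, n, f, gamma=0.01):
    """Same formula solved for k: bins supportable by a fixed sample size r."""
    return r * f**2 / (4.0 * math.log(2.0 * n / gamma))


if __name__ == "__main__":
    # Multiplier from the comment: f = 0.5, gamma = 0.01, n = 10^6 (about 305.82).
    print(round(min_sample_size(1), 2))

    # Weak dependence on n: a 300*k sample at n = 10^12 (about 0.66).
    print(round(max_relative_error(300, 1, 10**12), 2))

    # Fixed sample of 30000 rows (assumed value): halving f quarters
    # the number of bins that the same sample can support.
    for f in (0.5, 0.25):
        print(f, round(bins_for_sample(30000, 10**6, f), 1))
```

Because r scales with k / f^2, keeping r fixed and halving the allowed error f leaves room for only a quarter as many bins, which is the "fewer bins with better estimation" case mentioned above.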