Re: best statistic target for boolean columns
От | Gaetano Mendola |
---|---|
Тема | Re: best statistic target for boolean columns |
Дата | |
Msg-id | 4158973E.7010408@bigfoot.com обсуждение исходный текст |
Ответ на | Re: best statistic target for boolean columns (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: best statistic target for boolean columns
|
Список | pgsql-performance |
Tom Lane wrote: > Gregory Stark <gsstark@mit.edu> writes: > >>No, actually the stats table keeps the n most common values and their >>frequency (usually in percentage). So really a target of 2 ought to be enough >>for boolean values. In fact that's all I see in pg_statistic; I'm assuming >>there's a full histogram somewhere but I don't see it. Where would it be? > > > It's not going to be there. The histogram only covers values that are > not in the most-frequent-values list, and therefore it won't exist for a > column that is completely describable by most-frequent-values. > > >>However the target also dictates how large a sample of the table to take. A >>target of two represents a very small sample. So the estimations could be >>quite far off. > > > Right. The real point of stats target for such columns is that it > determines how many rows to sample, and thereby indirectly implies > the accuracy of the statistics. For a heavily skewed boolean column > you'd want a high target so that the number of occurrences of the > infrequent value would be estimated accurately. > > It's also worth noting that the number of rows sampled is driven by the > largest per-column stats target in the table, and so reducing stats > target to 2 for a boolean column will save *zero* effort unless all the > columns in the table are booleans. Thank you all, now I have more clear how it works. Btw last time I was thinking: why during an explain analyze we can not use the information on about the really extracted rows vs the extimated rows ? Now I'm reading an article, written by the same author that ispired the magic "300" on analyze.c, about "Self-tuning Histograms". If this is implemented, I understood we can take rid of "vacuum analyze" for mantain up to date the statistics. Have someone in his plans to implement it ? After all the idea is simple: compare during normal selects the extimated rows and the actual extracted rows then use this "free" information to refine the histograms. Regards Gaetano Mendola
В списке pgsql-performance по дате отправления: