Re: Collect frequency statistics for arrays
От | Noah Misch |
---|---|
Тема | Re: Collect frequency statistics for arrays |
Дата | |
Msg-id | 20120117103322.GA19977@tornado.leadboat.com обсуждение исходный текст |
Ответ на | Re: Collect frequency statistics for arrays (Alexander Korotkov <aekorotkov@gmail.com>) |
Ответы |
Re: Collect frequency statistics for arrays
|
Список | pgsql-hackers |
On Tue, Jan 17, 2012 at 12:04:06PM +0400, Alexander Korotkov wrote: > Thanks for your fixes to the patch. Them looks correct to me. I did some > fixes in the patch. The proof of some concepts is still needed. I'm going > to provide it in a few days. Your further fixes look good. Could you also answer my question about the header comment of mcelem_array_contained_selec()? /** Estimate selectivity of "column <@ const" based on most common element* statistics. Independent element occurrencewould imply a particular* distribution of distinct element counts among matching rows. Real data* usually falsifiesthat assumption. For example, in a set of 1-element* integer arrays having elements in the range [0;10], elementoccurrences are* not independent. If they were, a sufficiently-large set would include all* distinct element counts0 through 11. We correct for this using the* histogram of distinct element counts.** In the "column @> const" and"column && const" cases, we usually have* "const" with low summary frequency of elements (otherwise we have* selectivityclose to 0 or 1 correspondingly). That's why the effect of* dependence related to distinct element counts distributionis negligible* there. In the "column <@ const" case, summary frequency of elements is* high (otherwise we haveselectivity close to 0). That's why we should do* correction due to array distinct element counts distribution.*/ By "summary frequency of elements", do you mean literally P_0 + P_1 ... + P_N? If so, I can follow the above argument for "column && const" and "column <@ const", but not for "column @> const". For "column @> const", selectivity cannot exceed the smallest frequency among const elements. A number of high-frequency elements will drive up the sum of the frequencies without changing the true selectivity much at all. Thanks, nm
В списке pgsql-hackers по дате отправления: