Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)
От | Bruce Momjian |
---|---|
Тема | Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect) |
Дата | |
Msg-id | 199907281606.MAA11774@candle.pha.pa.us обсуждение исходный текст |
Ответ на | Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect) (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)
|
Список | pgsql-hackers |
> Zeugswetter Andreas IZ5 <Andreas.Zeugswetter@telecom.at> writes: > > Other db's usually use the value count(*) / nunique for the light > > weight statistics. This makes the assumptoin that the distinct index > > values are evenly distributed. That is on average a correct > > assumption, whereas our assumption on average overestimates the number > > of rows returned. I am not sure we have a nunique info though. > > We don't, and AFAICS it would be an expensive statistic to compute. > > I have thought about this a little more overnight, and I have come up > with what I think is a better idea. Suppose that VACUUM ANALYZE stores > in pg_statistic not only the disbursion, but also the most frequently > occurring value of each column. It already computes (or I should say > estimates) the most frequently occurring value (MFOV) in order to arrive > at the disbursion, so storing the value costs nothing except a little > more space in pg_statistic. Now, the logic that eqsel() should use is > > if constant-being-compared-against == MFOV then > return disbursion; > else > return MIN(disbursion, 1.0 - disbursion); Yes, I like this. > BTW, this argument proves rigorously that the selectivity of a search > for any value other than the MFOV is not more than 0.5, so there is some > basis for my intuition that eqsel should not return a value above 0.5. > So, in the cases where eqsel does not know the exact value being > searched for, I'd still be inclined to cap its result at 0.5. I don't follow this. If the most frequent value occurs 95% of the time, wouldn't the selectivity be 0.95? And why use 1-disbursion. You may find the existing code does a better job than the MIN() computation. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
В списке pgsql-hackers по дате отправления: