Re: Bogus ANALYZE results for an otherwise-unique column with many nulls
From: Andrew Gierth
Subject: Re: Bogus ANALYZE results for an otherwise-unique column with many nulls
Date:
Msg-id: 87wpjvecwz.fsf@news-spur.riddles.org.uk
In reply to: Re: Bogus ANALYZE results for an otherwise-unique column with many nulls (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses: Re: Bogus ANALYZE results for an otherwise-unique column with many nulls
List: pgsql-hackers
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> What I did in the patch is to scale the formerly fixed "-1.0"
 Tom> stadistinct estimate to discount the fraction of nulls we found.

 Andrew> This seems quite dubious to me. stadistinct representing only
 Andrew> the non-null values seems to me to be substantially more useful
 Andrew> and less confusing; it should be up to consumers to take
 Andrew> stanullfrac into account (in general they already do) since in
 Andrew> many cases we explicitly do _not_ want to count nulls.

Hm. I am wrong about this, since it's the fact that consumers are taking
stanullfrac into account that makes the value wrong in the first place.

For example, if a million-row table has stanullfrac=0.9 and
stadistinct=-1, then get_variable_numdistinct is returning 1 million,
and (for example) var_eq_non_const divides 0.1 by that to give a
selectivity of 1 in 10 million, which is obviously wrong.

But I think the fix is still wrong, because it changes the meaning of
ALTER TABLE ... ALTER col SET (n_distinct=...) in a non-useful way; it
is no longer possible to nail down a useful negative n_distinct value if
the null fraction of the column is variable.

Would it not make more sense to do the adjustment in
get_variable_numdistinct, instead?

-- 
Andrew (irc:RhodiumToad)