Re: Strange heuristic in analyze.c
От | Bruce Momjian |
---|---|
Тема | Re: Strange heuristic in analyze.c |
Дата | |
Msg-id | 201002052053.o15KrvG09347@momjian.us обсуждение исходный текст |
Ответ на | Strange heuristic in analyze.c (Greg Stark <stark@mit.edu>) |
Ответы |
Re: Strange heuristic in analyze.c
|
Список | pgsql-hackers |
Greg Stark wrote: > So I never realized the consequences of this little heuristic in > analyze.c in the handling of very low cardinality columns where we > want to just capture the complete list of values in the mcv and throw > away the histogram: > > else if (toowide_cnt == 0 && nmultiple == ndistinct) > { > /* > * Every value in the sample appeared more than once. Assume the > * column has just these values. > */ > stats->stadistinct = ndistinct; > } > > The problem with this heuristic is that if the table is small enough > you might expect you can set the statistics target high and "sample" > the entire table and get a very accurate mcv covering all the values. > However if any of the values in the table appears only once this > heuristic will defeat you. The following code will then throw out of > the mcv any value which isn't 25% more common than "average". Leaving > you with a histogram for those values which often does very poorly if > the values don't fit any pattern and are just discrete arbitrary > values. Do you want a C comment to document this problem? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
В списке pgsql-hackers по дате отправления: