Re: [GENERAL] Problem (bug?) with like
От | Bruce Momjian |
---|---|
Тема | Re: [GENERAL] Problem (bug?) with like |
Дата | |
Msg-id | 200112281827.fBSIRRk28377@candle.pha.pa.us обсуждение исходный текст |
Ответ на | Re: [GENERAL] Problem (bug?) with like (Bruce Momjian <pgman@candle.pha.pa.us>) |
Список | pgsql-hackers |
> > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > But what about '%A%' vs. '%AC%'. Seems the second is reasonably > > > different from the first the our optimizer may be fine with that. Is it > > > only when the strings get longer that we lose specificity? > > > > Yeah, I don't think that the estimates are bad for one or two > > characters. But the estimate gets real small real fast as you > > increase the number of match characters in the LIKE pattern. > > We need to slow that down some. > > Yea, maybe a log base 2 decrease: > > 1 char 1x > 2 char 2x > 4 char 3x > 8 char 4x > 16 char 5x I did a little research on this. I think the problem is that ordinary characters are assumed to randomly appear in a character string, while in practice, if the string has already been specified like 'DAV', there are very few additional characters that can follow it and make sense. Looking at backend/utils/adt/selfuncs.c, I see this: #define FIXED_CHAR_SEL 0.04 /* about 1/25 */ ... sel *= FIXED_CHAR_SEL; which means every additional character reduces the selectivity by 96%. This seems much too restrictive to me. Because of the new optimizer buckets, we do have good statistics on the leading character, but additional characters drastically reduce selectivity. I think perhaps a number like 0.50 or 50% may be correct. That would be a table like this: 1 char 2x 2 char 4x 4 char 8x 8 char 16x 16 char 32x which is more restrictive than I initially suggested above but less restrictive than we have now. Should we assume additional characters are indeed randomly appearing in the string? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
В списке pgsql-hackers по дате отправления: