Re: Another tsearch bug...
From | Oleg Bartunov |
---|---|
Subject | Re: Another tsearch bug... |
Date | |
Msg-id | Pine.GSO.4.44.0208231520120.15230-100000@ra.sai.msu.su |
In reply to | Another tsearch bug... ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>) |
List | pgsql-hackers |
On Fri, 23 Aug 2002, Christopher Kings-Lynne wrote:

> Hi guys,
>
> Hate to keep coming up with these bugs without patches - but I really don't
> have time to look into the source code atm :(
>
> OK, attached is an example of the problem. Notice how trademarks and
> copyright symbols are being indexed along with the word. This means that if
> someone searches for 'balance' in the above data set, they won't find
> anything.
>
> I'm not sure how this would be handled. In the English language, it'd
> probably be safe to say that high ascii characters would be stripped from
> the index? But you'd want to leave accents and stuff in I guess. Tricky.

Rather tricky. The problem is that we don't know how to get flex to work
with locale. The parser recognizes Latin words ([a-zA-Z]), non-Latin words
([\0200-\0377]) and mixed words ([a-zA-Z\0200-\0377]). Your case (balanceR)
is a mixed word. The right way is to have a locale-aware parser that
recognizes words properly. We are inclined to abandon flex.

>
> Anyway, just bringing it to your attention...
>
> Chris
>

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
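
To illustrate the three word classes described in the reply, here is a minimal sketch in Python. It is not the tsearch parser (which is generated by flex); it only mirrors the byte ranges quoted in the message. The classify function and the choice of Latin-1 byte 0xAE for the trademark-style symbol are assumptions made for the illustration.

```python
import re

# Byte classes quoted in the message; octal \0200-\0377 is 0x80-0xFF.
LATIN = re.compile(rb"[a-zA-Z]+")
NONLATIN = re.compile(rb"[\x80-\xff]+")
MIXED = re.compile(rb"[a-zA-Z\x80-\xff]+")


def classify(token: bytes) -> str:
    """Hypothetical helper: name the word class a byte string falls into."""
    if LATIN.fullmatch(token):
        return "latin"
    if NONLATIN.fullmatch(token):
        return "nonlatin"
    if MIXED.fullmatch(token):
        return "mixed"
    return "other"


# Assumption: the symbol arrives as a single high byte (Latin-1 0xAE).
word_with_symbol = "balance\u00ae".encode("latin-1")

print(classify(b"balance"))        # latin
print(classify(word_with_symbol))  # mixed
```

Because the symbol falls in the same high-byte range as accented letters, a byte-range rule alone cannot separate the two, so the whole string is indexed as one mixed word and a search for 'balance' does not match it.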