Re: tsearch2: enable non ascii stop words with C locale
From | Teodor Sigaev |
---|---|
Subject | Re: tsearch2: enable non ascii stop words with C locale |
Date | |
Msg-id | 45D07FCF.7020407@sigaev.ru |
In reply to | tsearch2: enable non ascii stop words with C locale (Tatsuo Ishii <ishii@postgresql.org>) |
Responses |
Re: tsearch2: enable non ascii stop words with C locale
|
List | pgsql-hackers |
> Currently tsearch2 does not accept non ascii stop words if locale is
> C. Included patches should fix the problem. Patches against PostgreSQL
> 8.2.3.

I'm not sure the patch's description is correct.

First, the p_islatin() function is used only in the word/lexeme parser, not in the stop-word code.

Second, p_islatin() is used to catch lexemes such as URLs or HTML entities, so it is important that it recognizes only real Latin characters. And it works correctly: it calls p_isalpha() (already patched for your case), then it calls p_isascii(), which should be correct for any encoding with the C locale.

Third (and last):

contrib_regression=# show server_encoding;
 server_encoding
-----------------
 UTF8

contrib_regression=# show lc_ctype;
 lc_ctype
----------
 C

contrib_regression=# select lexize('ru_stem_utf8', RUSSIAN_STOP_WORD);
 lexize
--------
 {}

Russian characters in UTF8 take two bytes.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                   WWW: http://www.sigaev.ru/
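
The second and third points can be illustrated with a minimal sketch. This is Python, not the tsearch2 C code: the helper names `is_ascii_letter` and `utf8_len` are hypothetical, and the first only mimics the "alphabetic and ASCII" combination described for p_islatin(). It shows that a Cyrillic letter is alphabetic but not ASCII, and that it occupies two bytes in UTF8.

```python
def is_ascii_letter(ch: str) -> bool:
    """Hypothetical analogue of the p_isalpha() + p_isascii() combination:
    true only for alphabetic characters that are also plain ASCII."""
    return ch.isalpha() and ch.isascii()


def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    # Compare a Latin letter with a Cyrillic one.
    for ch in ("a", "и"):
        print(repr(ch), "bytes:", utf8_len(ch), "ascii letter:", is_ascii_letter(ch))
```

Under these assumptions, a check of this shape keeps Cyrillic letters out of URL-like lexemes while still treating them as letters elsewhere, which matches the behaviour described above.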