Re: tsearch2: enable non ascii stop words with C locale
From | Teodor Sigaev |
---|---|
Subject | Re: tsearch2: enable non ascii stop words with C locale |
Date | |
Msg-id | 45D17308.1070305@sigaev.ru |
In reply to | Re: tsearch2: enable non ascii stop words with C locale (Tatsuo Ishii <ishii@sraoss.co.jp>) |
Responses | Re: tsearch2: enable non ascii stop words with C locale |
List | pgsql-hackers |
> I know. My guess is the parser does not read the stop word file, at
> least with the default configuration.

The parser should not read the stop-word file: that is the job of the dictionaries.

> So if a character is not ASCII, it returns 0 even if p_isalpha returns
> 1. Is this what you expect?

No, p_islatin should return true only for Latin characters, not for national ones.

> In our case, we added JAPANESE_STOP_WORD into english.stop, then:
> select to_tsvector(JAPANESE_STOP_WORD)
> which returns words even though they are in JAPANESE_STOP_WORD.
> And with the patches the problem was solved.

Please show your configuration for lexemes/dictionaries. I suspect that you have the en_stem dictionary enabled for the lword lexeme type. A better way is to use the 'simple' dictionary (it supports stop words the same way en_stem does) and set it for the nlword, word, part_hword, nlpart_hword, hword and nlhword lexeme types. Note: leave en_stem unchanged for Latin words.

--
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
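The suggested setup amounts to a lookup from lexeme (token) type to dictionary, where each dictionary then drops words found in its stop-word list. The C fragment below is only a toy model of that routing, not tsearch2 source code: the function names `dictionary_for` and `is_stop_word` are hypothetical, the in-memory stop list stands in for a file such as english.stop, and in contrib/tsearch2 the real mapping is, to our understanding, kept in the `pg_ts_cfgmap` configuration table rather than in C. Token types not mentioned in the message return NULL, meaning they keep whatever dictionary is already configured.

```c
#include <stddef.h>
#include <string.h>

/* Token type aliases the message suggests routing to the 'simple' dictionary. */
static const char *const simple_token_types[] = {
    "nlword", "word", "part_hword", "nlpart_hword", "hword", "nlhword"
};

/* Hypothetical helper: Latin words (lword) keep en_stem, the listed
 * types go to simple, and any other type returns NULL ("left as is"). */
static const char *dictionary_for(const char *tok_alias)
{
    size_t n = sizeof simple_token_types / sizeof simple_token_types[0];

    if (strcmp(tok_alias, "lword") == 0)
        return "en_stem";
    for (size_t i = 0; i < n; i++)
        if (strcmp(tok_alias, simple_token_types[i]) == 0)
            return "simple";
    return NULL;
}

/* Hypothetical helper: both simple and en_stem discard a word that
 * appears in their stop-word list; here the list is an in-memory array. */
static int is_stop_word(const char *word, const char *const *stop_words, size_t n_stop)
{
    for (size_t i = 0; i < n_stop; i++)
        if (strcmp(word, stop_words[i]) == 0)
            return 1;
    return 0;
}
```

Because non-Latin words are classified as nlword (and the other non-Latin types) rather than lword, pointing those types at 'simple' with a stop-word file lets national stop words be filtered without changing how en_stem handles Latin words.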