Re: How does the tsearch configuration get selected?
From | Teodor Sigaev |
---|---|
Subject | Re: How does the tsearch configuration get selected? |
Date | |
Msg-id | 4672BDBD.2070500@sigaev.ru |
In reply to | Re: How does the tsearch configuration get selected? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: How does the tsearch configuration get selected? |
List | pgsql-hackers |
> One possibility is that the user-visible specification is just a name
> (eg, "english"), but the actual filename out on the filesystem is,
> say, name.encoding.stop (eg, "english.utf8.stop") where we use PG's
> names for the encodings. We could just fail if there's not a file
> matching the database encoding, or we could try that and then try
> utf8, or some other rule. In any case I'd want it to verify and
> convert encoding as necessary while reading.

I have no strong objection to UTF8-encoded files (stop words, ispell, synonym or thesaurus); we can just recode them after reading. But configurations for different languages may differ: for example, a Russian (or any Cyrillic-based) configuration differs from a West European one, because they are based on different character sets. So we need non-obvious rules for stemmers that define exactly which stemmer and stop-word file should be used. For the Russian language with utf8 encoding, the english stemmer should be used for lword, but for the Italian language, the italian stemmer. A Russian word cannot contain any ASCII characters, whereas an Italian word may consist entirely of ASCII.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                   WWW: http://www.sigaev.ru/
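To make the two rules discussed above concrete, here is a minimal Python sketch. It is not tsearch code: the function names, the encoding-name map, and the "try the database encoding, then utf8" order are assumptions taken from the proposal quoted in this message, and the stemmer rule only illustrates the Russian versus Italian case.

```python
from pathlib import Path

# Hypothetical map from PG-style encoding names to Python codec names.
PG_TO_CODEC = {"utf8": "utf-8", "latin1": "latin-1", "koi8r": "koi8-r"}


def find_stop_file(directory, name, db_encoding):
    """Locate name.<encoding>.stop: database encoding first, then utf8."""
    for enc in (db_encoding, "utf8"):
        candidate = Path(directory) / f"{name}.{enc}.stop"
        if candidate.is_file():
            return candidate, enc
    raise FileNotFoundError(f"no stop file for {name!r} in {db_encoding} or utf8")


def load_stop_words(directory, name, db_encoding):
    """Read the stop file, verifying its encoding and that it can be recoded."""
    path, file_enc = find_stop_file(directory, name, db_encoding)
    # Decoding verifies the file; invalid bytes raise UnicodeDecodeError.
    text = path.read_bytes().decode(PG_TO_CODEC[file_enc])
    words = {line.strip() for line in text.splitlines() if line.strip()}
    for word in words:
        # Recoding check: every word must be representable in the
        # database encoding, otherwise UnicodeEncodeError is raised.
        word.encode(PG_TO_CODEC[db_encoding])
    return words


def pick_stemmer(config_language, word):
    """Illustrative rule: in a Russian configuration an ASCII-only word
    cannot be Russian, so it goes to the english stemmer; in an Italian
    configuration an ASCII-only word may well be Italian."""
    if config_language == "russian" and word.isascii():
        return "english"
    return config_language
```

With a file russian.utf8.stop on disk and a koi8r database, load_stop_words falls back to the utf8 file and checks that each word can be recoded. pick_stemmer("russian", "window") selects the english stemmer, while pick_stemmer("italian", "casa") stays with italian.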