Re: snowball ASCII stemmer configuration
From | Tom Lane |
---|---|
Subject | Re: snowball ASCII stemmer configuration |
Date | |
Msg-id | 1301915.1592318237@sss.pgh.pa.us |
In reply to | Re: snowball ASCII stemmer configuration (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: snowball ASCII stemmer configuration
Re: snowball ASCII stemmer configuration |
List | pgsql-hackers |
I wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> Moreover, AFAIK, the following other languages do not use Latin-based
>> alphabets:
>>
>> arabic arabic \
>> greek greek \
>> nepali nepali \
>> tamil tamil \
>
> Hmm. I think all of those entries are ones that got added by me while
> absorbing post-2007 Snowball updates, and I confess that I did not think
> about this point. Maybe these should be changed.

After further reflection, I think these are indeed mistakes and we should
change them all. The argument for the Russian/English case, AIUI, is
"if we come across an all-ASCII word, it is most certainly not Russian,
and the most likely Latin-based language is English". Given the world
as it is, I think the same argument works for all non-Latin-alphabet
languages. Obviously specific applications might have a different idea
of the best fallback language, but that's why we let users make their
own text search configurations. For general-purpose use, falling back
to English seems reasonable. And we can be dead certain that applying
a Greek stemmer to an ASCII word will do nothing useful, so the
configuration choice shown above is unhelpful.

regards, tom lane
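
The fallback rule argued for above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not PostgreSQL's implementation: the function name `stemmer_for_word` and the set `NON_LATIN_LANGUAGES` are hypothetical, and the set simply collects the languages named in this thread.

```python
# Illustrative sketch only; these names are hypothetical, not PostgreSQL internals.

# Languages mentioned in the thread whose alphabets are not Latin-based.
NON_LATIN_LANGUAGES = {"arabic", "greek", "nepali", "russian", "tamil"}


def stemmer_for_word(language: str, word: str) -> str:
    """Pick the stemmer to apply to `word` in a configuration for `language`.

    An all-ASCII word cannot belong to a non-Latin-alphabet language, so for
    those languages it falls back to the English stemmer. Everything else
    keeps the configuration's own stemmer.
    """
    if language in NON_LATIN_LANGUAGES and word.isascii():
        return "english"
    return language


if __name__ == "__main__":
    for lang, word in [("greek", "postgres"), ("greek", "βάση"), ("french", "bonjour")]:
        print(lang, word, "->", stemmer_for_word(lang, word))
```

An application that prefers a different fallback language would replace the `"english"` return value, which corresponds to users defining their own text search configuration.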