Re: [Fwd: Re: tsearch in core patch]
From | Tatsuo Ishii |
---|---|
Subject | Re: [Fwd: Re: tsearch in core patch] |
Date | |
Msg-id | 20070625.134059.26277531.t-ishii@sraoss.co.jp |
In response to | Re: [Fwd: Re: tsearch in core patch] (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: [Fwd: Re: tsearch in core patch] |
List | pgsql-hackers |
> Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> > Ok, probably we need to copy the English stemming rule to the one for
> > Japanese.
>
> Pardon my ignorance here, but is the concept of stemming even relevant
> to Japanese/Chinese/Korean? What little I know about ideographic
> languages suggests it wouldn't work well. And surely the specific rules
> in the Snowball project's English stemmer wouldn't work.

Your understanding is correct. The English stemmer would not work for the
non-English part of Japanese text. What I meant was the "chunks of English
text" embedded in Japanese.

> > I think same thing (commonly used English with local
> > language) can be applied to Chinese and Korean.
>
> Well, it's not hard at all to find chunks of English text that have
> embedded bits of French, Spanish, or what-have-you, but that's not an
> argument for trying to intermix the stemmers. I doubt that such simple
> bits of program could tell the language difference well enough to
> determine which stemming rules to apply.

For Japanese, it will be fairly simple: words in the 7-bit ASCII range must
be English (note that the most commonly used Japanese encodings, such as
EUC, cannot be mixed with ISO 8859).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
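
The rule described above (a word made only of 7-bit ASCII characters is treated as English, anything else is left to the Japanese side) could be sketched roughly as follows. This is only an illustration under stated assumptions, not tsearch code: the input is assumed to be already split into tokens, and `is_ascii_word`, `toy_english_stem`, and `route_token` are hypothetical names. The suffix stripper is a deliberately crude stand-in for a real English stemmer such as Snowball's.

```python
from typing import Tuple


def is_ascii_word(token: str) -> bool:
    """Return True if the token is non-empty and every character is 7-bit ASCII."""
    return bool(token) and all(ord(ch) < 128 for ch in token)


def toy_english_stem(word: str) -> str:
    """Placeholder stemmer (assumption): lowercases and strips a few suffixes.

    A real system would call a proper English stemmer here instead.
    """
    lowered = word.lower()
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


def route_token(token: str) -> Tuple[str, str]:
    """Send ASCII-only tokens to the English stemmer; pass others through unchanged."""
    if is_ascii_word(token):
        return ("english", toy_english_stem(token))
    # Non-ASCII tokens are left for Japanese-specific processing (out of scope here).
    return ("japanese", token)


if __name__ == "__main__":
    for tok in ["PostgreSQL", "indexes", "検索", "searching"]:
        print(tok, route_token(tok))
```

The check relies on the encoding property mentioned in the message: if the text cannot contain ISO 8859 characters, an all-ASCII token cannot be an accented Western European word, so the only remaining ambiguity is between English and other unaccented Latin-script text.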