Re: [Fwd: Re: tsearch in core patch]
From | Mike Rylander |
---|---|
Subject | Re: [Fwd: Re: tsearch in core patch] |
Date | |
Msg-id | b918cf3d0706250622n6b4df67avf2a9ca9c4f6e8f48@mail.gmail.com |
In response to | Re: [Fwd: Re: tsearch in core patch] (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [Fwd: Re: tsearch in core patch]
|
List | pgsql-hackers |
On 6/25/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, it's not hard at all to find chunks of English text that have
> embedded bits of French, Spanish, or what-have-you, but that's not an
> argument for trying to intermix the stemmers. I doubt that such simple
> bits of program could tell the language difference well enough to
> determine which stemming rules to apply.
>

While I imagine that is probably true of many, if not most, my project in particular would greatly benefit from the ability to mix stemmers. I work with complex bibliographic data, which has language information embedded within records. This is not limited to the record level either. Individual fields within each bibliographic record can be in different languages.

Especially in countries where multilingual software is a requirement for use in public institutions (such as Canada, with en_CA/fr_CA), the ability to choose a stemmer and stop-word list at will for any particular record will provide exactly the behavior needed. The obvious generalization from Canada would be to support any mix of languages supported by tsearch2.

I can certainly understand the benefit of making the default configuration a simple locale-to-language map, but there are definitely uses for searching with different stemmers/stop-lists even within the same corpus/index.

So, as a datapoint for the discussion, I would ask that the option of multiple languages per DB locale not be removed if it can be at all avoided.

Thanks for listening (and for all the great work on getting tsearch into core! :)

...

--
Mike Rylander