Indexing unknown words with Tsearch2
From | Greg Maitrallain |
---|---|
Subject | Indexing unknown words with Tsearch2 |
Date | |
Msg-id | 49D36E3F.1080207@evodia.fr |
Responses | Re: Indexing unknown words with Tsearch2 |
List | pgsql-general |
Hi,

First of all, excuse my poor English :)

I'm working on a full-text database with tsearch2, which contains French historical writings. I'm using the fr_ispell dictionary that can be found here:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
(ispell-french.tar.gz <http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/dicts/ispell/ispell-french.tar.gz> - submitted by Max Jacob)

The database encoding is LATIN1.

The problem is that the writings contain many names of public figures, for example Churchill (the database covers WWII). But when I search for these names, nothing is found.

I tried many things, including following this introduction:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html

I think the root of the problem is that no lexeme is found (I could even say an empty lexeme is found).

With the default en_stem dictionary, I get this:

    SELECT lexize('en_stem', 'churchill');
    "{churchil}"

Then I try to add the French dictionary:

    INSERT INTO pg_ts_dict
      (SELECT 'fr_ispell', dict_init,
              'DictFile="/home/.../french.dict",'
              'AffFile="/home/.../french.aff",'
              'StopFile="/home/.../french.stop"',
              dict_lexize
       FROM pg_ts_dict
       WHERE dict_name = 'ispell_template');

And the result is:

    SELECT lexize('fr_ispell', 'churchill');
    ""

My questions are:

- Is it OK for an empty string to be returned for a word that is neither in the dictionary nor in the stop words?
- Is there a way to get the word itself as the result when it is neither in the dictionary nor in the stop words?
- If yes, how?

I'm also interested in any other information you could give me.

Many thanks!

Greg Maitrallain.