Discussion: fulltext search stemming/spelling problems
Hi!
I'm using Postgres 8.4.3 and am trying to get stemming and correction of
misspelled words working.
I already installed the myspell dictionaries using apt-get and created
postgres dictionaries like this:
Fulltext search configuration »public.english_ispell«
Parser: »pg_catalog.default«
Token | Dictionaries
-----------------+------------------------------------
asciihword | english_ispell,english_stem,simple
asciiword | english_ispell,english_stem,simple
email | simple
file | simple
float | simple
host | simple
hword | english_ispell,english_stem,simple
hword_asciipart | english_ispell,english_stem,simple
hword_numpart | simple
hword_part | english_ispell,english_stem,simple
int | simple
numhword | simple
numword | simple
sfloat | simple
uint | simple
url | simple
url_path | simple
version | simple
word | english_ispell,english_stem,simple
But when I run, for example, SELECT to_tsvector('english_ispell',
'gitar'), the result is only:
'gitar':1
Shouldn't the word be corrected to 'guitar'?
SELECT plainto_tsquery('english_ispell','gitar') doesn't work either:
'gitar'
Thanks,
Corin
On Thu, 8 Apr 2010, Corin wrote:
> But when I do, for example, SELECT to_tsvector('english_ispell', 'gitar') the
> result is only:
> 'gitar':1
>
> Shouldn't the word be corrected to 'guitar'?
The english_ispell dictionary is a morphological dictionary! Read the docs.
Also, the simple dictionary will never be invoked, since the english_stem
dictionary recognizes everything!
>
> SELECT plainto_tsquery('english_ispell','gitar') doesn't work either:
> 'gitar'
Better to use the ts_debug() function or ts_dict() for testing.
>
> Thanks,
> Corin
>
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
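For reference, a minimal sketch of the ts_debug() check suggested above. It assumes the english_ispell text search configuration from this thread already exists; the second word in the sample text is only an illustrative input. Selecting individual columns makes it easier to see which dictionary in the chain actually produced the lexeme.

```sql
-- Show, per token, the dictionaries that were consulted and the one that
-- finally produced the lexeme. Assumes the configuration english_ispell
-- from this thread has been created.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('english_ispell', 'gitar guitars');
```

If the dictionary column shows english_stem for a token, english_ispell did not recognize that token and passed it on to the next dictionary in the list.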
On 08.04.2010 20:15, Oleg Bartunov wrote:
> On Thu, 8 Apr 2010, Corin wrote:
>
> The english_ispell dictionary is a morphological dictionary! Read the docs.
> Also, the simple dictionary will never be invoked, since the english_stem
> dictionary recognizes everything!
I'm not sure what you mean by 'morphology'. I did read the docs but
couldn't find anything about 'morphology dictionaries'.
I created it myself with the following commands, after I installed the
ispell dictionaries using "apt-get":
CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = system_en_us,
    AffFile = system_en_us
);

CREATE TEXT SEARCH CONFIGURATION english_ispell ( COPY = pg_catalog.english );

ALTER TEXT SEARCH CONFIGURATION english_ispell
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword,
                      hword_part
    WITH english_ispell, english_stem;
Thanks for the hint about the simple dictionary. I'll remove it, but if
it's never triggered, I guess that won't solve my problem either?
>
> Better, use ts_debug() function or ts_dict() for testing.
ts_debug shows:
SELECT ts_debug('english_ispell','gitar');
(asciiword,"Word, all ASCII",gitar,"{english_ispell,english_stem}",english_stem,{gitar})
(1 line)
ts_dict does not seem to exist; I couldn't find it in the docs either.
>
> Regards,
> Oleg
Thanks,
Corin
On Thu, 8 Apr 2010, Corin wrote:
> On 08.04.2010 20:15, Oleg Bartunov wrote:
>> On Thu, 8 Apr 2010, Corin wrote:
>>
>> The english_ispell dictionary is a morphological dictionary! Read the docs.
>> Also, the simple dictionary will never be invoked, since the english_stem
>> dictionary recognizes everything!
> I'm not sure what you mean by 'morphology'. I did read the docs but
> couldn't find anything about 'morphology dictionaries'.
It means the following (from
http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY):

12.6.5. Ispell Dictionary

The Ispell dictionary template supports morphological dictionaries, which
can normalize many different linguistic forms of a word into the same
lexeme. For example, an English Ispell dictionary can match all
declensions and conjugations of the search term bank, e.g., banking,
banked, banks, banks', and bank's.

You were confused by the name!
>
> I created it myself with the following commands, after I installed the ispell
> dictionaries using "apt-get":
>
> CREATE TEXT SEARCH DICTIONARY english_ispell (
> TEMPLATE = ispell,
> DictFile = system_en_us,
> AffFile = system_en_us
> );
>
> CREATE TEXT SEARCH CONFIGURATION english_ispell ( COPY = pg_catalog.english );
> ALTER TEXT SEARCH CONFIGURATION english_ispell
> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword,
> hword_part WITH english_ispell, english_stem;
>
> Thanks for the hint about the simple dictionary. I'll remove it, but if it's
> never triggered, I guess that won't solve my problem either?
>>
>> Better to use the ts_debug() function or ts_dict() for testing.
> ts_debug shows:
> SELECT ts_debug('english_ispell','gitar');
> (asciiword,"Word, all ASCII",gitar,"{english_ispell,english_stem}",english_stem,{gitar})
> (1 line)
>
> ts_dict does not seem to exist; I couldn't find it in the docs either.
Sorry, I meant ts_lexize.
>>
>> Regards,
>> Oleg
> Thanks,
> Corin
>
Regards,
Oleg
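A short sketch of testing each dictionary in isolation with ts_lexize(), assuming the english_ispell dictionary defined earlier in the thread. Note that ts_lexize() takes a dictionary name, not a configuration name (in this thread both happen to be called english_ispell), and returns NULL when the dictionary does not know the word.

```sql
-- Ask each dictionary directly, bypassing the configuration's mapping.
-- A NULL result means "word not recognized by this dictionary", which is
-- what an Ispell dictionary would be expected to return for a misspelling.
SELECT ts_lexize('english_ispell', 'gitar')   AS ispell_misspelled,
       ts_lexize('english_ispell', 'guitars') AS ispell_plural,
       ts_lexize('english_stem',   'gitar')   AS stemmer_misspelled;
```

Comparing the columns shows where a word stops in the chain: the Snowball stemmer accepts any word, so it produces a lexeme even for input the Ispell dictionary rejects.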
On 08.04.2010 21:27, Oleg Bartunov wrote:
> It means the following (from
> http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY):
>
> 12.6.5. Ispell Dictionary
>
> The Ispell dictionary template supports morphological dictionaries,
> which can normalize many different linguistic forms of a word into the
> same lexeme. For example, an English Ispell dictionary can match all
> declensions and conjugations of the search term bank, e.g., banking,
> banked, banks, banks', and bank's.
I already read this, but I don't know how to solve my problem with this
information.

SELECT ts_lexize('english_ispell','guitar');
{guitar}
(1 line)

SELECT ts_lexize('english_ispell','bank');
{bank}
(1 line)

SELECT ts_debug('english_ispell','bank');
(asciiword,"Word, all ASCII",bank,"{english_ispell,english_stem}",english_ispell,{bank})
(1 line)

SELECT plainto_tsquery('english_ispell','bank');
'bank'
(1 line)

> Regards,
> Oleg
It would be very nice if you (or anyone else) could provide me with
concrete instructions or a howto. What can I do to find the error in my
setup? What output should I expect from the above commands if everything
worked correctly?

Thanks,
Corin
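As a hedged illustration of what a working setup might look like: going by the documentation passage quoted above, an Ispell dictionary normalizes inflected forms of known words, so a more telling test is an inflected word rather than a base form such as 'bank' or a misspelling such as 'gitar'. The sketch below assumes the english_ispell dictionary and configuration from this thread; the exact lexemes returned depend on the installed dictionary and affix files.

```sql
-- Inflected forms taken from the documentation's example. With a working
-- Ispell dictionary these would be expected to reduce to (or include)
-- the lexeme 'bank'.
SELECT word, ts_lexize('english_ispell', word) AS lexemes
FROM (VALUES ('banks'::text), ('banking'), ('banked')) AS t(word);

-- The same check through the whole configuration.
SELECT to_tsvector('english_ispell', 'banks banking banked');
```

If the first query returns NULL for every row, the dictionary or affix file is probably not being read as intended; if it returns lexemes, the dictionary is doing the morphological normalization it was designed for.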