Re: Tsearch2 custom dictionaries
From | psql-mail@freeuk.com |
---|---|
Subject | Re: Tsearch2 custom dictionaries |
Date | |
Msg-id | E19kmT4-000OdB-00@buckaroo.freeuk.net |
In reply to | Tsearch2 custom dictionaries (psql-mail@freeuk.com) |
Responses | Re: Tsearch2 custom dictionaries |
List | pgsql-general |

> On Thu, 7 Aug 2003 psql-mail@freeuk.com wrote:
>
> > Part1.
> >
> > I have created a dictionary called 'webwords' which checks all words
> > and curtails them to 300 chars (for now)
> >
> > after running
> > make
> > make install
> >
> > I then copied the lib_webwords.so into my $libdir
> >
> > I have run
> >
> > psql mybd < dict_webwords.sql
> >
>
> Once you did 'psql mybd < dict_webwords.sql' you should be able to use it :)
> Test it :
> select lexize('webwords','some_web_word');

I did test it with

select lexize('webwords','some_web_word');
 lexize
-------
 {some_web_word}

select lexize('webwords','some_400char_web_word');
 lexize
--------
 {some_shortened_web_word}

so that bit works, but then I tried

SELECT to_tsvector( 'webwords', 'my words' );
Error: No tsearch config

> Did you read http://www.sai.msu.su/~megera/oddmuse/index.cgi/Gendict

Yeah, I did read it - it's good!

Should I run:

update pg_ts_cfgmap set dict_name='{webwords}';

> > Part2.
<snip>
> > As the text can be multilingual I don't think stemming is possible?
>
> You're right. I'm afraid you need UTF database, but tsearch2 isn't
> UTF-8 compatible :(

My database was created as unicode - does this mean I cannot use tsearch?!

> > I also need to include many non-standard words in the index such as
> > urls and message ID's contained in the text.
> >
>
> What's message ID ? Integer ? it's already recognized by parser.
>
> try
> select * from token_type();
>
> Also, last version of tsearch2 (for 7.3 grab from
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/,
> for 7.4 - available from CVS)
> has rather useful function - ts_debug
>
> apod=# select * from ts_debug('http://www.sai.msu.su/~megera');
>  ts_name | tok_type | description | token          | dict_name | tsvector
> ---------+----------+-------------+----------------+-----------+------------------
>  simple  | host     | Host        | www.sai.msu.su | {simple}  | 'www.sai.msu.su'
>  simple  | lword    | Latin word  | megera         | {simple}  | 'megera'
> (2 rows)
>
> > I get the feeling that building these indexes will by no means be an
> > easy task so any suggestions will be gratefully received!
> >
>
> You may write your own parser, at last. Some info about parser API:
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_in_Brief

Parser writing...scary stuff :-)

Thanks!
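
A note on the "No tsearch config" error above: in tsearch2 the first argument of to_tsvector() names a configuration, not a dictionary, so passing 'webwords' fails even though lexize('webwords', ...) works. The following is only a rough sketch of one way to wire the dictionary into a configuration. It assumes the tsearch2 tables pg_ts_cfg (ts_name, prs_name, locale) and pg_ts_cfgmap (ts_name, tok_alias, dict_name) and the stock 'default' parser; the configuration name 'webcfg' and the locale value are hypothetical placeholders, and only the token types shown in the ts_debug output (lword, host) are mapped.

```sql
-- Hypothetical configuration name 'webcfg'; the locale value is a
-- placeholder and should be adjusted to the server's actual locale.
INSERT INTO pg_ts_cfg (ts_name, prs_name, locale)
VALUES ('webcfg', 'default', 'en_GB');

-- Map selected token types to the custom dictionary. Token types that
-- are not mapped here would not be indexed under this configuration.
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
VALUES ('webcfg', 'lword', '{webwords}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
VALUES ('webcfg', 'host', '{webwords}');

-- Pass the configuration name, not the dictionary name.
SELECT to_tsvector('webcfg', 'my words');
```

By contrast, an UPDATE on pg_ts_cfgmap without a WHERE clause would rewrite the dictionary list for every token type in every configuration, so restricting it by ts_name (and, if wanted, tok_alias) is probably the safer route.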