Re: Tsearch2 custom dictionaries
From | Oleg Bartunov |
---|---|
Subject | Re: Tsearch2 custom dictionaries |
Date | |
Msg-id | Pine.GSO.4.56.0308071816320.17880@ra.sai.msu.su |
In reply to | Tsearch2 custom dictionaries (psql-mail@freeuk.com) |
List | pgsql-general |
On Thu, 7 Aug 2003 psql-mail@freeuk.com wrote:

> Part1.
>
> I have created a dictionary called 'webwords' which checks all words
> and curtails them to 300 chars (for now)
>
> after running
> make
> make install
>
> I then copied the lib_webwords.so into my $libdir
>
> I have run
>
> psql mybd < dict_webwords.sql
>
> The tutorial shows how to install the intdict for integer types. How
> should i install my custom dictionary?

Once you have run 'psql mybd < dict_webwords.sql' you should be able to use it :)
Test it:

    select lexize('webwords','some_web_word');

Did you read http://www.sai.msu.su/~megera/oddmuse/index.cgi/Gendict ?

>
>
> Part2.
>
> The dictionary I am trying to create is to be used for searching
> multilingual text. My aim is to have fast search over all text, but
> ignore binary encoded data which is also present. (i will probably move
> to ignoring long words in the text eventually).
> What is the best approach to tackle this problem?
> As the text can be multilingual I don't think stemming is possible?

You're right. I'm afraid you need a UTF database, but tsearch2 isn't
UTF-8 compatible :(

> I also need to include many non-standard words in the index such as
> urls and message ID's contained in the text.
>

What's a message ID? An integer? That is already recognized by the parser.
Try:

    select * from token_type();

Also, the latest version of tsearch2 (for 7.3, grab it from
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/; for 7.4, it is
available from CVS) has a rather useful function, ts_debug:

    apod=# select * from ts_debug('http://www.sai.msu.su/~megera');
     ts_name | tok_type | description |     token      | dict_name |     tsvector
    ---------+----------+-------------+----------------+-----------+------------------
     simple  | host     | Host        | www.sai.msu.su | {simple}  | 'www.sai.msu.su'
     simple  | lword    | Latin word  | megera         | {simple}  | 'megera'
    (2 rows)

> I get the feeling that building these indexes will by no means be an
> easy task so any suggestions will be gratefully received!
>

As a last resort, you can write your own parser. Some info about the parser API:
http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_in_Brief

> Thanks...
>
>

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
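
For reference, the two word-length policies mentioned in the question (curtailing words to 300 characters, and later ignoring long words altogether) could be sketched in plain C roughly as below. This is only an illustration of the core logic, not the tsearch2 dictionary interface: the names curtail_word, keep_word and WEBWORDS_MAX_LEN are hypothetical, and a real dictionary would wrap something like this in the lexize function generated by the Gendict template.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit, taken from the "300 chars (for now)" in the question. */
#define WEBWORDS_MAX_LEN 300

/*
 * Policy 1: curtail. Return a newly allocated copy of `word` cut to at
 * most `max_len` bytes, or NULL if allocation fails. The caller frees it.
 * The limit counts bytes, not characters, so multibyte text could be cut
 * in the middle of a character.
 */
static char *curtail_word(const char *word, size_t max_len)
{
    size_t len = strlen(word);
    char *out;

    if (len > max_len)
        len = max_len;

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, word, len);
    out[len] = '\0';
    return out;
}

/*
 * Policy 2: ignore. Treat over-long words (for example binary encoded
 * data) as noise. Returns 1 if the word should be indexed, 0 if dropped.
 */
static int keep_word(const char *word, size_t max_len)
{
    return strlen(word) <= max_len;
}
```

With these definitions, curtail_word("abcdef", 3) yields a copy containing "abc", while keep_word("abcdef", 3) returns 0, so the same word would be dropped under the second policy. Either function would be called with WEBWORDS_MAX_LEN as the limit.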