Re: full-text search question
From | Oleg Bartunov |
---|---|
Subject | Re: full-text search question |
Date | |
Msg-id | Pine.LNX.4.64.0806181715270.11363@sn.sai.msu.ru |
In response to | full-text search question (Sabbiolina <sabbiolina@gmail.com>) |
List | pgsql-admin |
Sabbiolina,

you have two options:

1. Write your very own parser
2. Write a dictionary that breaks the host into parts

Fortunately, you can use our dict_regex dictionary
(http://vo.astronet.ru/arxiv/dict_regex.html) for option 2 instead of
writing one yourself.

Oleg

On Wed, 18 Jun 2008, Sabbiolina wrote:

> Hello,
>
> I've seen that the default parser for the full-text search can identify
> e-mail addresses, hosts, URLs... but I have a serious problem with it:
>
> Suppose I index the following sentence "the search engine I use the most is
> www.google.com"
>
> And I search "google" no result is found.
>
> Instead if I search "www.google.com" the record is found correctly.
>
> I guess the reason is because the parser treats www.google.com as a single
> token (of type 'host') but as everyone can easily see the result of this is
> a major problem. In fact the word "google" actually is in the above
> sentence, and the end-user of the database obviously asks me "why does your
> FTS not find that record when I can clearly see that my search term is
> there?"
>
> Reading the docs I've seen that the parser can produce multiple tokens for
> the same word (for example the word "make-up" produces 4 tokens: make-up,
> make, -, up)... why not doing the same with URLs and e-mails? Why
> www.google.com is only treated as a unique word? Why not producing multiple
> tokens like www.google.com, www, ., google, ., com? (obviously www and . can
> be nulled or stopworded).
>
> Does anybody know of a better parser for Postgres? Or at least a trick to
> make its FTS find the record above by searching only a part of the URL?
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
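
To make option 2 concrete, here is a minimal Python sketch of what "a dictionary that breaks the host into parts" would do to a single 'host' token. It is not the dict_regex API or any PostgreSQL interface; the function name expand_host and the default stopword set are assumptions made for illustration only.

```python
def expand_host(host, stopwords=frozenset({"www"})):
    """Return the full host followed by its dot-separated labels.

    Hypothetical helper: it only mimics the effect a host-splitting
    dictionary would have on one token of type 'host'.
    """
    host = host.lower()
    # Keep the whole host first so an exact search still matches.
    lexemes = [host]
    for label in host.split("."):
        # Skip empty labels, stopworded labels, and duplicates.
        if label and label not in stopwords and label not in lexemes:
            lexemes.append(label)
    return lexemes


if __name__ == "__main__":
    print(expand_host("www.google.com"))
```

With the assumed stopword set, the host from the question yields the whole host plus "google" and "com", so a search for "google" would have a lexeme to match. Whether "com" should also be stopworded is a design choice left open here.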