Re: tsearch2 and hyphenated terms
From | Oleg Bartunov |
---|---|
Subject | Re: tsearch2 and hyphenated terms |
Date | |
Msg-id | Pine.LNX.4.64.0804112206030.21547@sn.sai.msu.ru |
In reply to | tsearch2 and hyphenated terms (Reece Hart <reece@harts.net>) |
Responses | Re: tsearch2 and hyphenated terms |
List | pgsql-general |
We have the same problem with names in astronomy, so we implemented
dict_regex:

http://vo.astronet.ru/arxiv/dict_regex.html

Check it out!

Oleg

On Thu, 10 Apr 2008, Reece Hart wrote:

> I'd like to use tsearch2 to index protein and gene names. Unfortunately,
> such names are written inconsistently and sometimes with hyphens. For
> example, MCL-1 and MCL1 are semantically equivalent but with the default
> parser and to_tsvector, I see this:
>
> unison@u8.3=> select to_tsvector('MCL1 MCL-1');
>        to_tsvector
> -------------------------
>  '-1':3 'mcl':2 'mcl1':1
>
> For the purposes of indexing these names, I suspect I'd get the majority
> of cases by removing a hyphen when it's followed by 1 or 2 chars from
> [a-zA-Z0-9]. Does that require a custom parser?
>
> Thanks,
> Reece
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
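
The rule described in the quoted question (drop a hyphen when it is followed by 1 or 2 characters from [a-zA-Z0-9]) could be sketched as a pre-processing step applied to the text before it reaches to_tsvector. This is only an illustration in Python and says nothing about how dict_regex itself works. The function name normalize_name is hypothetical, and requiring an alphanumeric character before the hyphen is an added assumption.

```python
import re

# Hypothetical helper, not part of tsearch2 or dict_regex.
# Matches a hyphen that sits between an alphanumeric character and a
# trailing run of exactly 1 or 2 alphanumeric characters.
_SHORT_SUFFIX_HYPHEN = re.compile(
    r"(?<=[A-Za-z0-9])-(?=[A-Za-z0-9]{1,2}(?![A-Za-z0-9]))"
)


def normalize_name(text: str) -> str:
    """Remove hyphens that precede a 1- or 2-character alphanumeric suffix."""
    return _SHORT_SUFFIX_HYPHEN.sub("", text)


if __name__ == "__main__":
    # Both spellings collapse to the same token before indexing.
    print(normalize_name("MCL1 MCL-1"))
    # Longer suffixes are left alone.
    print(normalize_name("well-known"))
```

With this rule both MCL1 and MCL-1 would be indexed as the same term, while hyphenated words with longer second parts are left for the default parser to handle.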