Re: tsearch2 dictionary for statute cites
From | Oleg Bartunov |
---|---|
Subject | Re: tsearch2 dictionary for statute cites |
Date | |
Msg-id | Pine.LNX.4.64.0903110952430.31919@sn.sai.msu.ru |
In response to | Re: tsearch2 dictionary for statute cites (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: tsearch2 dictionary for statute cites |
List | pgsql-general |
On Tue, 10 Mar 2009, Tom Lane wrote:

> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> People are likely to search for statute cites, which tend to have a
>> hierarchical form. I'm not sure the prefix approach will work for
>> this. For example, there is a section 939.64 in the state statutes
>> dealing with commission of a crime while wearing a bulletproof
>> garment. If someone searches for that, they should find subsections
>> like 939.64(1) or 939.64(2) but not different sections which start
>> with the same characters like 939.641 (the section on concealing
>> identity) or 939.645 (the section on hate crimes). A search for
>> chapter 939 should return any of the above.
>
> I think what you need is a custom parser that treats these similarly to
> hyphenated words. If I pretend that the dot is a hyphen I get matching
> behavior that seems to meet all those requirements.
>
> Unfortunately we don't seem to have any really easy way to plug in a
> custom parser, other than copy-paste-modify the existing one which would
> be a PITA from a maintenance standpoint. Perhaps you could pass the
> texts and the queries through a regexp substitution that converts
> digit-dot-digit to digit-dash-digit?

Perhaps for 8.4 it's better to utilize prefix search; something like
to_tsquery('939.645:*') will find what Kevin needs. The problem is with
the parser, so I'd preprocess the text before indexing to convert all
digit.digit(digit) to digit.digit.digit, which the parser recognizes as
a single lexeme of type 'version'. Here is just an illustration:

qq=# select * from ts_parse('default',translate('939.64(1)','()','. '));
 tokid |  token
-------+----------
     8 | 939.64.1
    12 |

btw, having 'version' tokens makes it possible to use dict_regex for 8.3.
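The digit.digit(digit) to digit.digit.digit conversion could also be done in application code before the text is handed to PostgreSQL. Below is a minimal sketch, assuming Python on the client side. The helper name `normalize_cites` and the regular expression are hypothetical and not something from this thread, and the sketch only handles numeric subsections such as (1), not lettered ones such as (a). Unlike the translate() illustration, it leaves parentheses that do not follow a section number untouched.

```python
import re

# Matches a section number such as "939.64" followed by one or more
# parenthesised numeric subsections such as "(1)" or "(2)(3)".
# Assumption: cites look like digit.digit(digit); other forms are left alone.
_CITE = re.compile(r"\b(\d+\.\d+)((?:\(\d+\))+)")


def normalize_cites(text: str) -> str:
    """Rewrite digit.digit(digit) as digit.digit.digit (hypothetical helper)."""

    def repl(match: re.Match) -> str:
        # Pull the numbers out of "(1)(2)..." and join them with dots.
        subsections = re.findall(r"\d+", match.group(2))
        return ".".join([match.group(1), *subsections])

    return _CITE.sub(repl, text)


if __name__ == "__main__":
    print(normalize_cites("charged under 939.64(1) and 939.645"))
```

The same function would have to be applied to the search string before it is passed to to_tsquery, so that documents and queries are normalized the same way.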
>
> regards, tom lane

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83