Re: tsearch2 dictionary for statute cites
From | Kevin Grittner |
---|---|
Subject | Re: tsearch2 dictionary for statute cites |
Date | |
Msg-id | 49B77DE2.EE98.0025.0@wicourts.gov |
In response to | Re: tsearch2 dictionary for statute cites (Oleg Bartunov <oleg@sai.msu.su>) |
List | pgsql-general |
>>> Oleg Bartunov <oleg@sai.msu.su> wrote:
> On Tue, 10 Mar 2009, Tom Lane wrote:
>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>> People are likely to search for statute cites, which tend to have a
>>> hierarchical form. I'm not sure the prefix approach will work for
>>> this. For example, there is a section 939.64 in the state statutes
>>> dealing with commission of a crime while wearing a bulletproof
>>> garment. If someone searches for that, they should find subsections
>>> like 939.64(1) or 939.64(2) but not different sections which start
>>> with the same characters like 939.641 (the section on concealing
>>> identity) or 939.645 (the section on hate crimes). A search for
>>> chapter 939 should return any of the above.
>>
>> Perhaps you could pass the texts and the queries through a regexp
>> substitution that converts digit-dot-digit to digit-dash-digit?
>
> perhaps, for 8.4 it's better to utilize prefix search, like
> to_tsquery('939.645:*') will find what Kevin need. The problem is with
> parser, so I'd preprocess text before indexing to convert all
> digit.digit(digit) to digit.digit.digit, which is what parser recognizes as
> a single lexem 'version'. Here is just an illustration
>
> qq=# select * from ts_parse('default',translate('939.64(1)','()','. '));
>  tokid | token
> -------+----------
>      8 | 939.64.1
>     12 |
>
> btw, having 'version' it's possible to use dict_regex for 8.3.

Tom, Oleg:

Thanks for the suggestions. Looks promising.

-Kevin
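
The preprocessing idea discussed in this thread can be sketched outside the database. The following is a minimal illustration in Python, not PostgreSQL code and not anything posted in the thread: the helper names `normalize_cite` and `cite_matches` are hypothetical, and the sketch assumes cites of the form digit.digit(digit), as in the examples above. It rewrites `939.64(1)` to the dotted form `939.64.1` and then compares cites component by component, which is the hierarchical behavior Kevin describes: a bare string prefix on `939.64` would also match `939.641`, whereas a component-wise comparison does not.

```python
import re

# Assumption: a cite looks like 939.64 followed by one or more
# parenthesized numeric subsections, e.g. 939.64(1) or 939.64(1)(2).
_CITE = re.compile(r"\b(\d+\.\d+)((?:\(\d+\))+)")


def normalize_cite(text: str) -> str:
    """Rewrite digit.digit(digit) as digit.digit.digit (hypothetical helper)."""
    def repl(match: re.Match) -> str:
        subsections = re.findall(r"\((\d+)\)", match.group(2))
        return match.group(1) + "." + ".".join(subsections)
    return _CITE.sub(repl, text)


def cite_matches(query: str, cite: str) -> bool:
    """True if `cite` equals `query` or sits beneath it in the hierarchy.

    Components are compared whole, so 939.64 does not match 939.641.
    """
    q = normalize_cite(query).split(".")
    c = normalize_cite(cite).split(".")
    return c[:len(q)] == q


if __name__ == "__main__":
    for candidate in ["939.64(1)", "939.64(2)", "939.641", "939.645"]:
        print(candidate, cite_matches("939.64", candidate))
```

Whether the same effect can be had inside tsearch with a prefix query depends on how the parser tokenizes the normalized text, so this sketch only models the intended matching rule, not the behavior of `to_tsquery`.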