Re: phrase search
From | Teodor Sigaev |
---|---|
Subject | Re: phrase search |
Date | |
Msg-id | 488629FB.2030501@sigaev.ru |
In reply to | Re: phrase search (Oleg Bartunov <oleg@sai.msu.su>) |
List | pgsql-hackers |
>> 1. What is the meaning of such a query operator?
>>
>> foo #5 bar -> true if the document has word "foo" followed by "bar" at
>> 5th position.
>>
>> foo #<5 bar -> true if document has word "foo" followed by "bar" within
>> 5 positions
>>
>> foo #>5 bar -> true if document has word "foo" followed by "bar" after 5
>> positions

Sounds good, but maybe it's overkill.

>> etc .....
>>
>> 2. How to implement such query operators?
>>
>> Should we modify QueryItem to include additional distance information or
>> is there any other way to accomplish it?
>>
>> Is the following list sufficient to accomplish this?
>> a. Modify to_tsquery
>> b. Modify TS_execute in tsvector_op.c to check new operator

Exactly.

>> Is there anything needed in rewrite subsystem?

Yes, of course: the rewrite system should support that operation.

>> 3. Are these valid uses of the operators and if yes what would they
>> mean?
>>
>> foo #5 (bar & cup)

It must be supported, because lexize might return a sub-tsquery. For example, Russian ispell can return several lexemes: "adfg" can become 'adf | adfs | ad'. Norwegian and German are more complicated: "abc" -> '(ab & c) | (a & bc) | abc'.

>> 4. If the operator only applies to two query items can we create an
>> index such that (foo, bar)-> documents[min distance, max distance]
>> How difficult it is to implement an index like this?

No. The index should execute the query 'foo & bar' and set the recheck flag to true, so that 'foo #<5 bar' is then evaluated on the original tsvector from the table.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
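As a rough illustration of the operator semantics and the index/recheck split discussed above, here is a minimal Python sketch. It is not PostgreSQL code. The tsvector is modeled as a dict from lexeme to 1-based positions, and the names `Lex`, `Or`, `Phrase`, `distance_ok`, `positions`, `lossy_match` and `search` are hypothetical stand-ins for `QueryItem` and `TS_execute`. It assumes that "foo #N bar" means "bar" occurs exactly N positions after "foo", "#<N" means 1 to N positions after, and "#>N" means more than N positions after. An `&` operand such as `foo #5 (bar & cup)` is deliberately left out because the thread does not pin down its positional meaning; only `|` alternatives (the ispell case) are handled.

```python
from typing import Dict, List, Set

# Hypothetical model of a tsvector: lexeme -> sorted 1-based positions (no weights).
TsVector = Dict[str, List[int]]


class Lex:
    """A single query lexeme such as 'foo'."""
    def __init__(self, word: str):
        self.word = word


class Or:
    """Alternative lexemes, e.g. an ispell result 'adf | adfs | ad'."""
    def __init__(self, *items):
        self.items = items


class Phrase:
    """left OP right, with OP one of '#', '#<', '#>' and distance n (hypothetical)."""
    def __init__(self, left, op: str, n: int, right):
        self.left, self.op, self.n, self.right = left, op, n, right


def distance_ok(op: str, n: int, d: int) -> bool:
    """Assumed semantics: d is (right position - left position)."""
    if d <= 0:            # the right operand must follow the left one
        return False
    if op == "#":
        return d == n     # exactly n positions later
    if op == "#<":
        return d <= n     # within n positions
    if op == "#>":
        return d > n      # more than n positions later
    raise ValueError(f"unknown operator {op!r}")


def positions(node, doc: TsVector) -> Set[int]:
    """Positions where `node` matches; a phrase reports its right-hand end."""
    if isinstance(node, Lex):
        return set(doc.get(node.word, []))
    if isinstance(node, Or):
        found: Set[int] = set()
        for item in node.items:
            found |= positions(item, doc)
        return found
    if isinstance(node, Phrase):
        left = positions(node.left, doc)
        return {r for r in positions(node.right, doc)
                if any(distance_ok(node.op, node.n, r - lp) for lp in left)}
    raise TypeError(f"unsupported node {node!r}")


def lossy_match(node, doc: TsVector) -> bool:
    """What a position-less index can test: every phrase operator becomes '&'."""
    if isinstance(node, Lex):
        return node.word in doc
    if isinstance(node, Or):
        return any(lossy_match(item, doc) for item in node.items)
    if isinstance(node, Phrase):
        return lossy_match(node.left, doc) and lossy_match(node.right, doc)
    raise TypeError(f"unsupported node {node!r}")


def search(query, docs: List[TsVector]) -> List[int]:
    """Index-style candidate selection, then an exact recheck on each tsvector."""
    candidates = [i for i, doc in enumerate(docs) if lossy_match(query, doc)]
    return [i for i in candidates if positions(query, docs[i])]


docs = [
    {"foo": [1], "bar": [6]},   # "bar" 5 positions after "foo"
    {"foo": [1], "bar": [3]},   # 2 positions after
    {"foo": [1], "bar": [9]},   # 8 positions after
    {"foo": [4], "bar": [2]},   # "bar" precedes "foo"
]
print(search(Phrase(Lex("foo"), "#", 5, Lex("bar")), docs))    # [0]
print(search(Phrase(Lex("foo"), "#<", 5, Lex("bar")), docs))   # [0, 1]
print(search(Phrase(Lex("foo"), "#>", 5, Lex("bar")), docs))   # [2]
```

Here `lossy_match` accepts all four documents, since each contains both "foo" and "bar", so every one of them reaches the recheck, and only the positional test rejects those whose distances do not fit. That mirrors the reply to point 4: the index answers 'foo & bar' and the exact distance check runs against the stored tsvector.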