[PATCH] Phrase search ported to 9.6
From | Dmitry Ivanov |
---|---|
Subject | [PATCH] Phrase search ported to 9.6 |
Date | |
Msg-id | 33828354.WrrSMviC7Y@abook |
Replies | Re: [PATCH] Phrase search ported to 9.6; Re: [PATCH] Phrase search ported to 9.6 |
List | pgsql-hackers |
Hi Hackers,

Although PostgreSQL is capable of performing some FTS (full text search) queries, there's still room for improvement. Phrase search support could become a great addition to the existing set of features.

Introduction
============

It is no secret that one can make Google search for an exact phrase (instead of an unordered set of lexemes) simply by enclosing it in double quotes. This is a really nice feature which saves the time that would otherwise be spent on tedious filtering of results.

One weak spot of the current FTS implementation is that there is no way to specify the desired lexeme order: the order of lexemes in a query makes no difference to the result at all. In the end, the search engine looks for each lexeme individually, which means that a hypothetical end user has to discard the documents that do not contain the search phrase by hand.

This problem is solved by the patch below (it should apply cleanly to 61ce1e8f1).

Problem description
===================

The problem comes from the lack of a lexeme ordering operator. Consider the following example:

    select q @@ plainto_tsquery('fatal error')
    from unnest(array[to_tsvector('fatal error'),
                      to_tsvector('error is not fatal')]) as q;
     ?column?
    ----------
     t
     t
    (2 rows)

Clearly the latter match is not what we want if we are looking for exactly the phrase "fatal error". This is where the need for a lexeme ordering operator arises:

    select q @@ to_tsquery('fatal ? error')
    from unnest(array[to_tsvector('fatal error'),
                      to_tsvector('error is not fatal')]) as q;
     ?column?
    ----------
     t
     f
    (2 rows)

Implementation
==============

The ? (FOLLOWED BY) binary operator takes the form of "?" or "?[N]", where 0 <= N < ~16K. If N is provided, the distance between the left and right operands must be no greater than N. For example:

    select to_tsvector('postgres has taken severe damage') @@
           to_tsquery('postgres ? (severe ? damage)');
     ?column?
    ----------
     f
    (1 row)

    select to_tsvector('postgres has taken severe damage') @@
           to_tsquery('postgres ?[4] (severe ? damage)');
     ?column?
    ----------
     t
    (1 row)

The new function phraseto_tsquery([ regconfig, ] text) takes advantage of the "?[N]" operator to facilitate phrase search:

    select to_tsvector('postgres has taken severe damage') @@
           phraseto_tsquery('severely damaged');
     ?column?
    ----------
     t
    (1 row)

This patch was originally developed by Teodor Sigaev and Oleg Bartunov in 2009, so all credit goes to them. Any feedback is welcome.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
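The FOLLOWED BY matching rule described in the message can be illustrated with a small model. The Python sketch below is a hypothetical toy, not the C code in the patch. The names toy_tsvector, spans, matches and toy_phraseto_tsquery are invented for illustration, and words are split on whitespace with no stemming. Two further points are assumptions rather than anything stated in the mail: plain "?" is treated as "?[1]", and the distance is measured from the end of the left match to the start of the right one. Under those assumptions it reproduces the t/f results of the "fatal error" and "severe damage" queries.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set, Tuple, Union

# Hypothetical toy model of the FOLLOWED BY ("?" / "?[N]") operator.
# Not the patch's implementation: no stemming, no text search dictionaries.

Span = Tuple[int, int]  # (first position, last position) of one match
TsVector = Dict[str, List[int]]  # lexeme -> 1-based positions


def toy_tsvector(text: str, stop_words: FrozenSet[str] = frozenset()) -> TsVector:
    """Record word positions; stop words are dropped but still use up a position."""
    vec: TsVector = {}
    for pos, word in enumerate(text.lower().split(), start=1):
        if word not in stop_words:
            vec.setdefault(word, []).append(pos)
    return vec


@dataclass
class Lexeme:
    word: str


@dataclass
class And:
    left: "Query"
    right: "Query"


@dataclass
class FollowedBy:
    left: "Query"
    right: "Query"
    n: int = 1  # assumption: plain "?" behaves like "?[1]"


Query = Union[Lexeme, And, FollowedBy]


def spans(q: Query, vec: TsVector) -> Set[Span]:
    """All position spans where a lexeme or phrase node matches."""
    if isinstance(q, Lexeme):
        return {(p, p) for p in vec.get(q.word, [])}
    if isinstance(q, FollowedBy):
        result: Set[Span] = set()
        for left_start, left_end in spans(q.left, vec):
            for right_start, right_end in spans(q.right, vec):
                # Assumed distance rule: the right match starts 0..n positions
                # after the left match ends, so reversed order never matches.
                if 0 <= right_start - left_end <= q.n:
                    result.add((left_start, right_end))
        return result
    raise TypeError("AND has no position, so it cannot be a phrase operand here")


def matches(q: Query, vec: TsVector) -> bool:
    """Rough analogue of `tsvector @@ tsquery` for this toy model."""
    if isinstance(q, And):
        # Plain AND: both operands must occur, order and distance are ignored.
        return matches(q.left, vec) and matches(q.right, vec)
    return bool(spans(q, vec))


def toy_phraseto_tsquery(text: str, stop_words: FrozenSet[str] = frozenset()) -> Query:
    """Chain the non-stop words with FollowedBy, widening N over dropped words."""
    query: Optional[Query] = None
    gap = 1
    for word in text.lower().split():
        if word in stop_words:
            gap += 1  # a dropped word still occupies a position
            continue
        query = Lexeme(word) if query is None else FollowedBy(query, Lexeme(word), gap)
        gap = 1
    if query is None:
        raise ValueError("no searchable words in phrase")
    return query


if __name__ == "__main__":
    stop = frozenset({"has", "is", "not"})  # stand-in for a real stop-word list
    doc = toy_tsvector("postgres has taken severe damage", stop)
    severe_damage = FollowedBy(Lexeme("severe"), Lexeme("damage"))
    print(matches(FollowedBy(Lexeme("postgres"), severe_damage), doc))     # False
    print(matches(FollowedBy(Lexeme("postgres"), severe_damage, 4), doc))  # True
    print(matches(toy_phraseto_tsquery("severe damage", stop), doc))      # True
```

Treating dropped stop words as extra distance in toy_phraseto_tsquery is only a guess at why phraseto_tsquery relies on "?[N]" rather than plain "?"; the real function may handle this differently.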
Attachments