Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
From | Oleg Bartunov |
---|---|
Subject | Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? |
Date | |
Msg-id | CAF4Au4wkjS6D2dG9Z1_VFJ95zojhwpVvkY4JGq6W-BwL3+tJyQ@mail.gmail.com |
In response to | Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? |
List | pgsql-hackers |
On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
>> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
>> matching consecutive words but it won't work for us if it cannot handle
>> consecutive *duplicate* words.
>
>> For example, the following returns true: select
>> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
>
>> Is this expected ?
>
> I concur that that seems like a rather useless behavior. If we have
> "x <-> y" it is not possible to match at distance zero, while if we
> have "x <-> x" it seems unlikely that the user is expecting us to
> treat that identically to "x". So phrase search simply should not
> consider distance-zero matches.

What about a word with several infinitives?

select to_tsvector('en', 'leavings');
      to_tsvector
------------------------
 'leave':1 'leavings':1
(1 row)

select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
 ?column?
----------
 t
(1 row)

>
> The attached one-liner patch seems to fix this problem, though I am
> uncertain whether any other places need to be changed to match.
> Also, there is a regression test case that changes:
>
> *** /home/postgres/pgsql/src/test/regress/expected/tstypes.out Thu May 5 19:21:17 2016
> --- /home/postgres/pgsql/src/test/regress/results/tstypes.out Tue Jun 7 17:55:41 2016
> ***************
> *** 897,903 ****
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
>    ts_rank_cd
>   ------------
> !   0.0714286
>   (1 row)
>
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
> --- 897,903 ----
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
>    ts_rank_cd
>   ------------
> !           0
>   (1 row)
>
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
>
>
> I'm not sure if this case is intentionally exhibiting the behavior that
> both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
> result simply wasn't thought about carefully.
>
> regards, tom lane
>
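
The tension between the two positions in this message can be sketched with a toy model of positional phrase matching. This is only an illustration, not PostgreSQL's implementation: the dictionary representation of a tsvector, the function name `phrase_match`, and the `allow_zero` flag are all hypothetical, and the rule "the right operand lies at most `distance` positions after the left one" is an assumption chosen because it reproduces the results reported above.

```python
from typing import Dict, Set

# Toy tsvector: lexeme -> set of positions (hypothetical representation,
# not PostgreSQL's on-disk format).
TsVector = Dict[str, Set[int]]


def phrase_match(doc: TsVector, left: str, right: str,
                 distance: int = 1, allow_zero: bool = True) -> bool:
    """Toy model of 'left <distance> right'.

    allow_zero=True also accepts both operands at the same position
    (assumed to mirror the behavior reported in the thread);
    allow_zero=False requires `right` to be strictly after `left`
    (the "no distance-zero matches" proposal).
    """
    min_gap = 0 if allow_zero else 1
    return any(
        min_gap <= r - l <= distance
        for l in doc.get(left, set())
        for r in doc.get(right, set())
    )


blue = {"blue": {1}}                         # like to_tsvector('simple', 'blue')
blue_blue = {"blue": {1, 2}}                 # like to_tsvector('simple', 'blue blue')
leavings = {"leave": {1}, "leavings": {1}}   # the 'leavings' vector quoted above

# 'blue <-> blue' against a single 'blue'
reported = phrase_match(blue, "blue", "blue", 1, allow_zero=True)
proposed = phrase_match(blue, "blue", "blue", 1, allow_zero=False)

# A genuine repetition still matches when distance zero is excluded.
repeated = phrase_match(blue_blue, "blue", "blue", 1, allow_zero=False)

# 'leave <0> leavings' depends entirely on a same-position match.
same_pos_loose = phrase_match(leavings, "leave", "leavings", 0, allow_zero=True)
same_pos_strict = phrase_match(leavings, "leave", "leavings", 0, allow_zero=False)

print(reported, proposed, repeated, same_pos_loose, same_pos_strict)
```

In this model, excluding distance zero stops 'blue <-> blue' from matching a lone 'blue', but it also makes 'leave <0> leavings' unmatchable, because both lexemes come from the same word and share position 1.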