Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
From | Tom Lane |
---|---|
Subject | Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? |
Date | |
Msg-id | 16167.1465337110@sss.pgh.pa.us |
In response to | Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? (Jean-Pierre Pelletier <jppelletier@e-djuster.com>) |
Responses |
Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
(Tom Lane <tgl@sss.pgh.pa.us>)
Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? (Oleg Bartunov <obartunov@gmail.com>)
Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? (Noah Misch <noah@leadboat.com>) |
List | pgsql-hackers |
Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
> matching consecutive words but it won't work for us if it cannot handle
> consecutive *duplicate* words.

> For example, the following returns true:
>     select phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');

> Is this expected ?

I concur that that seems like a rather useless behavior.  If we have
"x <-> y" it is not possible to match at distance zero, while if we
have "x <-> x" it seems unlikely that the user is expecting us to
treat that identically to "x".  So phrase search simply should not
consider distance-zero matches.

The attached one-liner patch seems to fix this problem, though I am
uncertain whether any other places need to be changed to match.
Also, there is a regression test case that changes:

*** /home/postgres/pgsql/src/test/regress/expected/tstypes.out  Thu May  5 19:21:17 2016
--- /home/postgres/pgsql/src/test/regress/results/tstypes.out   Tue Jun  7 17:55:41 2016
***************
*** 897,903 ****
  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
   ts_rank_cd
  ------------
!   0.0714286
  (1 row)

  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
--- 897,903 ----
  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
   ts_rank_cd
  ------------
!           0
  (1 row)

  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');

I'm not sure if this case is intentionally exhibiting the behavior that
both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
result simply wasn't thought about carefully.

            regards, tom lane

diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 591e59c..95ad69b 100644
*** a/src/backend/utils/adt/tsvector_op.c
--- b/src/backend/utils/adt/tsvector_op.c
*************** TS_phrase_execute(QueryItem *curitem,
*** 1409,1415 ****
              {
                  while (Lpos < Ldata.pos + Ldata.npos)
                  {
!                     if (WEP_GETPOS(*Lpos) <= WEP_GETPOS(*Rpos))
                      {
                          /*
                           * Lpos is behind the Rpos, so we have to check the
--- 1409,1415 ----
              {
                  while (Lpos < Ldata.pos + Ldata.npos)
                  {
!                     if (WEP_GETPOS(*Lpos) < WEP_GETPOS(*Rpos))
                      {
                          /*
                           * Lpos is behind the Rpos, so we have to check the
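
To make the effect of the one-character change easier to see, here is a
minimal stand-alone sketch of the position comparison.  It is a toy model,
not the PostgreSQL source: the function name phrase_match, its parameters,
and the allow_zero switch are all hypothetical, and it assumes the distance
test (which the hunk above does not show) accepts any gap up to the
operator's distance.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of a phrase test "L <distance> R" over lexeme positions.
 * Hypothetical helper, not PostgreSQL code.
 *
 * Assumption: a (left, right) pair matches when the left position is
 * behind the right one and the gap is at most `distance`.
 * allow_zero = true  models the pre-patch "<=" comparison;
 * allow_zero = false models the patched "<" comparison.
 */
bool
phrase_match(const int *lpos, size_t nl,
             const int *rpos, size_t nr,
             int distance, bool allow_zero)
{
    for (size_t i = 0; i < nr; i++)
    {
        for (size_t j = 0; j < nl; j++)
        {
            /* Is the left operand's position behind the right one's? */
            bool behind = allow_zero ? (lpos[j] <= rpos[i])
                                     : (lpos[j] < rpos[i]);

            /* Only then is the distance condition checked. */
            if (behind && rpos[i] - lpos[j] <= distance)
                return true;
        }
    }
    return false;
}
```

In this model, a document containing "blue" once gives both operands of
"blue <-> blue" the position list {1}.  With allow_zero = true the pair
(1, 1) passes both tests and the phrase matches; with allow_zero = false
it is rejected.  A document with "blue" at positions 1 and 2 still matches
under the strict comparison, via the pair (1, 2).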