Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From: Tom Lane
Subject: Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
Msg-id: 16167.1465337110@sss.pgh.pa.us
In reply to: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?  (Jean-Pierre Pelletier <jppelletier@e-djuster.com>)
Replies: Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?  (Tom Lane <tgl@sss.pgh.pa.us>)
  Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?  (Oleg Bartunov <obartunov@gmail.com>)
  Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?  (Noah Misch <noah@leadboat.com>)
List: pgsql-hackers

Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
> matching consecutive words but it won't work for us if it cannot handle
> consecutive *duplicate* words.

> For example, the following returns true:
>
>     select phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');

> Is this expected ?

I concur that that seems like a rather useless behavior.  If we have
"x <-> y" it is not possible to match at distance zero, while if we
have "x <-> x" it seems unlikely that the user is expecting us to
treat that identically to "x".  So phrase search simply should not
consider distance-zero matches.

The attached one-liner patch seems to fix this problem, though I am
uncertain whether any other places need to be changed to match.
Also, there is a regression test case that changes:

*** /home/postgres/pgsql/src/test/regress/expected/tstypes.out  Thu May  5 19:21:17 2016
--- /home/postgres/pgsql/src/test/regress/results/tstypes.out   Tue Jun  7 17:55:41 2016
***************
*** 897,903 ****
  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
   ts_rank_cd
  ------------
!   0.0714286
  (1 row)

  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
--- 897,903 ----
  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
   ts_rank_cd
  ------------
!           0
  (1 row)

  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');


I'm not sure if this case is intentionally exhibiting the behavior that
both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
result simply wasn't thought about carefully.

            regards, tom lane

diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 591e59c..95ad69b 100644
*** a/src/backend/utils/adt/tsvector_op.c
--- b/src/backend/utils/adt/tsvector_op.c
*************** TS_phrase_execute(QueryItem *curitem,
*** 1409,1415 ****
          {
              while (Lpos < Ldata.pos + Ldata.npos)
              {
!                 if (WEP_GETPOS(*Lpos) <= WEP_GETPOS(*Rpos))
                  {
                      /*
                       * Lpos is behind the Rpos, so we have to check the
--- 1409,1415 ----
          {
              while (Lpos < Ldata.pos + Ldata.npos)
              {
!                 if (WEP_GETPOS(*Lpos) < WEP_GETPOS(*Rpos))
                  {
                      /*
                       * Lpos is behind the Rpos, so we have to check the
