Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From	Teodor Sigaev
Subject	Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
Date	15 June 2016 16:05:45
Msg-id	57617CD3.4040702@sigaev.ru
In reply to	Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? (Tom Lane <tgl@sss.pgh.pa.us>)
Replies	Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
List	pgsql-hackers

>> what's about word with several infinitives
>
>> select to_tsvector('en', 'leavings');
>>        to_tsvector
>> ------------------------
>>   'leave':1 'leavings':1
>> (1 row)
>
>> select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
>>   ?column?
>> ----------
>>   t
>> (1 row)

The second example is not correct:

select phraseto_tsquery('en', 'leavings')
will produce 'leave | leavings'

and

select phraseto_tsquery('en', 'leavings cats')
will produce 'leave <-> cat | leavings <-> cat'

which seems correct, so we don't need special treatment of <0>.
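To illustrate why no <0> operator is needed, here is a minimal Python sketch that reproduces the two query strings quoted above. It is not PostgreSQL's implementation: `expand_phrase` is a hypothetical helper, and the per-word lexeme lists are typed in by hand to match this thread rather than produced by the 'en' configuration.

```python
from itertools import product
from typing import List


def expand_phrase(words: List[List[str]]) -> str:
    """Build a phrase query string in the distributed form quoted above.

    `words` holds, for each input word in order, the lexemes the dictionary
    returned for it (hand-written, hypothetical input). Every combination of
    one lexeme per word becomes a <-> chain and the chains are OR-ed together,
    so lexemes sharing a position end up in different OR branches instead of
    being joined by a <0> operator.
    """
    chains = (" <-> ".join(combo) for combo in product(*words))
    return " | ".join(chains)


print(expand_phrase([["leave", "leavings"]]))           # leave | leavings
print(expand_phrase([["leave", "leavings"], ["cat"]]))  # leave <-> cat | leavings <-> cat
```

Because the alternatives for 'leavings' only ever appear side by side in separate branches, each branch is an ordinary <-> chain with distance 1.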

> This brings up something else that I am not very sold on: to wit,
> do we really want the "less than or equal" distance behavior at all?
> The documentation gives the example that
>     phraseto_tsquery('cat ate some rats')
> produces
>     ( 'cat' <-> 'ate' ) <2> 'rat'
> because "some" is a stopword.  However, that pattern will also match
> "cat ate rats", which seems surprising and unexpected to me; certainly
> it would surprise a user who did not realize that "some" is a stopword.
>
> So I think there's a reasonable case for decreeing that <N> should only
> match lexemes *exactly* N apart.  If we did that, we would no longer have
> the misbehavior that Jean-Pierre is complaining about, and we'd not need
> to argue about whether <0> needs to be treated specially.
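A small Python model of the two rules discussed above makes the difference concrete. It is a hypothetical sketch, not the tsquery executor: the `Phrase` class and `matches` function are invented for illustration, tsvectors are written by hand as lexeme-to-position dicts, and the "less than or equal" rule is assumed to accept a gap of 0, which is what the 'blue blue' case in the subject line implies. The sketch reports a phrase match at the right operand's position, so a nested phrase such as ( 'cat' <-> 'ate' ) <2> 'rat' measures its distance from 'ate'.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union


@dataclass(frozen=True)
class Phrase:
    """left <N> right, with N stored in `distance` (hypothetical model)."""
    left: "Node"
    right: "Node"
    distance: int


Node = Union[str, Phrase]
TsVector = Dict[str, List[int]]  # lexeme -> positions, written by hand


def end_positions(node: Node, vec: TsVector, exact: bool) -> Set[int]:
    """Positions at which `node` matches, reported as the right operand's position."""
    if isinstance(node, str):
        return set(vec.get(node, []))
    left = end_positions(node.left, vec, exact)
    right = end_positions(node.right, vec, exact)
    if exact:
        # Proposed rule: the right operand sits exactly N positions after the left one.
        return {r for r in right if r - node.distance in left}
    # "Less than or equal" rule, assumed here to accept any gap from 0 to N.
    return {r for r in right if any(0 <= r - l <= node.distance for l in left)}


def matches(query: Node, vec: TsVector, exact: bool) -> bool:
    return bool(end_positions(query, vec, exact))


blue_blue = Phrase("blue", "blue", 1)
cat_ate_rat = Phrase(Phrase("cat", "ate", 1), "rat", 2)
cat_ate_rats = {"cat": [1], "ate": [2], "rat": [3]}

print(matches(blue_blue, {"blue": [1]}, exact=False))   # True: gap 0 <= 1
print(matches(blue_blue, {"blue": [1]}, exact=True))    # False
print(matches(cat_ate_rat, cat_ate_rats, exact=False))  # True: gap 1 <= 2
print(matches(cat_ate_rat, cat_ate_rats, exact=True))   # False
```

Under the exact rule, "cat ate some rats" (with 'rat' at position 4) still matches, while both surprising matches disappear.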

Agreed, that seems easy to change. I thought I saw an issue with
hyphenated words but, fortunately, I had forgotten that hyphenated words don't
share a position:
# select to_tsvector('foo-bar');
          to_tsvector
-----------------------------
  'bar':3 'foo':2 'foo-bar':1
# select phraseto_tsquery('foo-bar');
          phraseto_tsquery
-----------------------------------
  ( 'foo-bar' <-> 'foo' ) <-> 'bar'
and
# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');
  ?column?
----------
  t
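Because each part of a hyphenated word gets its own position, every <-> step in the query above spans exactly one position, so the exact-distance rule still matches. A minimal Python sketch of that reasoning follows; `hyphenated_vector` and `chain_matches` are hypothetical helpers, and the position layout is copied from the to_tsvector output above rather than taken from the PostgreSQL parser.

```python
from typing import Dict, List


def hyphenated_vector(word: str) -> Dict[str, List[int]]:
    """Hypothetical model of to_tsvector('foo-bar') as shown above: the whole
    hyphenated token takes the first position and each part the next ones."""
    vec: Dict[str, List[int]] = {}
    for pos, token in enumerate([word] + word.split("-"), start=1):
        vec.setdefault(token, []).append(pos)
    return vec


def chain_matches(vec: Dict[str, List[int]], terms: List[str]) -> bool:
    """True if the terms occur at consecutive positions, i.e. a chain of <->
    operators matches when each step must be exactly one position long."""
    return any(
        all(start + i in vec.get(term, []) for i, term in enumerate(terms))
        for start in vec.get(terms[0], [])
    )


vec = hyphenated_vector("foo-bar")
print(vec)                                            # {'foo-bar': [1], 'foo': [2], 'bar': [3]}
print(chain_matches(vec, ["foo-bar", "foo", "bar"]))  # True
```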


The patch is attached.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachments

phrase_exact_distance.patch