>> what about a word with several infinitives?
>
>> select to_tsvector('en', 'leavings');
>> to_tsvector
>> ------------------------
>> 'leave':1 'leavings':1
>> (1 row)
>
>> select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
>> ?column?
>> ----------
>> t
>> (1 row)
The second example is not correct:
select phraseto_tsquery('en', 'leavings')
will produce 'leave | leavings'
and
select phraseto_tsquery('en', 'leavings cats')
will produce 'leave <-> cat | leavings <-> cat'
which seems correct, so we don't need special treatment of <0>.
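
As a quick check that does not depend on the 'en' configuration, the
same match can be written with literals. This is only a sketch: the
tsvector literal below is hand-written to mimic what
to_tsvector('en', 'leavings cats') would presumably give (both
infinitives at position 1, 'cat' at position 2), so treat it as an
assumption rather than real output.

```sql
-- Hand-written stand-in for to_tsvector('en', 'leavings cats'):
-- both normal forms of 'leavings' share position 1, 'cat' follows at 2.
select 'leave:1 leavings:1 cat:2'::tsvector
       @@ '(leave <-> cat) | (leavings <-> cat)'::tsquery;
```

This should return t, since either branch of the OR finds 'cat'
exactly one position after its first lexeme, with no <0> involved.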
> This brings up something else that I am not very sold on: to wit,
> do we really want the "less than or equal" distance behavior at all?
> The documentation gives the example that
> phraseto_tsquery('cat ate some rats')
> produces
> ( 'cat' <-> 'ate' ) <2> 'rat'
> because "some" is a stopword. However, that pattern will also match
> "cat ate rats", which seems surprising and unexpected to me; certainly
> it would surprise a user who did not realize that "some" is a stopword.
>
> So I think there's a reasonable case for decreeing that <N> should only
> match lexemes *exactly* N apart. If we did that, we would no longer have
> the misbehavior that Jean-Pierre is complaining about, and we'd not need
> to argue about whether <0> needs to be treated specially.
Agreed, that seems easy to change. I thought I saw an issue with
hyphenated words but, fortunately, I had forgotten that the lexemes of a
hyphenated word don't share a position:
# select to_tsvector('foo-bar');
to_tsvector
-----------------------------
'bar':3 'foo':2 'foo-bar':1
# select phraseto_tsquery('foo-bar');
phraseto_tsquery
-----------------------------------
( 'foo-bar' <-> 'foo' ) <-> 'bar'
and
# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');
?column?
----------
t
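
The stopword case quoted above can be checked without relying on any
configuration by writing the literals out by hand. A minimal sketch,
assuming the tsvector literals are a fair stand-in for the two input
texts ('some' dropped as a stopword but its position kept):

```sql
-- "cat ate some rats": 'rat' is 2 positions after 'ate'
select 'cat:1 ate:2 rat:4'::tsvector @@ '(cat <-> ate) <2> rat'::tsquery;

-- "cat ate rats": 'rat' is only 1 position after 'ate'
select 'cat:1 ate:2 rat:3'::tsvector @@ '(cat <-> ate) <2> rat'::tsquery;
```

With exact-distance matching the first query should return t and the
second f; with the "less than or equal" behavior both return t.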
The patch is attached.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/