Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
From | Tom Lane |
---|---|
Subject | Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? |
Date | |
Msg-id | 11252.1465422251@sss.pgh.pa.us |
In reply to | Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? (Oleg Bartunov <obartunov@gmail.com>) |
Responses | Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? |
List | pgsql-hackers |
Oleg Bartunov <obartunov@gmail.com> writes:
> On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I concur that that seems like a rather useless behavior. If we have
>> "x <-> y" it is not possible to match at distance zero, while if we
>> have "x <-> x" it seems unlikely that the user is expecting us to
>> treat that identically to "x". So phrase search simply should not
>> consider distance-zero matches.

> what's about word with several infinitives

> select to_tsvector('en', 'leavings');
>        to_tsvector
> ------------------------
>  'leave':1 'leavings':1
> (1 row)

> select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
>  ?column?
> ----------
>  t
> (1 row)

Hmm. I can grant that there might be some cases where you want to see if two separate patterns match the same lexeme, but that seems like an extremely specialized use-case that you would only invoke very intentionally. It should not be built in as part of the default behavior of every phrase search, because 99% of the time this would be an unexpected and unwanted match. I'm not even convinced that the operator for this should be spelled <0> --- that seems more like a hack than a natural extension of phrase search. But if we do spell it like that, then I think it should be called out as a special case that only applies to <0>; that is, for any other value of N, the match has to be to separate lexemes.

This brings up something else that I am not very sold on: to wit, do we really want the "less than or equal" distance behavior at all? The documentation gives the example that

	phraseto_tsquery('cat ate some rats')

produces

	( 'cat' <-> 'ate' ) <2> 'rat'

because "some" is a stopword. However, that pattern will also match "cat ate rats", which seems surprising and unexpected to me; certainly it would surprise a user who did not realize that "some" is a stopword. So I think there's a reasonable case for decreeing that <N> should only match lexemes *exactly* N apart.
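To make the two readings of <N> concrete, here is a hypothetical Python model. It is a simplified sketch, not PostgreSQL's implementation: a document is reduced to a map from lexeme to 1-based word positions, stopwords still occupy a position but are not stored, and there is no stemming (so "rats" stays "rats" instead of becoming 'rat'). The helper names `to_positions`, `distance_ok` and `phrase_match` are invented for illustration. Mode "at_most" accepts any distance from 0 to N, which is the behavior described in this thread; mode "exact" accepts only distance N, as proposed.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical simplified model: lexeme -> set of 1-based word positions.
TsVector = Dict[str, Set[int]]


def to_positions(text: str, stopwords: FrozenSet[str] = frozenset()) -> TsVector:
    """Lowercase, split on whitespace, record each word's position.

    Stopwords consume a position (so the gap survives) but are not stored.
    No stemming is performed, unlike a real 'english' configuration.
    """
    vec: TsVector = {}
    for pos, word in enumerate(text.lower().split(), start=1):
        if word not in stopwords:
            vec.setdefault(word, set()).add(pos)
    return vec


def distance_ok(d: int, n: int, mode: str) -> bool:
    if mode == "at_most":
        # "Less than or equal": any distance 0..N, so a lexeme can
        # follow itself at distance zero.
        return 0 <= d <= n
    if mode == "exact":
        # Proposed alternative: the lexemes must be exactly N apart.
        return d == n
    raise ValueError(f"unknown mode: {mode!r}")


def phrase_match(vec: TsVector, first: str,
                 steps: List[Tuple[int, str]], mode: str = "at_most") -> bool:
    """Evaluate  first <n1> lex1 <n2> lex2 ...  left to right.

    `ends` holds the positions at which the phrase matched so far ends.
    """
    ends = set(vec.get(first, ()))
    for n, lexeme in steps:
        ends = {p for p in vec.get(lexeme, ())
                if any(distance_ok(p - e, n, mode) for e in ends)}
        if not ends:
            return False
    return bool(ends)


if __name__ == "__main__":
    # 'blue' <-> 'blue' against a document containing a single "blue".
    blue = to_positions("blue")
    print(phrase_match(blue, "blue", [(1, "blue")], "at_most"))  # True
    print(phrase_match(blue, "blue", [(1, "blue")], "exact"))    # False

    # ( 'cat' <-> 'ate' ) <2> 'rats', with "some" treated as a stopword.
    query = ("cat", [(1, "ate"), (2, "rats")])
    full = to_positions("cat ate some rats", frozenset({"some"}))
    short = to_positions("cat ate rats")
    print(phrase_match(full, *query, mode="exact"))     # True
    print(phrase_match(short, *query, mode="at_most"))  # True
    print(phrase_match(short, *query, mode="exact"))    # False
```

In this model, switching to "exact" removes both surprises at once: the single "blue" no longer matches 'blue' <-> 'blue', and "cat ate rats" no longer matches the pattern built from "cat ate some rats".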
If <N> matched only lexemes exactly N apart, we would no longer have the misbehavior that Jean-Pierre is complaining about, and we'd not need to argue about whether <0> needs to be treated specially.

Or maybe we need two operators, one for exactly-N-apart and one for at-most-N-apart.

			regards, tom lane