[BUGS] Mishandling of right-associated phrase operators in FTS
From | Tom Lane |
---|---|
Subject | [BUGS] Mishandling of right-associated phrase operators in FTS |
Date | |
Msg-id | 26706.1482087250@sss.pgh.pa.us |
List | pgsql-bugs |
What do you think a tsquery like 'x <-> (y <-> z)' should mean? I find it hard to assign it any meaning other than the same thing as '(x <-> y) <-> z', ie, it should match a 3-lexeme sequence 'x y z'.

Right now, the execution engine gets this wrong:

regression=# select to_tsvector('x y z') @@ to_tsquery('x <-> y <-> z');
 ?column?
----------
 t    -- okay
(1 row)

regression=# select to_tsvector('x y z') @@ to_tsquery('x <-> (y <-> z)');
 ?column?
----------
 f    -- not so okay
(1 row)

This happens because the lower (righthand) <-> operator returns the position of its righthand-side input ('z'), but that's two away from where the 'x' is, so the upper phrase operator doesn't think there is a match.

I considered trying to fix this by forcing right-associated cases into left-associated form during tsquery parsing, but that has all the same problems that I pointed out with respect to normalize_phrase_tree(). Really it'd be best to fix this by making the executor cope properly. I think what we want is to pass down a flag telling recursive invocations of TS_phrase_execute whether to return the position of the left-side or right-side argument of a phrase match, which we would set according to whether we are within the right or left argument of the most closely nested upper phrase operator. I propose to incorporate that fix into the TS_phrase_execute rewrite I'm working on.

A related problem appears in clean_fakeval_intree()'s attempts to adjust phrase-operator distances when it removes a stopword. For example, 'a' is a stopword, so we get:

regression=# select to_tsquery('(b <-> a) <-> c');
 to_tsquery
-------------
 'b' <2> 'c'
(1 row)

That's fine, but I don't think this answer is right:

regression=# select to_tsquery('b <-> (a <-> c)');
 to_tsquery
-------------
 'b' <-> 'c'
(1 row)

It should be 'b <2> c', same as the other one.
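The executor mismatch described earlier in this message can be reproduced with a small toy model. The sketch below is not PostgreSQL's TS_phrase_execute; the names positions, naive_ends and spans are hypothetical, and a query is assumed to be either a lexeme or a (left, distance, right) tuple. Instead of the flag proposed above, the second function tracks both ends of each match as a (start, end) pair, so a parent operator can read the right end of its left operand and the left end of its right operand, which is the choice the flag would communicate.

```python
# Toy model of phrase matching; NOT PostgreSQL's TS_phrase_execute.
# A query is either a lexeme (str) or a tuple (left, distance, right).

def positions(text):
    """Map each word to its 1-based positions, roughly like to_tsvector."""
    index = {}
    for pos, word in enumerate(text.split(), start=1):
        index.setdefault(word, []).append(pos)
    return index

def naive_ends(query, index):
    """Return only the right-end position of each match."""
    if isinstance(query, str):
        return set(index.get(query, []))
    left, distance, right = query
    left_ends = naive_ends(left, index)
    right_ends = naive_ends(right, index)
    # Compares against the right end of the right operand, which is the
    # wrong end when that operand is itself a phrase.
    return {r for r in right_ends if r - distance in left_ends}

def spans(query, index):
    """Return (start, end) for each match so a parent can use either end."""
    if isinstance(query, str):
        return {(p, p) for p in index.get(query, [])}
    left, distance, right = query
    left_spans = spans(left, index)
    right_spans = spans(right, index)
    return {
        (l_start, r_end)
        for (l_start, l_end) in left_spans
        for (r_start, r_end) in right_spans
        if r_start - l_end == distance  # end of left vs. start of right
    }

doc = positions("x y z")
left_assoc = (("x", 1, "y"), 1, "z")    # (x <-> y) <-> z
right_assoc = ("x", 1, ("y", 1, "z"))   # x <-> (y <-> z)

print(bool(naive_ends(left_assoc, doc)))   # True
print(bool(naive_ends(right_assoc, doc)))  # False
print(bool(spans(left_assoc, doc)))        # True
print(bool(spans(right_assoc, doc)))       # True
```

Carrying both ends is a heavier design than a single flag, but it stays correct when a phrase operand is nested several levels deep on either side.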
I haven't worked this out in detail, but I think a similar solution would work for clean_fakeval_intree: pass down a flag indicating if we're within the left or right argument of a <-> op, and return the appropriate adjustment distance based on that.

regards, tom lane
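The distance adjustment for removed stopwords can be sketched the same way. This is a toy model, not clean_fakeval_intree; the functions clean and show and the tuple representation (left, distance, right) are assumptions made for illustration. Rather than taking a flag, each call returns a separate adjustment for its left and right side, and the parent operator adds the right-side adjustment of its left operand and the left-side adjustment of its right operand.

```python
# Toy model of stopword removal in a phrase query; NOT PostgreSQL's
# clean_fakeval_intree. A query is a lexeme (str) or (left, distance, right).

def clean(query, stopwords):
    """Return (cleaned_query_or_None, left_add, right_add).

    left_add / right_add are the positions that a phrase operator on that
    side of the subtree must add to its distance for removed stopwords.
    For a fully removed subtree the two values are equal.
    """
    if isinstance(query, str):
        # A removed stopword adds nothing by itself; the operator that
        # pointed at it contributes its own distance below.
        return (None, 0, 0) if query in stopwords else (query, 0, 0)

    left, distance, right = query
    lq, l_ladd, l_radd = clean(left, stopwords)
    rq, r_ladd, r_radd = clean(right, stopwords)

    if lq is None and rq is None:
        gap = l_radd + distance + r_ladd
        return None, gap, gap
    if lq is None:
        # Only the right operand survives; the gap opens on its left side.
        return rq, l_radd + distance + r_ladd, r_radd
    if rq is None:
        # Only the left operand survives; the gap opens on its right side.
        return lq, l_ladd, l_radd + distance + r_ladd
    return (lq, distance + l_radd + r_ladd, rq), l_ladd, r_radd

def show(query):
    """Render a cleaned query in tsquery-like notation."""
    if isinstance(query, str):
        return f"'{query}'"
    left, distance, right = query
    op = "<->" if distance == 1 else f"<{distance}>"
    return f"{show(left)} {op} {show(right)}"

stop = {"a"}
print(show(clean((("b", 1, "a"), 1, "c"), stop)[0]))  # 'b' <2> 'c'
print(show(clean(("b", 1, ("a", 1, "c")), stop)[0]))  # 'b' <2> 'c'
```

Both associations give the same distance in this model because the gap left by the stopword is attributed to the side of the surviving operand where it actually sits.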