tsquery @> operator bugs
From | Heikki Linnakangas |
---|---|
Subject | tsquery @> operator bugs |
Date | |
Msg-id | 544C03E8.4020402@vmware.com |
List | pgsql-bugs |
While looking at all the places where we currently use CRC, I bumped into this:

    postgres=# select 'penomaha'::tsquery @> 'lbgimpca'::tsquery;
     ?column?
    ----------
     t
    (1 row)

The @> operator is supposed to return true if the first query contains all the terms of the second query. The above result is bogus; the strings are completely different. It returns true because both terms have the same CRC (with our funky CRC algorithm), and the tsq_mcontains function only compares the CRCs, not the actual values.

Another bug is that the function performs a length check first, and returns false if the second string is larger than the first. The thinking goes that the first string cannot possibly contain the second string if the second string is larger. But that doesn't take into account that there can be duplicate strings (this is basically the same bug that was recently fixed in jsonb):

    postgres=# select 'a & b' @> 'a & a'::tsquery; /* CORRECT */
     ?column?
    ----------
     t
    (1 row)

    postgres-# select 'a' @> 'a & a'::tsquery; /* WRONG */
     ?column?
    ----------
     f
    (1 row)

I propose the attached fix. It completely rewrites the tsq_mcontains function, so that it first extracts all the strings from both tsqueries, then sorts them and removes duplicates, and then compares the arrays.

(I actually find the whole operator pretty useless. What is it good for? But that's a different story..)

- Heikki
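For readers without the attachment, the approach described above (extract the strings, sort, remove duplicates, compare the arrays) can be sketched roughly as follows. This is an illustrative Python sketch, not the attached patch and not PostgreSQL's C code: the names `extract_terms`, `sorted_unique` and `contains` are hypothetical, and the term extraction assumes a simplified tsquery text made only of plain word operands, the operators `&`, `|`, `!`, and parentheses.

```python
import re


def extract_terms(query):
    """Hypothetical helper: return the operand strings of a simple tsquery text.

    Assumption: operands are plain words; weights, prefix markers and
    quoting are not handled.
    """
    return re.findall(r"[A-Za-z0-9_]+", query)


def sorted_unique(terms):
    """Sort the terms and drop duplicates, so 'a & a' becomes ['a']."""
    return sorted(set(terms))


def contains(query, subquery):
    """Return True if every term of subquery also appears in query.

    The actual strings are compared, not hashes of them, and no length
    check is done before deduplication.
    """
    a = sorted_unique(extract_terms(query))
    b = sorted_unique(extract_terms(subquery))
    i = 0
    # Merge-style walk over the two sorted, duplicate-free arrays.
    for term in b:
        while i < len(a) and a[i] < term:
            i += 1
        if i == len(a) or a[i] != term:
            return False
    return True
```

With this sketch, `contains('a', 'a & a')` is true because the duplicate is removed before any comparison, and `contains('penomaha', 'lbgimpca')` is false because the strings themselves are compared.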