pg_regprefix() doesn't handle lookahead constraints correctly
От | Tom Lane |
---|---|
Тема | pg_regprefix() doesn't handle lookahead constraints correctly |
Дата | |
Msg-id | 1218.1445273095@sss.pgh.pa.us обсуждение исходный текст |
Список | pgsql-bugs |
Whilst poking around the topic of lookbehind constraints, I realized that pg_regprefix(), which is code we added to extract any fixed prefix that a regex might have, is already broken for lookahead constraints. Observe: regression=# create table pp (f1 text unique); CREATE TABLE regression=# explain select * from pp where f1 ~ '^abcde(f|(?=\d))'; QUERY PLAN ------------------------------------------------------------------------ Bitmap Heap Scan on pp (cost=4.22..14.37 rows=7 width=32) Filter: (f1 ~ '^abcde(f|(?=\d))'::text) -> Bitmap Index Scan on pp_f1_key (cost=0.00..4.22 rows=7 width=0) Index Cond: ((f1 >= 'abcdef'::text) AND (f1 < 'abcdeg'::text)) (4 rows) It thinks that "f" is part of the fixed prefix of the regex, whereas in point of fact the regex will match strings that have a digit there. The reason is that pg_regprefix thinks it can ignore LACON arcs altogether, whereas really what it had better do is treat them as a reason to give up searching. In principle we could traverse across the LACON arc (which consumes no input) and look to see if the following state has a single outarc color. But that would greatly complicate the code and I doubt it corresponds to any case that's of real-world interest. That is, it'd allow us to recognize that, say, '^abcde(?=\w)f' has a fixed prefix of 'abcdef' not just 'abcde', but who writes regexes like that? Most likely, a lookahead constraint is at the end of the regex anyway. (Similarly, it won't be very interesting to optimize for lookbehind constraints here, because nobody would write such a thing in a fixed-prefix regex.) This is such a corner case that it's not too surprising we've had no field reports about it. But still, a bug is a bug. regards, tom lane
В списке pgsql-bugs по дате отправления: