Re: [HACKERS] Broken select on regular expression !!!
От | Tatsuo Ishii |
---|---|
Тема | Re: [HACKERS] Broken select on regular expression !!! |
Дата | |
Msg-id | 199905210157.KAA10818@srapc451.sra.co.jp обсуждение исходный текст |
Ответ на | Re: [HACKERS] Broken select on regular expression !!! (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: [HACKERS] Broken select on regular expression !!!
Re: [HACKERS] Broken select on regular expression !!! Re: [HACKERS] Broken select on regular expression !!! |
Список | pgsql-hackers |
>> select * from regdemo where fld1 ~* '^41|^des'; >> fld1 >> ---- >> (0 rows) > >> ^^^^^^^^^^^^^^ >> !?!?!?! > >I see it too. Even more interesting is that these variants are OK: > >regression=> select * from regdemo where fld1 ~* '^des|^41'; >fld1 >----------- >410 >destination >(2 rows) > >regression=> select * from regdemo where fld1 ~* '(^41)|(^des)'; >fld1 >----------- >410 >destination >(2 rows) > >And if you want *really* disturbing: > >regression=> select * from regdemo where fld1 ~* '^sou|^des'; >fld1 >----------- >source >destination >(2 rows) > >regression=> select * from regdemo where fld1 ~ '^sou|^des'; >fld1 >---- >(0 rows) > >Something is rotten in the state of Denmark... These all oddness are caused by the parser (makeIndexable). When makeIndexable sees ~* '^41|^des' , it tries to rewrite the target regexp so that an index can be used. The rewritten query might be something like: fld1 ~* '^41|^des' and fld1 >= '41|^' and fld1 <= '41|^\377' Apparently this is wrong. This is because makeIndexable does not understand '|' and '^' appearing in the middle of the regexp. On the other hand, >regression=> select * from regdemo where fld1 ~* '^des|^41'; >regression=> select * from regdemo where fld1 ~* '^sou|^des'; will work since makeIndexable gave up the optimization if the op is "~*" and a letter appearing right after '^' is *alphabet*. Note that: >regression=> select * from regdemo where fld1 ~ '^sou|^des'; will not work because the op is *not* "~*". It seems that the only solution is checking '|' to see if it appears in the target regexp and giving up the optimization in that case. One might think that ~* '^41|^des' can be rewritten like: fld1 ~* '^41' or fld1 ~* '^des' For me this seems not to be a good idea. To accomplish this, we have to deeply parse the regexp (consider that we might have arbitrary complex regexps) and such kind thing is a job regexp() shoud do. Comments? --- Tatsuo Ishii
В списке pgsql-hackers по дате отправления: