Re: WIP: lookbehind constraints for our regexp engine
От | Thom Brown |
---|---|
Тема | Re: WIP: lookbehind constraints for our regexp engine |
Дата | |
Msg-id | CAA-aLv5Etwt9t1yQ7AJDewfkht1_+_hyCBxbWrZ--B5AkuuU6g@mail.gmail.com обсуждение исходный текст |
Ответ на | WIP: lookbehind constraints for our regexp engine (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
<p dir="ltr">On Oct 17, 2015 12:53 AM, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br/> ><br /> > I've occasionally heard complaints that our regex engine only has<br /> > lookahead constraintsnot lookbehind constraints, while Perl's for example<br /> > does have those. While I was fooling around withthe other issues in that<br /> > code, I learned enough to realize that it would not be that hard to<br /> > implementsuch things, and the attached proof-of-concept patch does so<br /> > (using Perl-compatible syntax, in case you'rewondering).<br /> ><br /> > However, I don't feel like this is done to the point of being committable,<br />> because its performance leaves something to be desired in a lot of cases.<br /> > The difficulty is that becausethe engine can only match in a forward<br /> > direction, in the worst case you have to check a lookbehind constraintby<br /> > trying to match starting from the string start to see if a match exists<br /> > ending at thecurrent-location where you're testing the constraint. That<br /> > makes it, worst case, O(N^2) to search a stringof length N. Now, you can<br /> > improve on that greatly if you can determine that possible matches don't<br />> extend all the way back to the string start. In the attached patch I use<br /> > the "cold start pointer" returnedby shortest() to decide that characters<br /> > before the coldstart point no longer have to be rechecked. Thathelps<br /> > some, but it's not good enough because there are too many cases where the<br /> > engine doesn'tmove the coldstart point forward very aggressively.<br /> ><br /> > It might be that that behavior itself canbe improved on, which would<br /> > be nice because there are also non-lookbehind-related scenarios where<br /> >smarter coldstart detection would help performance. But I have no very<br /> > good ideas about how to do it.<br/> ><br /> > Another idea I've been toying with is to add logic that tries to determine<br /> > the maximumpossible match length for a regex. If we can determine that<br /> > for the lookbehind sub-regex, then we'd knowwe have to back up at most<br /> > that far before applying the match check. This seems promising because<br /> >a lot of other engines don't even support variable-length lookbehind<br /> > patterns at all, and so we'd be assuredof good performance in all the<br /> > cases that competitors can handle at all.<br /> ><br /> > Anyway,I'm not planning to do much more work on this right now, but<br /> > I thought I'd throw it out there just to letpeople know that this seems<br /> > within reach. I'm curious to know how many people care, and how much.<p dir="ltr">+1<pdir="ltr">I'm one of those who wanted lookbehind, and it would give us complete lookaround so very much welcome.<pdir="ltr">Thom
В списке pgsql-hackers по дате отправления: