Re: Pathological regexp match
От | Magnus Hagander |
---|---|
Тема | Re: Pathological regexp match |
Дата | |
Msg-id | 9837222c1001310902g68b9c352i4bff1ed494dc47ac@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Pathological regexp match (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
On Sat, Jan 30, 2010 at 08:07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Glaesemann <michael.glaesemann@myyearbook.com> writes: >> We came across a regexp that takes very much longer than expected. > > I poked into this a little bit. What seems to be happening is that the > use of non-greedy quantifiers plus backreferences turns off most of the > optimization that the regexp engine usually does, leaving the RE as a > tree of possibilities that is explored in a fairly dumb fashion. In > particular, there is a loop in crevdissect() that tries to locate a > feasible division point between two concatenated sub-patterns, and > for each try, a recursive call to crevdissect() tries to locate a > feasible division point between *its* two sub-patterns, resulting in > O(N^2) runtime. With a not very pleasant constant factor, too, because > of repeated startup/shutdown costs for the DFA matching engine. > > I found a possible patch that seems to improve matters: AFAICS the DFA > matching is independent of the checking that cdissect() and friends do, > so we can apply that check first instead of second. This brings the > runtime down from minutes to sub-second on my machine. However I don't > have a whole lot of faith either in the correctness of this change, or > that it might not make some other cases slower instead of faster. > Has anyone got a reasonably messy collection of regexps they'd like > to try this patch on? I only have the one, so I don't think I can help all that much with that. > BTW, I filed the problem upstream with the Tcl project > https://sourceforge.net/tracker/?func=detail&aid=2942697&group_id=10894&atid=110894 > but I'm not going to hold my breath waiting for useful advice from > them. Since Henry Spencer dropped off the net, there doesn't seem > to be anybody there who understands that code much better than we do. > > Also, we likely still ought to toss some CHECK_FOR_INTERRUPTS calls > in there ... Yeah. I have zero experience around that code, so if oyu have at least some, I'd much appreciate it if you (or someone who does) could look at it. Likely to cause a lot less breakage than me :D -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
В списке pgsql-hackers по дате отправления: