Re: BUG #17761: Questionable regular expression behavior
From | Tom Lane |
---|---|
Subject | Re: BUG #17761: Questionable regular expression behavior |
Date | |
Msg-id | 3334493.1674835493@sss.pgh.pa.us |
In response to | Re: BUG #17761: Questionable regular expression behavior (hubert depesz lubaczewski <depesz@depesz.com>) |
Responses | Re: BUG #17761: Questionable regular expression behavior |
List | pgsql-bugs |
hubert depesz lubaczewski <depesz@depesz.com> writes:
> On Fri, Jan 27, 2023 at 09:27:35AM +0000, PG Bug reporting form wrote:
>> Executing:
>> select regexp_matches('a 1x1250x2500',
>> '(a).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?');
>> returns: {a,1,1,NULL}
>> while executing:
>> select regexp_matches('a 1x1250x2500',
>> '(a|b).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?');
>> returns: {a,1,1250,2500}
>>
>> Shouldn't both results be equal?

> The problem is, afair, that there is some state in pg's regexp engine
> that makes greedy/ungreedy decision once per regexp.

Yeah.  Without having traced through it, I'm fairly sure that in the
first case, we have "(a)" which has no greediness, then ".*?" which
is non-greedy, and then that determines the overall greediness as
non-greedy, so it goes for the shortest overall match not the longest.
In the second case, "(a|b)" is greedy because anything involving "|"
is greedy, so we immediately decide we'll be greedy overall.

The fine manual explains how you can force greediness or non-greediness
when the engine's default rules for that don't do what you want.

regards, tom lane
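
For readers following the pointer to the manual: the PostgreSQL documentation on regular expression matching rules describes the quantifiers `{1,1}` and `{1,1}?` as a way to force greediness or non-greediness, respectively, on a subexpression or a whole RE. The following is only a sketch of how that could be applied to the queries from the report, not something taken from this thread. The non-capturing `(?:...)` wrapper is an assumption chosen so that the numbering of the capture groups stays the same; the results are not shown because they were not verified here.

```sql
-- Sketch: force the first pattern from the report to be greedy overall
-- by wrapping the whole RE in a non-capturing group quantified with {1,1}.
-- With overall greediness, the longest overall match should be preferred,
-- as in the (a|b) variant.
select regexp_matches('a 1x1250x2500',
  '(?:(a).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?){1,1}');

-- Sketch: the opposite direction. Wrapping the second pattern with {1,1}?
-- asks for non-greedy overall matching (shortest overall match),
-- despite the "|" inside "(a|b)".
select regexp_matches('a 1x1250x2500',
  '(?:(a|b).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?){1,1}?');
```

Making the overall preference explicit this way means the outcome no longer depends on which quantified atom happens to come first in the pattern.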