Re: BUG #17761: Questionable regular expression behavior
From | hubert depesz lubaczewski |
---|---|
Subject | Re: BUG #17761: Questionable regular expression behavior |
Date | |
Msg-id | Y9PGupnpVoN/uQ2w@depesz.com |
In reply to | BUG #17761: Questionable regular expression behavior (PG Bug reporting form <noreply@postgresql.org>) |
Replies | Re: BUG #17761: Questionable regular expression behavior |
List | pgsql-bugs |
On Fri, Jan 27, 2023 at 09:27:35AM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      17761
> Logged by:          Konstantin Geordzhev
> Email address:      kosiodg@yahoo.com
> PostgreSQL version: 11.10
> Operating system:   tested online
> Description:
>
> Executing:
> select regexp_matches('a 1x1250x2500',
> '(a).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?');
> returns: {a,1,1,NULL}
> while executing:
> select regexp_matches('a 1x1250x2500',
> '(a|b).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?');
> returns: {a,1,1250,2500}
>
> Shouldn't both results be equal?

The problem is, afair, that there is some state in pg's regexp engine that
makes the greedy/ungreedy decision once per regexp.

I don't recall details, but my take from back when I learned about it
(years ago) is to try to avoid things like .*?

Instead you can:

#v+
$ select regexp_matches('a 1x1250x2500', '(a)\D*([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?');
 regexp_matches
─────────────────
 {a,1,1250,2500}
(1 row)

#v-

depesz
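The "one greedy/ungreedy decision per regexp" idea can be sketched in Python. Python's `re` is a Perl-style backtracking engine where greediness belongs to each quantifier, so both patterns from the report capture the same groups there. The hypothetical helper `pg_like_match` below is a rough emulation, not PostgreSQL's actual algorithm: it picks the leftmost match start, then the longest (greedy) or shortest (non-greedy) overall match from that start, and reports the groups Python assigns inside that span. The `greedy` flag is set by hand for each pattern, chosen to be consistent with the outputs reported in this thread; PostgreSQL's documentation on regular expression matching rules describes how it actually assigns greediness to a whole RE.

```python
import re
from typing import Optional, Tuple

TEXT = 'a 1x1250x2500'
# The two patterns from the bug report, plus depesz's \D* rewrite.
P1 = r'(a).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?'
P2 = r'(a|b).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?'
P3 = r'(a)\D*([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?'


def pg_like_match(pattern: str, text: str, greedy: bool) -> Optional[Tuple[Optional[str], ...]]:
    """Hypothetical helper: rough emulation of whole-RE greediness.

    Takes the leftmost start position that has any match, then the
    longest (greedy=True) or shortest (greedy=False) overall match from
    there, and returns the groups Python's re assigns inside that span.
    This is NOT PostgreSQL's algorithm; its subexpression rules can differ.
    """
    rx = re.compile(pattern)
    for start in range(len(text) + 1):
        if greedy:
            ends = range(len(text), start - 1, -1)  # try the longest span first
        else:
            ends = range(start, len(text) + 1)      # try the shortest span first
        for end in ends:
            m = rx.fullmatch(text, start, end)
            if m:
                return m.groups()
    return None


# Perl-style engine: per-quantifier greediness, both patterns agree.
print(re.search(P1, TEXT).groups())  # ('a', '1', '1250', '2500')
print(re.search(P2, TEXT).groups())  # ('a', '1', '1250', '2500')

# Whole-RE greediness, flag chosen by hand to match the thread's outputs.
print(pg_like_match(P1, TEXT, greedy=False))  # ('a', '1', '1', None)
print(pg_like_match(P2, TEXT, greedy=True))   # ('a', '1', '1250', '2500')
print(pg_like_match(P3, TEXT, greedy=True))   # ('a', '1', '1250', '2500')
```

Under this model, the shortest overall match for the first pattern is `a 1x1`, which is why the third capture is `1` and the fourth is empty. The `\D*` rewrite leaves no non-greedy quantifier in the pattern, so nothing pushes the overall match toward the shortest span.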