Re: 9.5.3: substring: regex greedy operator not picking up chars as expected
From | David G. Johnston |
---|---|
Subject | Re: 9.5.3: substring: regex greedy operator not picking up chars as expected |
Date | |
Msg-id | CAKFQuwaAt6wYJQjKM9i-jm7hmfbi0ptiEt4SN8_vGQ43V+z-5Q@mail.gmail.com |
In response to | 9.5.3: substring: regex greedy operator not picking up chars as expected ("Foster, Russell" <Russell.Foster@crl.com>) |
Responses | Re: 9.5.3: substring: regex greedy operator not picking up chars as expected |
List | pgsql-bugs |
Working as documented.

https://www.postgresql.org/docs/9.5/static/functions-matching.html#POSIX-MATCHING-RULES

Specifically, this implementation considers greediness at a level higher than just the atom/expression. In a mixed "branch", if there is a non-greedy quantifier in the branch, the entire branch is non-greedy, which in many situations causes greedy atoms to behave non-greedily.

It might help to consider that there aren't really any explicit "greedy" operators like other engines have (i.e., ??, ?, ?+) but rather non-greedy (lazy) and default. The default inherits the non-greedy trait from its parent if applicable; otherwise it behaves greedily.

On Mon, Aug 15, 2016 at 7:53 AM, Foster, Russell <Russell.Foster@crl.com> wrote:

> Hello,
>
> For the following query:
>
> select substring('>772' from '.*?[0-9]+')

The pattern itself is non-greedy because there is only a single branch and it has a non-greedy quantifier within it. .*? matches ">" and [0-9]+ needs only a single character to produce a conforming non-greedy match.

> I would expect the output to be ‘>772’, but it is ‘>7’. You can also see
> the expected result on https://regex101.com/, although I am aware not all
> regex processors work the same.
>
> The following queries:
>
> select substring('>772' from '^.*?[0-9]+$')

This is treated exactly the same as the above, but because of the ^ and $ anchors the shortest possible output string is the entire string.

> and:
>
> select substring('>772' from '[0-9]+')
>
> both return ‘>772’, which is expected. Could the less greedy operator on
> the left (.*?) be affecting the more greedy right one (+)?

Typo here? I'm not fluent with substring(regex). Anyway, the entire RE (single branch) is now greedy, so the greedy [0-9]+ atom matches as many digits as possible.

David J.
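
For reference, the three queries from the thread can be laid side by side as a small sketch. The linked documentation states the rule as: a branch takes the greediness of the first quantified atom in it that has a greediness attribute. The results in the comments for the first two queries are the ones reported in the thread for 9.5.3. The last query is not from the thread; it applies the `{1,1}` wrapper that the same documentation section describes for forcing an RE to be greedy as a whole, and it is shown here without a claimed result.

```sql
-- First quantified atom is the non-greedy .*?, so the whole RE is non-greedy:
-- the shortest match starting at the earliest position is returned.
select substring('>772' from '.*?[0-9]+');       -- '>7' (as reported in the thread)

-- Same non-greedy RE, but ^ and $ leave only one possible match: the whole string.
select substring('>772' from '^.*?[0-9]+$');     -- '>772' (as reported in the thread)

-- Only a greedy quantifier is present, so the RE is greedy. '>' cannot match
-- [0-9], so the match is the digit run alone.
select substring('>772' from '[0-9]+');          -- '772'

-- Assumption: adapting the documentation's (?: ... ){1,1} technique to this
-- input. The outer quantifier is greedy, which per the docs makes the RE as a
-- whole greedy while .*? stays non-greedy inside it.
select substring('>772' from '(?:.*?[0-9]+){1,1}');
```

The wrapper is worth trying when the goal is "skip as little as possible, then take every digit", which is what the original poster expected from engines that decide greediness per quantifier.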