Discussion: BUG #16826: Regex in substring(... from ..) wrong

BUG #16826: Regex in substring(... from ..) wrong

From: PG Bug reporting form

The following bug has been logged on the website:

Bug reference:      16826
Logged by:          James Inform
Email address:      james.inform@pharmapp.de
PostgreSQL version: 13.1
Operating system:   Mac and Ubuntu
Description:

Hopefully I am not messing up regex syntax, but it seems that handling of
non-greedy is not correct in substring function's regex.

When I set the non-greedy operator ? inside a regex in the substring
function, everything after the ? seems to be also treated as non-greedy,
which is wrong.

Please look at the following examples, the last one shows the issue:

select substring('part1.part2.part3' from '^.*');
-- Result: part1.part2.part3
-- Correct, gets the whole string

select substring('part1.part2.part3' from '^.*\.');
-- Result: part1.part2.
-- Correct, because the default mode is greedy, so everything up to the
-- second dot is matched

select substring('part1.part2.part3' from '^.*?\.');
-- Result: part1.
-- Correct, because the mode is non-greedy, so everything up to the first
-- dot is matched

select substring('part1.part2.part3' from '^.*\..*');
-- Result: part1.part2.part3
-- Correct, everything is matched

select substring('part1.part2.part3' from '^.*?\..*');
-- Result: part1.
-- Wrong, should match everything but seems to stay non-greedy after the ?

I have also tested against REL_13_STABLE including commit
49c928c0c067a8ec0882eeea5c03ccbd1b1b1a62, but the issue is the same.


Re: BUG #16826: Regex in substring(... from ..) wrong

From: "David G. Johnston"

On Friday, January 15, 2021, PG Bug reporting form <noreply@postgresql.org> wrote:
> Hopefully I am not messing up regex syntax, but it seems that handling of
> non-greedy is not correct in substring function's regex.
>
> When I set the non-greedy operator ? inside a regex in the substring
> function, everything after the ? seems to be also treated as non-greedy,
> which is wrong.

This seems to behave as documented in section 9.7.3.5, "Regular Expression Matching Rules".


Implementations of regex do differ, so referencing prior experience for correctness has limits.

David J.
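
For readers following along, here is a rough sketch of the rules in section 9.7.3.5 as they apply to the patterns in the report. As the documentation describes it, a branch takes the greediness of its first quantified atom that has a greediness attribute, so in '^.*?\..*' the leading .*? makes the whole expression non-greedy and the shortest overall match is returned. The first two queries are the documentation's own examples. The last query adapts the documentation's suggestion of wrapping an expression in non-capturing parentheses with a {1,1} quantifier to force overall greediness; that adaptation is an illustration only and was not proposed in this thread.

```sql
-- Examples from the PostgreSQL documentation (section 9.7.3.5).
-- The first quantified atom, Y*, is greedy, so the whole RE is greedy.
SELECT substring('XY1234Z', 'Y*([0-9]{1,3})');
-- Result: 123

-- The first quantified atom, Y*?, is non-greedy, so the whole RE is non-greedy.
SELECT substring('XY1234Z', 'Y*?([0-9]{1,3})');
-- Result: 1

-- The reported pattern: .*? comes first, so the whole RE prefers the
-- shortest overall match, even though the trailing .* is written as greedy.
SELECT substring('part1.part2.part3' from '^.*?\..*');
-- Result: part1.

-- Illustrative adaptation (an assumption, not from the thread): the outer
-- {1,1} quantifier is greedy, so the RE as a whole prefers the longest
-- match, while the inner .*? still prefers the shortest.
SELECT substring('part1.part2.part3' from '(?:^.*?\..*){1,1}');
-- Expected under the documented rules: part1.part2.part3
```

The point of the last form is that greediness is decided for the expression as a whole and then for each subexpression, rather than quantifier by quantifier as in Perl-style engines.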