Обсуждение: regexp_replace not respecting greediness

Поиск
Список
Период
Сортировка

regexp_replace not respecting greediness

От
Simon Ellmann
Дата:
With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:

postgres=# SELECT REGEXP_REPLACE('jane.smith@example.com', '.*?@.*', 'ab');
 regexp_replace 
----------------
 abexample.com
(1 row)


Affected version: PostgreSQL 17.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 15.2.1 20250813, 64-bit

Cheers,
Simon

--

Research associate
Chair for database systems
Department of Informatics
TU München  Tel: +49 89 289 17276
Boltzmannstr. 3  E-Mail: simon.ellmann@tum.de
D-85748 Garching bei München, Germany

Re: regexp_replace not respecting greediness

От
"David G. Johnston"
Дата:
On Friday, September 19, 2025, Simon Ellmann <simon.ellmann@tum.de> wrote:
With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:

Working as documented in rule 6:


David J.

Re: regexp_replace not respecting greediness

От
Tom Lane
Дата:
Simon Ellmann <simon.ellmann@tum.de> writes:
> With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should
matchgreedily: 
> postgres=# SELECT REGEXP_REPLACE('jane.smith@example.com', '.*?@.*', 'ab');

This is correct according to the rules given at

https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES

specifically that "A branch — that is, an RE that has no top-level |
operator — has the same greediness as the first quantified atom in it
that has a greediness attribute."  Because of that, the RE as a whole
is non-greedy and will match the shortest not longest amount of text
overall.  The discussion in that manual section shows what to do
when you don't like the results.

> Other database systems (e.g., DuckDB, Umbra) match the whole input:

If your complaint is "but it's not like Perl!", I suggest using
a plperl function to do your regexp work.

            regards, tom lane



Re: regexp_replace not respecting greediness

От
Simon Ellmann
Дата:
Thanks for the hint to the documentation, I missed that part. This is really surprising behavior!

Cheers,
Simon

--

Research associate
Chair for database systems
Department of Informatics
TU München  Tel: +49 89 289 17276
Boltzmannstr. 3  E-Mail: simon.ellmann@tum.de
D-85748 Garching bei München, Germany

On 19. Sep 2025, at 16:09, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Friday, September 19, 2025, Simon Ellmann <simon.ellmann@tum.de> wrote:
With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:

Working as documented in rule 6:


David J.