Обсуждение: regexp_replace not respecting greediness
With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:
postgres=# SELECT REGEXP_REPLACE('jane.smith@example.com', '.*?@.*', 'ab');
regexp_replace
----------------
abexample.com
(1 row)
Other database systems (e.g., DuckDB, Umbra) match the whole input: https://analytics.db.in.tum.de/?q=SELECT+REGEXP_REPLACE%28%27jane.smith%40example.com%27%2C+%27.*%3F%40.*%27%2C+%27ab%27%29%3B
Affected version: PostgreSQL 17.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 15.2.1 20250813, 64-bit
Cheers,
Simon
Simon
--
Research associate
Chair for database systems
Department of Informatics
TU München Tel: +49 89 289 17276
Boltzmannstr. 3 E-Mail: simon.ellmann@tum.de
D-85748 Garching bei München, Germany
Research associate
Chair for database systems
Department of Informatics
TU München Tel: +49 89 289 17276
Boltzmannstr. 3 E-Mail: simon.ellmann@tum.de
D-85748 Garching bei München, Germany
On Friday, September 19, 2025, Simon Ellmann <simon.ellmann@tum.de> wrote:
With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:
Working as documented in rule 6:
David J.
Simon Ellmann <simon.ellmann@tum.de> writes: > With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should matchgreedily: > postgres=# SELECT REGEXP_REPLACE('jane.smith@example.com', '.*?@.*', 'ab'); This is correct according to the rules given at https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES specifically that "A branch — that is, an RE that has no top-level | operator — has the same greediness as the first quantified atom in it that has a greediness attribute." Because of that, the RE as a whole is non-greedy and will match the shortest not longest amount of text overall. The discussion in that manual section shows what to do when you don't like the results. > Other database systems (e.g., DuckDB, Umbra) match the whole input: If your complaint is "but it's not like Perl!", I suggest using a plperl function to do your regexp work. regards, tom lane
Thanks for the hint to the documentation, I missed that part. This is really surprising behavior!
Cheers,
Simon
Simon
--
Research associate
Chair for database systems
Department of Informatics
TU München Tel: +49 89 289 17276
Boltzmannstr. 3 E-Mail: simon.ellmann@tum.de
D-85748 Garching bei München, Germany
Research associate
Chair for database systems
Department of Informatics
TU München Tel: +49 89 289 17276
Boltzmannstr. 3 E-Mail: simon.ellmann@tum.de
D-85748 Garching bei München, Germany
On 19. Sep 2025, at 16:09, David G. Johnston <david.g.johnston@gmail.com> wrote:On Friday, September 19, 2025, Simon Ellmann <simon.ellmann@tum.de> wrote:With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:Working as documented in rule 6:David J.