Mixing greediness in regexp_matches

Поиск
Список
Период
Сортировка
От Daniel Verite
Тема Mixing greediness in regexp_matches
Дата
Msg-id 306b726b-f185-4668-bffe-ac8e7f78878e@manitou-mail.org
обсуждение исходный текст
Ответы Re: Mixing greediness in regexp_matches  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
  Hi,

When looking into how to implement a global replace of multiple
substrings (each with their own replacement) in sql
or plpgsql, I'm wondering if/how an RE with an alternation
can be used.

The basic idea is to iterate on the rows produced by

   regexp_matches(string, '(.*?)(foo|bar|foobar)', 'g')

to break down the string into pairs of (non-matching segment,
matching segment) so that a final result can be assembled
from that (setting aside the last non-matching segment, that
can be retrieved in a final step).

The difficulty is that the longest strings in the alternation
should be prioritized, but the starting (.*?) makes the RE
non-greedy so "foo" is choosen over "foobar".

The doc at [1] leaves me unoptimistic when it mentions that:

 "...when an RE contains both greedy and non-greedy
 subexpressions, the total match length is either as long as possible
 or as short as possible, according to the attribute assigned to the
 whole RE. The attributes assigned to the subexpressions only affect
 how much of that match they are allowed to “eat” relative to each
 other."

Also it gives this example of forcing the RE as a whole to
be greedy despite it having a non-greedy sub-RE:
 regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}')
but it doesn't seem to be able to produce the desired result
in the case of the RE in the middle being an alternation
with strings of different lengths.

The ideal RE with a (foo|foobar|bar) alternation, when applied
globally to a string like 'the string has foo and foobar and bar and
more' would produce something like:

 {"the string has ","foo"}
 {" and ","foobar"}
 {" and ","bar"}

Is that possible with regexp_matches?


[1] https://www.postgresql.org/docs/current/functions-matching.html



В списке pgsql-general по дате отправления:

Предыдущее
От: Thomas Kellerer
Дата:
Сообщение: Re: SQL operator '*='
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Mixing greediness in regexp_matches