Обсуждение: add function argument names to regex* functions.

Поиск
Список
Период
Сортировка

add function argument names to regex* functions.

От
jian he
Дата:
Hi.
similar to [1], add function argument names to the following functions:
regexp_like, regexp_match,regexp_matches,regexp_replace,
regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count

so I call these function in a different notation[2], like:

SELECT regexp_like(string=>'a'||CHR(10)||'d', pattern=>'a.d', flags:='n');
select regexp_match(string=>'abc',n pattern=>'(B)(c)', flags=>'i');
select regexp_matches(string=>'Programmer', pattern=>'(\w)(.*?\1)',
flags=>'ig');
SELECT regexp_replace(source=>'A PostgreSQL function',
pattern=>'a|e|i|o|u', replacement=>'X', start=>1, n=>4, flags=>'i');
SELECT regexp_substr(string=>'1234567890',
pattern=>'(123)(4(56)(78))', start=>1, n=>1, flags=>'i', subexpr=>4);
SELECT regexp_split_to_array(string=>'thE QUick bROWn FOx jUMPs ovEr
The lazy dOG', pattern=>'e', flags=>'i');

SELECT foo, length(foo)
FROM regexp_split_to_table(string=>'thE QUick bROWn FOx jUMPs ovEr The
lazy dOG', pattern=>'e',flags=>'i') AS foo;
SELECT regexp_count(string=>'ABCABCABCABC', pattern=>'Abc', start=>1,
flags=>'i');

In [3], except the above mentioned function, there is a "substring" function.
I want to refactor substring function argument names. it looks like:
   Schema   |   Name    | Result data type |            Argument data
types             | Type
------------+-----------+------------------+--------------------------------------------+------
 pg_catalog | substring | bit              | bits bit, "from" integer
                 | func
 pg_catalog | substring | bit              | bits bit, "from" integer,
"for" integer    | func
 pg_catalog | substring | bytea            | bytes bytea, "from"
integer                | func
 pg_catalog | substring | bytea            | bytes bytea, "from"
integer, "for" integer | func
 pg_catalog | substring | text             | string text, "from"
integer                | func
 pg_catalog | substring | text             | string text, "from"
integer, "for" integer | func
 pg_catalog | substring | text             | string text, pattern text
                 | func
 pg_catalog | substring | text             | text, text, text
                 | func
(8 rows)

As you can see, the substring function argument names need an explicit
double quote,
which doesn't look good, so I gave up.

[1]https://www.postgresql.org/message-id/flat/877cw3jl8y.fsf@wibble.ilmari.org
[2]https://www.postgresql.org/docs/current/sql-syntax-calling-funcs.html#SQL-SYNTAX-CALLING-FUNCS
[3] https://www.postgresql.org/docs/current/functions-matching.html
Вложения

Re: add function argument names to regex* functions.

От
Peter Eisentraut
Дата:
On 27.12.23 17:53, jian he wrote:
> similar to [1], add function argument names to the following functions:
> regexp_like, regexp_match,regexp_matches,regexp_replace,
> regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count

Note that these functions are a quasi-standard that is shared with other 
SQL implementations.  It might be worth looking around if there are 
already other implementations of this idea.




Re: add function argument names to regex* functions.

От
jian he
Дата:
On Thu, Dec 28, 2023 at 6:25 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 27.12.23 17:53, jian he wrote:
> > similar to [1], add function argument names to the following functions:
> > regexp_like, regexp_match,regexp_matches,regexp_replace,
> > regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count
>
> Note that these functions are a quasi-standard that is shared with other
> SQL implementations.  It might be worth looking around if there are
> already other implementations of this idea.
>

turns out people do like calling functions via explicitly mentioning
function argument names, example: [0]
There are no provisions for the argument names.

I looked around the oracle implementation in [1], and the oracle regex
related function argumentation name in [2]
I use the doc [3] syntax explanation and add the related function names.

Current, regex.* function syntax seems fine. but only parameter `N`
seems a little bit weird.
If we change the function's argument name, we also need to change
function syntax explanation in the doc; vise versa.

QUOTE:
The regexp_instr function returns the starting or ending position of
the N'th match of a POSIX regular expression pattern to a string, or
zero if there is no such match. It has the syntax regexp_instr(string,
pattern [, start [, N [, endoption [, flags [, subexpr ]]]]]). pattern
is searched for in string, normally from the beginning of the string,
but if the start parameter is provided then beginning from that
character index. If N is specified then the N'th match of the pattern
is located, otherwise the first match is located.
END OF QUOTE.

maybe we can change `N` to occurrence. but `occurrence` is kind of verbose.

[0] https://stackoverflow.com/questions/33387348/oracle-named-parameters-in-regular-functions
[1]
https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099
[2] https://dbfiddle.uk/h_SBDEKi
[3] https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP



Re: add function argument names to regex* functions.

От
"Dian Fay"
Дата:
On Wed Dec 27, 2023 at 10:28 PM EST, jian he wrote:
> On Thu, Dec 28, 2023 at 6:25 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> >
> > On 27.12.23 17:53, jian he wrote:
> > > similar to [1], add function argument names to the following functions:
> > > regexp_like, regexp_match,regexp_matches,regexp_replace,
> > > regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count
> >
> > Note that these functions are a quasi-standard that is shared with other
> > SQL implementations.  It might be worth looking around if there are
> > already other implementations of this idea.
> >
>
> turns out people do like calling functions via explicitly mentioning
> function argument names, example: [0]
> There are no provisions for the argument names.
>
> I looked around the oracle implementation in [1], and the oracle regex
> related function argumentation name in [2]
> I use the doc [3] syntax explanation and add the related function names.
>
> Current, regex.* function syntax seems fine. but only parameter `N`
> seems a little bit weird.
> If we change the function's argument name, we also need to change
> function syntax explanation in the doc; vise versa.
>
> QUOTE:
> The regexp_instr function returns the starting or ending position of
> the N'th match of a POSIX regular expression pattern to a string, or
> zero if there is no such match. It has the syntax regexp_instr(string,
> pattern [, start [, N [, endoption [, flags [, subexpr ]]]]]). pattern
> is searched for in string, normally from the beginning of the string,
> but if the start parameter is provided then beginning from that
> character index. If N is specified then the N'th match of the pattern
> is located, otherwise the first match is located.
> END OF QUOTE.
>
> maybe we can change `N` to occurrence. but `occurrence` is kind of verbose.
>
> [0] https://stackoverflow.com/questions/33387348/oracle-named-parameters-in-regular-functions
> [1]
https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099
> [2] https://dbfiddle.uk/h_SBDEKi
> [3] https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP

I've been trying to use named arguments more diligently so expanding
support for built-in functions is welcome. The patch applies cleanly and
works as advertised.

I agree that the parameter name `n` is not ideal. For example, in
`regexp_replace` it's easy to misinterpret it as "make up to n
replacements". This has not been a problem when `n` only lives in the
documentation which explains exactly what it does, but that context is
not readily available in code expressing `n => 3`.

Another possibility is `index`, which is relatively short and not a
reserved keyword ^1. `position` is not as precise but would avoid the
conceptual overloading of ordinary indices.

1. https://www.postgresql.org/docs/current/sql-keywords-appendix.html



Re: add function argument names to regex* functions.

От
Peter Eisentraut
Дата:
On 28.12.23 04:28, jian he wrote:
> I looked around the oracle implementation in [1], and the oracle regex
> related function argumentation name in [2]
> I use the doc [3] syntax explanation and add the related function names.
> 
> Current, regex.* function syntax seems fine. but only parameter `N`
> seems a little bit weird.
> If we change the function's argument name, we also need to change
> function syntax explanation in the doc; vise versa.

So, it looks like Oracle already has defined parameter names for these, 
so we should make ours match.




Re: add function argument names to regex* functions.

От
"Dian Fay"
Дата:
> > Another possibility is `index`, which is relatively short and not a
> > reserved keyword ^1. `position` is not as precise but would avoid the
> > conceptual overloading of ordinary indices.
>
> I'm not a fan of "index" since that leaves the question of
> whether it's 0 or 1 based. "Position" is a bit better, but I think
> Jian's suggestion of "occurance" is best.

We do have precedent for one-based `index` in Postgres: array types are
1-indexed by default! "Occurrence" removes that ambiguity but it's long
and easy to misspell (I looked it up after typing it just now and it
_still_ feels off).

How's "instance"?



Re: add function argument names to regex* functions.

От
jian he
Дата:
On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote:
>
> On 1/3/24 5:05 PM, Dian Fay wrote:
>
> Another possibility is `index`, which is relatively short and not a
> reserved keyword ^1. `position` is not as precise but would avoid the
> conceptual overloading of ordinary indices.
>
> I'm not a fan of "index" since that leaves the question of
> whether it's 0 or 1 based. "Position" is a bit better, but I think
> Jian's suggestion of "occurance" is best.
>
> We do have precedent for one-based `index` in Postgres: array types are
> 1-indexed by default! "Occurrence" removes that ambiguity but it's long
> and easy to misspell (I looked it up after typing it just now and it
> _still_ feels off).
>
> How's "instance"?
>
> Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so
presumablymisspelling wouldn't be a big issue. But I think "instance" is OK as well. 
>
> --
> Jim Nasby, Data Architect, Austin TX

regexp_instr: It has the syntax regexp_instr(string, pattern [, start
[, N [, endoption [, flags [, subexpr ]]]]])
oracle:
REGEXP_INSTR (source_char, pattern,  [, position [, occurrence [,
return_opt  [, match_param  [, subexpr ]]]]] )

"string" and "source_char" are almost the same descriptive, so maybe
there is no need to change.
"start" is better than "position", imho.
"return_opt" is better than "endoption", (maybe we need change, for
now I didn't)
"flags" cannot be changed to "match_param", given it quite everywhere
in functions-matching.html.

similarly for function regexp_replace, oracle using "repplace_string",
we use "replacement"(mentioned in the doc).
so I don't think we need to change to "repplace_string".

Based on how people google[0], I think `occurrence` is ok, even though
it's verbose.
to change from `N` to `occurrence`, we also need to change the doc,
that is why this patch is more larger.


[0]:
https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8

Вложения

Re: add function argument names to regex* functions.

От
"Dian Fay"
Дата:
On Thu Jan 4, 2024 at 2:03 AM EST, jian he wrote:
> On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote:
> >
> > On 1/3/24 5:05 PM, Dian Fay wrote:
> >
> > Another possibility is `index`, which is relatively short and not a
> > reserved keyword ^1. `position` is not as precise but would avoid the
> > conceptual overloading of ordinary indices.
> >
> > I'm not a fan of "index" since that leaves the question of
> > whether it's 0 or 1 based. "Position" is a bit better, but I think
> > Jian's suggestion of "occurance" is best.
> >
> > We do have precedent for one-based `index` in Postgres: array types are
> > 1-indexed by default! "Occurrence" removes that ambiguity but it's long
> > and easy to misspell (I looked it up after typing it just now and it
> > _still_ feels off).
> >
> > How's "instance"?
> >
> > Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so
presumablymisspelling wouldn't be a big issue. But I think "instance" is OK as well. 
> >
> > --
> > Jim Nasby, Data Architect, Austin TX
>
> regexp_instr: It has the syntax regexp_instr(string, pattern [, start
> [, N [, endoption [, flags [, subexpr ]]]]])
> oracle:
> REGEXP_INSTR (source_char, pattern,  [, position [, occurrence [,
> return_opt  [, match_param  [, subexpr ]]]]] )
>
> "string" and "source_char" are almost the same descriptive, so maybe
> there is no need to change.
> "start" is better than "position", imho.
> "return_opt" is better than "endoption", (maybe we need change, for
> now I didn't)
> "flags" cannot be changed to "match_param", given it quite everywhere
> in functions-matching.html.
>
> similarly for function regexp_replace, oracle using "repplace_string",
> we use "replacement"(mentioned in the doc).
> so I don't think we need to change to "repplace_string".
>
> Based on how people google[0], I think `occurrence` is ok, even though
> it's verbose.
> to change from `N` to `occurrence`, we also need to change the doc,
> that is why this patch is more larger.
>
>
> [0]:
https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8

The `regexp_replace` summary in table 9.10 is mismatched and still
specifies the first parameter name as `string` instead of `source`.
Since all the other functions use `string`, should `regexp_replace` do
the same or is this a case where an established "standard" diverges?

I noticed the original documentation for some of these functions is
rather disorganized; summaries explain `occurrence` without explaining
the prior `start` parameter, and detailed documentation in 9.7 is
usually a single paragraph per function running pell-mell through ifs
and buts without section headings, so entries in table 9.10 have to
reference the entire section 9.7.3 instead of their specific functions.
It's out of scope here, but should I bring this up on pgsql-docs?



Re: add function argument names to regex* functions.

От
jian he
Дата:
On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
>
> On Thu Jan 4, 2024 at 2:03 AM EST, jian he wrote:
> > On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote:
> > >
> > > On 1/3/24 5:05 PM, Dian Fay wrote:
> > >
> > > Another possibility is `index`, which is relatively short and not a
> > > reserved keyword ^1. `position` is not as precise but would avoid the
> > > conceptual overloading of ordinary indices.
> > >
> > > I'm not a fan of "index" since that leaves the question of
> > > whether it's 0 or 1 based. "Position" is a bit better, but I think
> > > Jian's suggestion of "occurance" is best.
> > >
> > > We do have precedent for one-based `index` in Postgres: array types are
> > > 1-indexed by default! "Occurrence" removes that ambiguity but it's long
> > > and easy to misspell (I looked it up after typing it just now and it
> > > _still_ feels off).
> > >
> > > How's "instance"?
> > >
> > > Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so
presumablymisspelling wouldn't be a big issue. But I think "instance" is OK as well. 
> > >
> > > --
> > > Jim Nasby, Data Architect, Austin TX
> >
> > regexp_instr: It has the syntax regexp_instr(string, pattern [, start
> > [, N [, endoption [, flags [, subexpr ]]]]])
> > oracle:
> > REGEXP_INSTR (source_char, pattern,  [, position [, occurrence [,
> > return_opt  [, match_param  [, subexpr ]]]]] )
> >
> > "string" and "source_char" are almost the same descriptive, so maybe
> > there is no need to change.
> > "start" is better than "position", imho.
> > "return_opt" is better than "endoption", (maybe we need change, for
> > now I didn't)
> > "flags" cannot be changed to "match_param", given it quite everywhere
> > in functions-matching.html.
> >
> > similarly for function regexp_replace, oracle using "repplace_string",
> > we use "replacement"(mentioned in the doc).
> > so I don't think we need to change to "repplace_string".
> >
> > Based on how people google[0], I think `occurrence` is ok, even though
> > it's verbose.
> > to change from `N` to `occurrence`, we also need to change the doc,
> > that is why this patch is more larger.
> >
> >
> > [0]:
https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8
>
> The `regexp_replace` summary in table 9.10 is mismatched and still
> specifies the first parameter name as `string` instead of `source`.
> Since all the other functions use `string`, should `regexp_replace` do
> the same or is this a case where an established "standard" diverges?
>

got it. Thanks for pointing it out.

in functions-matching.html
if I change <replaceable>source</replaceable> to
<replaceable>string</replaceable> then
there are no markup "string" and markup "string", it's kind of
slightly confusing.

So does the following refactored description of regexp_replace make sense:

     The <replaceable>string</replaceable> is returned unchanged if
     there is no match to the <replaceable>pattern</replaceable>.  If there is a
     match, the <replaceable>string</replaceable> is returned with the
     <replaceable>replacement</replaceable> string substituted for the matching
     substring.  The <replaceable>replacement</replaceable> string can contain
     <literal>\</literal><replaceable>n</replaceable>, where
<replaceable>n</replaceable> is 1
     through 9, to indicate that the source substring matching the
     <replaceable>n</replaceable>'th parenthesized subexpression of
the pattern should be
     inserted, and it can contain <literal>\&</literal> to indicate that the
     substring matching the entire pattern should be inserted.  Write
     <literal>\\</literal> if you need to put a literal backslash in
the replacement
     text.

> I noticed the original documentation for some of these functions is
> rather disorganized; summaries explain `occurrence` without explaining
> the prior `start` parameter, and detailed documentation in 9.7 is
> usually a single paragraph per function running pell-mell through ifs
> and buts without section headings, so entries in table 9.10 have to
> reference the entire section 9.7.3 instead of their specific functions.
> It's out of scope here, but should I bring this up on pgsql-docs?

I got it.
in Table 9.10. Other String Functions and Operators, if we can
reference the specific function would be great.
As for now, in the browser, you need to use Ctrl+F to find the
detailed explanation in 9.7.3.
you can just bring your suggested or patch to pgsql-hackers@postgresql.org.



Re: add function argument names to regex* functions.

От
"Dian Fay"
Дата:
On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote:
> On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
> > The `regexp_replace` summary in table 9.10 is mismatched and still
> > specifies the first parameter name as `string` instead of `source`.
> > Since all the other functions use `string`, should `regexp_replace` do
> > the same or is this a case where an established "standard" diverges?
>
> got it. Thanks for pointing it out.
>
> in functions-matching.html
> if I change <replaceable>source</replaceable> to
> <replaceable>string</replaceable> then
> there are no markup "string" and markup "string", it's kind of
> slightly confusing.
>
> So does the following refactored description of regexp_replace make sense:
>
>      The <replaceable>string</replaceable> is returned unchanged if
>      there is no match to the <replaceable>pattern</replaceable>.  If there is a
>      match, the <replaceable>string</replaceable> is returned with the
>      <replaceable>replacement</replaceable> string substituted for the matching
>      substring.  The <replaceable>replacement</replaceable> string can contain
>      <literal>\</literal><replaceable>n</replaceable>, where
> <replaceable>n</replaceable> is 1
>      through 9, to indicate that the source substring matching the
>      <replaceable>n</replaceable>'th parenthesized subexpression of
> the pattern should be
>      inserted, and it can contain <literal>\&</literal> to indicate that the
>      substring matching the entire pattern should be inserted.  Write
>      <literal>\\</literal> if you need to put a literal backslash in
> the replacement
>      text.

That change makes sense to me! I'll see about the section refactoring
after this lands.



Re: add function argument names to regex* functions.

От
jian he
Дата:
On Tue, Jan 9, 2024 at 8:52 AM Dian Fay <di@nmfay.com> wrote:
>
> On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote:
> > On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
> > > The `regexp_replace` summary in table 9.10 is mismatched and still
> > > specifies the first parameter name as `string` instead of `source`.
> > > Since all the other functions use `string`, should `regexp_replace` do
> > > the same or is this a case where an established "standard" diverges?
> >
> > got it. Thanks for pointing it out.
> >
> > in functions-matching.html
> > if I change <replaceable>source</replaceable> to
> > <replaceable>string</replaceable> then
> > there are no markup "string" and markup "string", it's kind of
> > slightly confusing.
> >
> > So does the following refactored description of regexp_replace make sense:
> >
> >      The <replaceable>string</replaceable> is returned unchanged if
> >      there is no match to the <replaceable>pattern</replaceable>.  If there is a
> >      match, the <replaceable>string</replaceable> is returned with the
> >      <replaceable>replacement</replaceable> string substituted for the matching
> >      substring.  The <replaceable>replacement</replaceable> string can contain
> >      <literal>\</literal><replaceable>n</replaceable>, where
> > <replaceable>n</replaceable> is 1
> >      through 9, to indicate that the source substring matching the
> >      <replaceable>n</replaceable>'th parenthesized subexpression of
> > the pattern should be
> >      inserted, and it can contain <literal>\&</literal> to indicate that the
> >      substring matching the entire pattern should be inserted.  Write
> >      <literal>\\</literal> if you need to put a literal backslash in
> > the replacement
> >      text.
>
> That change makes sense to me! I'll see about the section refactoring
> after this lands.

I put the changes into the new patch.

Вложения

Re: add function argument names to regex* functions.

От
"Dian Fay"
Дата:
On Wed Jan 10, 2024 at 9:18 AM EST, jian he wrote:
> On Tue, Jan 9, 2024 at 8:52 AM Dian Fay <di@nmfay.com> wrote:
> >
> > On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote:
> > > On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
> > > > The `regexp_replace` summary in table 9.10 is mismatched and still
> > > > specifies the first parameter name as `string` instead of `source`.
> > > > Since all the other functions use `string`, should `regexp_replace` do
> > > > the same or is this a case where an established "standard" diverges?
> > >
> > > got it. Thanks for pointing it out.
> > >
> > > in functions-matching.html
> > > if I change <replaceable>source</replaceable> to
> > > <replaceable>string</replaceable> then
> > > there are no markup "string" and markup "string", it's kind of
> > > slightly confusing.
> > >
> > > So does the following refactored description of regexp_replace make sense:
> > >
> > >      The <replaceable>string</replaceable> is returned unchanged if
> > >      there is no match to the <replaceable>pattern</replaceable>.  If there is a
> > >      match, the <replaceable>string</replaceable> is returned with the
> > >      <replaceable>replacement</replaceable> string substituted for the matching
> > >      substring.  The <replaceable>replacement</replaceable> string can contain
> > >      <literal>\</literal><replaceable>n</replaceable>, where
> > > <replaceable>n</replaceable> is 1
> > >      through 9, to indicate that the source substring matching the
> > >      <replaceable>n</replaceable>'th parenthesized subexpression of
> > > the pattern should be
> > >      inserted, and it can contain <literal>\&</literal> to indicate that the
> > >      substring matching the entire pattern should be inserted.  Write
> > >      <literal>\\</literal> if you need to put a literal backslash in
> > > the replacement
> > >      text.
> >
> > That change makes sense to me! I'll see about the section refactoring
> > after this lands.
>
> I put the changes into the new patch.

Sorry, I missed one minor issue with v2. The replacement on lines
6027-6028 of func.sgml (originally "`n` rows if there are `n` matches")
is not needed and could be more confusing since the `n` represents a
number, not an argument to `regexp_matches`. I've built v3 and gone over
everything else one more time and it looks good.



Re: add function argument names to regex* functions.

От
Peter Eisentraut
Дата:
On 10.01.24 15:18, jian he wrote:
> I put the changes into the new patch.

Reading back through the discussion, I wasn't quite able to interpret 
the resolution regarding Oracle compatibility.  From the patch, it looks 
like you chose not to adopt the parameter names from Oracle.  Was that 
your intention?




Re: add function argument names to regex* functions.

От
jian he
Дата:
On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 10.01.24 15:18, jian he wrote:
> > I put the changes into the new patch.
>
> Reading back through the discussion, I wasn't quite able to interpret
> the resolution regarding Oracle compatibility.  From the patch, it looks
> like you chose not to adopt the parameter names from Oracle.  Was that
> your intention?
>

per committee message:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7
Even if the names are all the same, our function is still not the same
as oracle.

There is a documentation bug.
In [0], Table 9.25. Regular Expression Functions Equivalencies
regexp_replace function definition: regexp_replace(string, pattern, replacement)

In one of the <tip> section below, regexp_replace explains as
<<<<<
The regexp_replace function provides substitution of new text for
substrings that match POSIX regular expression patterns. It has the
syntax regexp_replace(source, pattern, replacement [, start [, N ]] [,
flags ]). (Notice that N cannot be specified unless start is, but
flags can be given in any case.)
<<<<<
So I changed the first argument of regexp_replace to "string". So
accordingly, the doc needs to change also, which I did.

another regex* function argument changes: from "N" to "occurences",  example:
    + If <replaceable>occurrence</replaceable> is specified
    + then the <replaceable>occurrence</replaceable>'th match of the pattern
    + is located,

but [2] says
Speaking of the "occurrence'th
occurrence" is just silly, not to mention long and easy to misspell."

summary:
adding function-named notation is my intention.
To make regex.* functions named-notation works, we need to add
proargnames to src/include/catalog/pg_proc.dat.
add proargnames also require changing the doc.
naming proargnames is a matter of taste now, So I only change 'N' to
'occurrence'.

[0] https://www.postgresql.org/docs/current/functions-matching.html
[1] https://www.postgresql.org/message-id/flat/fc160ee0-c843-b024-29bb-97b5da61971f%40darold.net



Re: add function argument names to regex* functions.

От
jian he
Дата:
On Sat, Jan 20, 2024 at 10:55 AM jian he <jian.universality@gmail.com> wrote:
>
>
> another regex* function argument changes: from "N" to "occurences",  example:
>     + If <replaceable>occurrence</replaceable> is specified
>     + then the <replaceable>occurrence</replaceable>'th match of the pattern
>     + is located,
>
> but [2] says
> Speaking of the "occurrence'th
> occurrence" is just silly, not to mention long and easy to misspell."
>

sorry.
[2], The reference link is
https://www.postgresql.org/message-id/1567465.1627860115%40sss.pgh.pa.us

my previous post will link to the whole thread.



Re: add function argument names to regex* functions.

От
Tom Lane
Дата:
jian he <jian.universality@gmail.com> writes:
> On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>> Reading back through the discussion, I wasn't quite able to interpret
>> the resolution regarding Oracle compatibility.  From the patch, it looks
>> like you chose not to adopt the parameter names from Oracle.  Was that
>> your intention?

> per committee message:
> https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7
> Even if the names are all the same, our function is still not the same
> as oracle.

The fact that there's minor discrepancies in the regex languages
doesn't seem to me to have a lot of bearing on whether we should
follow Oracle's choices of parameter names.

However, if we do follow Oracle, it seems like we should do that
consistently, which this patch doesn't.  For instance, per [1]
Oracle calls the arguments of regex_substr

    source_char,
    pattern,
    position,
    occurrence,
    match_param,
    subexpr

while we have

    string,
    pattern,
    start,
    N,
    flags,
    subexpr

The patch proposes to replace "N" with "occurrence" but not touch
the other discrepancies, which seems to me to be a pretty poor
choice.  "occurrence" is very long and difficult to spell correctly,
and if you're not following Oracle slavishly, exactly what is the
argument in its favor?  I quite agree that Oracle's other choices
aren't improvements over ours, but neither is that one.

On the whole my inclination would be to stick to the names we have
in the documentation.  There might be an argument for changing "N"
to something lower-case so you don't have to quote it; but if we do,
I'd go for, say, "count".

            regards, tom lane

[1]
https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099



Re: add function argument names to regex* functions.

От
jian he
Дата:
On Wed, Apr 3, 2024 at 4:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> jian he <jian.universality@gmail.com> writes:
> > On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> >> Reading back through the discussion, I wasn't quite able to interpret
> >> the resolution regarding Oracle compatibility.  From the patch, it looks
> >> like you chose not to adopt the parameter names from Oracle.  Was that
> >> your intention?
>
> > per committee message:
> > https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7
> > Even if the names are all the same, our function is still not the same
> > as oracle.
>
> The fact that there's minor discrepancies in the regex languages
> doesn't seem to me to have a lot of bearing on whether we should
> follow Oracle's choices of parameter names.
>
> However, if we do follow Oracle, it seems like we should do that
> consistently, which this patch doesn't.  For instance, per [1]
> Oracle calls the arguments of regex_substr
>
>         source_char,
>         pattern,
>         position,
>         occurrence,
>         match_param,
>         subexpr
>
> while we have
>
>         string,
>         pattern,
>         start,
>         N,
>         flags,
>         subexpr
>
> The patch proposes to replace "N" with "occurrence" but not touch
> the other discrepancies, which seems to me to be a pretty poor
> choice.  "occurrence" is very long and difficult to spell correctly,
> and if you're not following Oracle slavishly, exactly what is the
> argument in its favor?  I quite agree that Oracle's other choices
> aren't improvements over ours, but neither is that one.
>
> On the whole my inclination would be to stick to the names we have
> in the documentation.  There might be an argument for changing "N"
> to something lower-case so you don't have to quote it; but if we do,
> I'd go for, say, "count".
>

we have
---------------------------------------------------------------
The replacement string can contain \n, where n is 1 through 9, to
indicate that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted.
----------------------------------------------------------------------------
in the regexp_replace explanation section.
changing "N" to lower-case would be misleading for regexp_replace?
so I choose "count".

By the way, I think the above  is so hard to comprehend.
I can only find related test in src/test/regress/sql/strings.sql are:
SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})',
E'(\\1) \\2-\\3');
SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\&Y', 'g');
SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\\\Y', 'g');

but these tests seem not friendly.
maybe we should have some simple examples to demonstrate the above paragraph.

Вложения

Re: add function argument names to regex* functions.

От
Robert Haas
Дата:
On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote:
> in the regexp_replace explanation section.
> changing "N" to lower-case would be misleading for regexp_replace?
> so I choose "count".

I don't see why that would be confusing for regexp_replace
specifically, but I think N => count is a reasonable change to make.
However, I don't think this quite works:

+     then the <replaceable>count</replaceable>'th match of the pattern

An English speaker is more likely to understand what is meant by
"N'th" than what is meant by "count'th". Even if they can guess, it's
kinda strange-looking. I think it needs to be rephrased somehow, but
I'm not sure exactly how.

> By the way, I think the above  is so hard to comprehend.
> I can only find related test in src/test/regress/sql/strings.sql are:
> SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})',
> E'(\\1) \\2-\\3');
> SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\&Y', 'g');
> SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\\\Y', 'g');
>
> but these tests seem not friendly.
> maybe we should have some simple examples to demonstrate the above paragraph.

Examples in the regression tests aren't meant as tests, not examples
for users to copy. If we want examples, those belong in the
documentation. However, I see that regexp_replace already has some
examples at https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
so I'm not sure exactly what you think should be added.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: add function argument names to regex* functions.

От
Robert Haas
Дата:
On Wed, May 15, 2024 at 2:46 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Examples in the regression tests aren't meant as tests, not examples
> for users to copy. If we want examples, those belong in the
> documentation. However, I see that regexp_replace already has some
> examples at https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
> so I'm not sure exactly what you think should be added.

Woops. I should have said: Examples in the regression tests *are*
meant as tests...

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: add function argument names to regex* functions.

От
"David G. Johnston"
Дата:
On Wed, May 15, 2024 at 11:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote:
> in the regexp_replace explanation section.
> changing "N" to lower-case would be misleading for regexp_replace?
> so I choose "count".

I don't see why that would be confusing for regexp_replace
specifically, but I think N => count is a reasonable change to make.
However, I don't think this quite works:

+     then the <replaceable>count</replaceable>'th match of the pattern

An English speaker is more likely to understand what is meant by
"N'th" than what is meant by "count'th". Even if they can guess, it's
kinda strange-looking. I think it needs to be rephrased somehow, but
I'm not sure exactly how.


I think this confusion goes to show that replacing N with count doesn't work.

"replace_at" comes to mind as a better name.

By default, only the first match of the pattern is replaced.  If replace_at is specified and greater than zero, then the first "replace_at - 1" matches are skipped before making a single replacement (i.e., the g flag is ignored when replace_at is specified.)

David J.

Re: add function argument names to regex* functions.

От
Robert Haas
Дата:
On Wed, May 15, 2024 at 3:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> I think this confusion goes to show that replacing N with count doesn't work.
>
> "replace_at" comes to mind as a better name.

I do not agree with that at all. It shows that a literal
search-and-replace changing N to count does not work, but it does not
show that count is a bad name for the concept, and I don't think it
is. I believe that if I were reading the documentation, count would be
clearer to me than N, N would probably still be clear enough, and
replace_at wouldn't be clear at all. I'd expect replace_at to be a
character position or something, not an occurrence count.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: add function argument names to regex* functions.

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote:
>> changing "N" to lower-case would be misleading for regexp_replace?
>> so I choose "count".

> I don't see why that would be confusing for regexp_replace
> specifically, but I think N => count is a reasonable change to make.
> However, I don't think this quite works:
> +     then the <replaceable>count</replaceable>'th match of the pattern

I think the origin of the problem here is not wanting to use "N"
as the actual name of the parameter, because then users would have
to double-quote it to write "regexp_replace(..., "N" => 42, ...)".

However ... is that really so awful?  It's still fewer keystrokes
than "count".  It's certainly a potential gotcha for users who've
not internalized when they need double quotes, but I think we
could largely address that problem just by making sure to provide
a documentation example that shows use of "N".

> An English speaker is more likely to understand what is meant by
> "N'th" than what is meant by "count'th".

+1 ... none of the proposals make that bit read more clearly
than it does now.

            regards, tom lane



Re: add function argument names to regex* functions.

От
Chapman Flack
Дата:
On 05/15/24 15:07, Robert Haas wrote:
> is. I believe that if I were reading the documentation, count would be
> clearer to me than N, N would probably still be clear enough, and
> replace_at wouldn't be clear at all. I'd expect replace_at to be a
> character position or something, not an occurrence count.

You've said the magic word. In the analogous (but XQuery-based)
ISO standard regex functions, the argument that does that is identified
with the keyword OCCURRENCE.

What would be wrong with that, for consistency's sake?

Regards,
-Chap



Re: add function argument names to regex* functions.

От
"David G. Johnston"
Дата:
On Wed, May 15, 2024 at 12:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 15, 2024 at 3:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> I think this confusion goes to show that replacing N with count doesn't work.
>
> "replace_at" comes to mind as a better name.

I do not agree with that at all. It shows that a literal
search-and-replace changing N to count does not work, but it does not
show that count is a bad name for the concept, and I don't think it
is. I believe that if I were reading the documentation, count would be
clearer to me than N, N would probably still be clear enough, and
replace_at wouldn't be clear at all. I'd expect replace_at to be a
character position or something, not an occurrence count.


The function replaces matches, not random characters.  And if you are reading the documentation I find it implausible that the wording I suggested would cause one to think in terms of characters instead of matches.

If I choose not to read the documentation "count" seems like it behaves as a qualified "g".  I don't want all matches replaced, I want the first "count" matches only replaced.

"occurrence" probably is the best choice but I agree the spelling issues are a big negative.

count - how many things there are.  This isn't a count.  I'd rather stick with N, at least it actually has the desired meaning as a pointer to an item in a list.

N - The label provides zero context as to what the number you place there is going to be used for.  Labels ideally do more work than this especially if someone takes the time to spell them out.  Otherwise why use "pattern" instead of "p".

David J.

Re: add function argument names to regex* functions.

От
Robert Haas
Дата:
On Wed, May 15, 2024 at 3:23 PM Chapman Flack <jcflack@acm.org> wrote:
> What would be wrong with that, for consistency's sake?

It was proposed and rejected upthread, but that's not to say that I
necessarily endorse the reasons given.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: add function argument names to regex* functions.

От
Robert Haas
Дата:
On Wed, May 15, 2024 at 3:25 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> The function replaces matches, not random characters.  And if you are reading the documentation I find it implausible
thatthe wording I suggested would cause one to think in terms of characters instead of matches. 

I mean I just told you what my reaction to it was. If you find that
reaction "implausible" then I guess you think I was lying when I said
that?

> N - The label provides zero context as to what the number you place there is going to be used for.  Labels ideally do
morework than this especially if someone takes the time to spell them out.  Otherwise why use "pattern" instead of "p". 

I feel like you're attacking a straw man here. I never said that N was
my first choice; in fact, I said the opposite. But I do think that if
the documentation says, as it does, that the function is
regexp_replace(source, pattern, replacement, start, N, flags), a
reader who has some idea what a function called regexp_replace might
do will probably be able to guess what N is. It's probably also true
that if we changed "pattern" to "p" they would still be able to guess
that too, because there's nothing other than a pattern that you'd
expect to pass to a regexp-replacement function that starts with p,
but it would still be worse than what we have now.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: add function argument names to regex* functions.

От
"David G. Johnston"
Дата:
On Wed, May 15, 2024 at 12:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 15, 2024 at 3:25 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> The function replaces matches, not random characters.  And if you are reading the documentation I find it implausible that the wording I suggested would cause one to think in terms of characters instead of matches.

I mean I just told you what my reaction to it was. If you find that
reaction "implausible" then I guess you think I was lying when I said
that?


You just broke my brain when you say that you read:

By default, only the first match of the pattern is replaced.  If replace_at is specified and greater than zero, then the first "replace_at - 1" matches are skipped before making a single replacement (i.e., the g flag is ignored when replace_at is specified.)

And then say:

I'd expect replace_at to be a character position or something, not an occurrence count.

I guess it isn't a claim you are lying, rather I simply don't follow your mental model of all this and in my mental model behind the proposal I don't believe the typical reader will become confused on that point.  I guess that means I don't find you to be the typical reader, at least so far as this specific topic goes.  But hey, maybe I'm the one in the minority.  In either case we disagree and that was my main point.

David J.

Re: add function argument names to regex* functions.

От
"David G. Johnston"
Дата:
On Wed, May 15, 2024 at 12:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 15, 2024 at 3:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> I think this confusion goes to show that replacing N with count doesn't work.
>
> "replace_at" comes to mind as a better name.
I'd expect replace_at to be a
character position or something, not an occurrence count.


I'll amend the name to:  "replace_match"

I do now see that since the immediately preceding parameter, "start", deals with characters instead of matches that making it clear this parameter deals in matches in the name work.  The singular 'match' has all the same benefits as 'at' plus this point of clarity.


David J.

Re: add function argument names to regex* functions.

От
Robert Haas
Дата:
On Wed, May 15, 2024 at 4:13 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> You just broke my brain when you say that you read:
>
> By default, only the first match of the pattern is replaced.  If replace_at is specified and greater than zero, then
thefirst "replace_at - 1" matches are skipped before making a single replacement (i.e., the g flag is ignored when
replace_atis specified.) 
>
> And then say:
>
> I'd expect replace_at to be a character position or something, not an occurrence count.

Ah. What I meant was: if I just saw the parameter name, and not the
documentation, I believe that I would not correctly understand what it
did. I would have had to read the docs. Whereas I'm pretty sure at
some point years ago, I looked up these functions and I saw "N", and I
did understand what that did without needing it explained. If I had
seen "count" or "occurrence" I think I would have understood that
without further explanation, too.

So my point was: to me, N is more self-documenting than replace_at,
and less self-documenting than count or occurrence.

If your mileage varies on that point, so be it!

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: add function argument names to regex* functions.

От
"David G. Johnston"
Дата:
On Wed, May 15, 2024 at 1:19 PM Robert Haas <robertmhaas@gmail.com> wrote:

So my point was: to me, N is more self-documenting than replace_at,
and less self-documenting than count or occurrence.

If your mileage varies on that point, so be it!


Maybe just "match" instead of "replace_match".

Reading this it strikes me that any of these parameter names can and probably should be read as having "replace" in front of them:

replace N
replace count
replace occurrence
replace match

Saying replace becomes redundant:
replace replace at
replace replace match

David J.

Re: add function argument names to regex* functions.

От
Chapman Flack
Дата:
On 05/15/24 15:31, Robert Haas wrote:
> On Wed, May 15, 2024 at 3:23 PM Chapman Flack <jcflack@acm.org> wrote:
>> What would be wrong with [occurrence], for consistency's sake?
>
> It was proposed and rejected upthread, but that's not to say that I
> necessarily endorse the reasons given.

Apologies for not having read far enough up the thread before replying.

Having done so now, I guess I'd just offer one small point: the upthread
discussion did mention that 'occurrence' was used by Oracle, and asked
"if you're not following Oracle slavishly, exactly what is the argument
in its favor?".

Nothing else upthread seems to have mentioned that OCCURRENCE is the
exact keyword used in ISO SQL for the analogous argument in analogous
functions. Maybe that won't have any effect on the outcome either, but
it does seem worth getting into the thread.

Regards,
-Chap



Re: add function argument names to regex* functions.

От
Robert Haas
Дата:
On Wed, May 15, 2024 at 4:25 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Maybe just "match" instead of "replace_match".

Well, this is just turning into a bikeshedding exercise at this point.
We can generate names for this parameter all day long, but a bunch of
names none of which gets more than one vote is not really helping
anything.

--
Robert Haas
EDB: http://www.enterprisedb.com