Обсуждение: Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

Поиск
Список
Период
Сортировка

Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

От
Tom Lane
Дата:
Florents Tselai <florents.tselai@gmail.com> writes:
> This patch is a follow-up and generalization to [0].
> It adds the following jsonpath methods:  lower, upper, initcap, l/r/btrim,
> replace, split_part.

How are you going to deal with the fact that this makes jsonpath
operations not guaranteed immutable?  (See commit cb599b9dd
for some context.)  Those are all going to have behavior that's
dependent on the underlying locale.

We have the kluge of having separate "_tz" functions to support
non-immutable datetime operations, but that way doesn't seem like
it's going to scale well to multiple sources of mutability.

            regards, tom lane



Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

От
Alexander Korotkov
Дата:
On Thu, Sep 26, 2024 at 12:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florents Tselai <florents.tselai@gmail.com> writes:
> > This patch is a follow-up and generalization to [0].
> > It adds the following jsonpath methods:  lower, upper, initcap, l/r/btrim,
> > replace, split_part.
>
> How are you going to deal with the fact that this makes jsonpath
> operations not guaranteed immutable?  (See commit cb599b9dd
> for some context.)  Those are all going to have behavior that's
> dependent on the underlying locale.
>
> We have the kluge of having separate "_tz" functions to support
> non-immutable datetime operations, but that way doesn't seem like
> it's going to scale well to multiple sources of mutability.

While inventing "_tz" functions I was thinking about jsonpath methods
and operators defined in standard then.  Now I see huge interest on
extending that.  I wonder if we can introduce a notion of flexible
mutability?  Imagine that jsonb_path_query() function (and others) has
another function which analyzes arguments and reports mutability.  If
jsonpath argument is constant and all methods inside are safe then
jsonb_path_query() is immutable otherwise it is stable.  I was
thinking about that back working on jsonpath, but that time problem
seemed too limited for this kind of solution.  Now, it's possibly time
to shake off the dust from this idea.  What do you think?

------
Regards,
Alexander Korotkov
Supabase



Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

От
Florents Tselai
Дата:


On Thu, Sep 26, 2024 at 1:55 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Sep 26, 2024 at 12:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florents Tselai <florents.tselai@gmail.com> writes:
> > This patch is a follow-up and generalization to [0].
> > It adds the following jsonpath methods:  lower, upper, initcap, l/r/btrim,
> > replace, split_part.
>
> How are you going to deal with the fact that this makes jsonpath
> operations not guaranteed immutable?  (See commit cb599b9dd
> for some context.)  Those are all going to have behavior that's
> dependent on the underlying locale.
>
> We have the kluge of having separate "_tz" functions to support
> non-immutable datetime operations, but that way doesn't seem like
> it's going to scale well to multiple sources of mutability.

While inventing "_tz" functions I was thinking about jsonpath methods
and operators defined in standard then.  Now I see huge interest on
extending that.  I wonder if we can introduce a notion of flexible
mutability?  Imagine that jsonb_path_query() function (and others) has
another function which analyzes arguments and reports mutability.  If
jsonpath argument is constant and all methods inside are safe then
jsonb_path_query() is immutable otherwise it is stable.  I was
thinking about that back working on jsonpath, but that time problem
seemed too limited for this kind of solution.  Now, it's possibly time
to shake off the dust from this idea.  What do you think?

------
Regards,
Alexander Korotkov
Supabase

In case you're having a deja vu, while researching this  
I did come across [0] where disussing this back in 2019.

In this patch I've conveniently left jspIsMutable and jspIsMutableWalker untouched and under the rug,
but for the few seconds I pondered over this,the best answer I came with was 
a simple heuristic to what Alexander says above:
if all elements are safe, then the whole jsp is immutable.

If we really want to tackle this and make jsonpath richer though, 
I don't think we can avoid being a little more flexible/explicit wrt mutability.

Speaking of extensible: the jsonpath standard does mention function extensions [1] ,
so it looks like we're covered by the standard, and the mutability aspect is an implementation detail. No?
And having said that, the whole jsonb/jsonpath parser/executor infrastructure is extremely powerful
and kinda under-utilized if we use it "only" for jsonpath.
Tbh, I can see it supporting more specific DSLs and even offering hooks for extensions.
And I know for certain I'm not the only one thinking about this.
See [2] for example where they've lifted, shifted and renamed the jsonb/jsonpath infra to build a separate language for graphs

Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

От
"David E. Wheeler"
Дата:
On Sep 26, 2024, at 13:59, Florents Tselai <florents.tselai@gmail.com> wrote:

> Speaking of extensible: the jsonpath standard does mention function extensions [1] ,
> so it looks like we're covered by the standard, and the mutability aspect is an implementation detail. No?

That’s not the standard used for Postgres jsonpath. Postgres follows the SQL/JSON standard in the SQL standard, which
isnot publicly available, but a few people on the list have copies they’ve purchased and so could provide some context. 

In a previous post I wondered if the SQL standard had some facility for function extensions, but I suspect not. Maybe
inthe next iteration? 

> And having said that, the whole jsonb/jsonpath parser/executor infrastructure is extremely powerful
> and kinda under-utilized if we use it "only" for jsonpath.
> Tbh, I can see it supporting more specific DSLs and even offering hooks for extensions.
> And I know for certain I'm not the only one thinking about this.
> See [2] for example where they've lifted, shifted and renamed the jsonb/jsonpath infra to build a separate language
forgraphs 

I’m all for extensibility, though jsonpath does need to continue to comply with the SQL standard. Do you have some idea
ofthe sorts of hooks that would allow extension authors to use some of that underlying capability? 

Best,

David




Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

От
Florents Tselai
Дата:

> On 27 Sep 2024, at 12:45 PM, David E. Wheeler <david@justatheory.com> wrote:
>
> On Sep 26, 2024, at 13:59, Florents Tselai <florents.tselai@gmail.com> wrote:
>
>> Speaking of extensible: the jsonpath standard does mention function extensions [1] ,
>> so it looks like we're covered by the standard, and the mutability aspect is an implementation detail. No?
>
> That’s not the standard used for Postgres jsonpath. Postgres follows the SQL/JSON standard in the SQL standard, which
isnot publicly available, but a few people on the list have copies they’ve purchased and so could provide some context. 
>
> In a previous post I wondered if the SQL standard had some facility for function extensions, but I suspect not. Maybe
inthe next iteration? 
>
>> And having said that, the whole jsonb/jsonpath parser/executor infrastructure is extremely powerful
>> and kinda under-utilized if we use it "only" for jsonpath.
>> Tbh, I can see it supporting more specific DSLs and even offering hooks for extensions.
>> And I know for certain I'm not the only one thinking about this.
>> See [2] for example where they've lifted, shifted and renamed the jsonb/jsonpath infra to build a separate language
forgraphs 
>
> I’m all for extensibility, though jsonpath does need to continue to comply with the SQL standard. Do you have some
ideaof the sorts of hooks that would allow extension authors to use some of that underlying capability? 

Re-tracing what I had to do

1. Define a new JsonPathItemType jpiMyExtType and map it to a JsonPathKeyword
2. Add a new JsonPathKeyword and make the lexer and parser aware of that,
3. Tell the main executor executeItemOptUnwrapTarget what to do when the new type is matched.

I think 1, 2 are the trickiest because they require hooks to  jsonpath_scan.l and parser jsonpath_gram.y

3. is the meat of a potential hook, which would be something like
extern JsonPathExecResult executeOnMyJsonpathItem(JsonPathExecContext *cxt, JsonbValue *jb, JsonValueList *found);
This should be called by the main executor executeItemOptUnwrapTarget when it encounters case jpiMyExtType

It looks like quite an endeavor, to be honest.




On Thu, Sep 26, 2024 at 1:55 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Sep 26, 2024 at 12:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florents Tselai <florents.tselai@gmail.com> writes:
> > This patch is a follow-up and generalization to [0].
> > It adds the following jsonpath methods:  lower, upper, initcap, l/r/btrim,
> > replace, split_part.
>
> How are you going to deal with the fact that this makes jsonpath
> operations not guaranteed immutable?  (See commit cb599b9dd
> for some context.)  Those are all going to have behavior that's
> dependent on the underlying locale.
>
> We have the kluge of having separate "_tz" functions to support
> non-immutable datetime operations, but that way doesn't seem like
> it's going to scale well to multiple sources of mutability.

While inventing "_tz" functions I was thinking about jsonpath methods
and operators defined in standard then.  Now I see huge interest on
extending that.  I wonder if we can introduce a notion of flexible
mutability?  Imagine that jsonb_path_query() function (and others) has
another function which analyzes arguments and reports mutability.  If
jsonpath argument is constant and all methods inside are safe then
jsonb_path_query() is immutable otherwise it is stable.  I was
thinking about that back working on jsonpath, but that time problem
seemed too limited for this kind of solution.  Now, it's possibly time
to shake off the dust from this idea.  What do you think?

I was thinking about taking another stab at this.
Would someone more versed in the inner workings of jsonpath like to weigh in on the immutability wrt locale?
On Wed, Mar 5, 2025 at 2:30 PM Florents Tselai
<florents.tselai@gmail.com> wrote:
> I was thinking about taking another stab at this.
> Would someone more versed in the inner workings of jsonpath like to weigh in on the immutability wrt locale?

I'm not sure the issues with immutability here are particularly
related to jsonpath -- I think they may just be general problems with
our framework for immutability.

I always struggle a bit to remember our policy on these issues -- to
the best of my knowledge, we haven't documented it anywhere, and I
think we probably should. I believe the way it works is that whenever
a function depends on the operating system's timestamp or locale
definitions, we decide it has to be stable, not immutable. We don't
expect those things to be updated very often, but we know sometimes
they do get updated.

Now apparently what we've done for time zones is we have both
json_path_exists and json_path_exists_tz, and the former only supports
things that are truly immutable while the latter additionally supports
things that depend on time zone, and are thus marked stable.

I suppose we could just add support for these locale-dependent
operations to the "tz" version and have them error out in the non-tz
version. After all, the effect of depending on time zone is, as far as
I know, the same as the effect of depending on locale: the function
can't be immutable any more. The only real problem with that idea, at
least to my knowledge, is that the function naming makes you think
that it's just about time zones and not about anything else. Maybe
that's a wart we can live with?

Tom writes earlier in the thread that:

# We have the kluge of having separate "_tz" functions to support
# non-immutable datetime operations, but that way doesn't seem like
# it's going to scale well to multiple sources of mutability.

But I'm not sure I understand why it matters that there are multiple
sources of mutability here. Maybe I'm missing a piece of the puzzle
here.

--
Robert Haas
EDB: http://www.enterprisedb.com





On 13 May 2025, at 2:07 PM, David E. Wheeler <david@justatheory.com> wrote:

On May 9, 2025, at 15:50, Robert Haas <robertmhaas@gmail.com> wrote:

# We have the kluge of having separate "_tz" functions to support
# non-immutable datetime operations, but that way doesn't seem like
# it's going to scale well to multiple sources of mutability.

But I'm not sure I understand why it matters that there are multiple
sources of mutability here. Maybe I'm missing a piece of the puzzle
here.

I read that to mean “we’re not going to add another json_path_exists_* function for every potentially immutable JSONPath function. But I take your point that it could be generalized for *any* mutable function. In which case maybe it should be renamed?

Best,

David

We discussed this a bit during the APFS:

As Robert said—and I agree—renaming the existing _tz family would be more trouble than it’s worth, given the need for deprecations, migration paths, etc. If we were designing this today, suffixes like _stable or _volatile might have been more appropriate, but at this point, we’re better off staying consistent with the _tz family.

So the path forward seems to be:

- Put these new functions under the jsonb_path_*_tz family.

- Raise an error if they’re used in the non-_tz versions.

- Document this behavior clearly.

I’ll make sure to follow the patterns in the existing _tz functions closely.

Other thoughts and head’s up are, of course, welcome.

Patch CF entry: https://commitfest.postgresql.org/patch/5270/

Last updated Sept 24, so it will also need a rebase to account for changes in jsonpath_scan.l. I’ll get to that shortly.

On Tue, May 13, 2025 at 11:00 PM David E. Wheeler <david@justatheory.com> wrote:
> On May 13, 2025, at 16:24, Florents Tselai <florents.tselai@gmail.com> wrote:
> > As Robert said—and I agree—renaming the existing _tz family would be more trouble than it’s worth, given the need
fordeprecations, migration paths, etc. If we were designing this today, suffixes like _stable or _volatile might have
beenmore appropriate, but at this point, we’re better off staying consistent with the _tz family. 
>
> I get the pragmatism, and don’t want to over-bike-shed, but what a wart to live with. [I just went back and re-read
Robert’spost, and didn’t realize he used exactly the same expression!] Would it really be too effortful to create
_stableor _volatile functions and leave the _tz functions as a sort of legacy? 

No, that wouldn't be too much work, but the issue is that people will
keep using the _tz versions and when we eventually try to remove them
those people will complain no matter how prominent we make the
deprecation notice. If we want to go this route, maybe we should do
something like:

1. Add the new versions with a _s suffix or whatever.

2. Invent a GUC jsonb_tz_warning = { on | off } that advises you to
use the new functions instead, whenever you use the old ones.

3. After N years, flip the default value from off to on.

4. After M additional years, remove the old functions and the GUC.

5. Still get complaints.

--
Robert Haas
EDB: http://www.enterprisedb.com



"David E. Wheeler" <david@justatheory.com> writes:
> On May 21, 2025, at 14:06, Robert Haas <robertmhaas@gmail.com> wrote:
>> If we want to go this route, maybe we should do
>> something like:
>> ...
>> 5. Still get complaints.

> Complainers gonna complain.

Yeah.  I do not see the point of that amount of effort.

> Any idea how widespread the use of the function is? It was added in 17, and I’ve met few who have really dug into the
jonpathstuff yet, let alone needed the time zone conversion functionality. 

That's a good point.  We should also remember that if somebody really
really doesn't want to fix their app, they can trivially create a
wrapper function with the old name.

Having said that, what's wrong with inventing some improved function
names and never removing the old ones?

            regards, tom lane



On Wed, May 21, 2025 at 2:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Having said that, what's wrong with inventing some improved function
> names and never removing the old ones?

I don't particularly like the clutter, but if the consensus is that
the clutter doesn't matter, fair enough.

--
Robert Haas
EDB: http://www.enterprisedb.com




> On 22 May 2025, at 5:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, May 21, 2025 at 2:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Having said that, what's wrong with inventing some improved function
>> names and never removing the old ones?
>
> I don't particularly like the clutter, but if the consensus is that
> the clutter doesn't matter, fair enough.
>

It depends really on how much future work we expect in adding more methods in jsonpath.
I think there’s a lot of potential there, but that’s a guess really.

On David’s point about popularity:
In my experience timestamp related stuff from jsonb documents end up in a generated column,
and are indexed & queried there.
I expect that to continue in PG18 onwards as we’ll have virtual gen columns too.

Just to be clear, though, adding another version of these functions means
we’ll have an additional (now third) set of the same 5 functions:

The vanilla versions are considered stable and the suffixed *_tz or *_volatile (?)

jsonb_path_exists
jsonb_path_query
jsonb_path_query_array
jsonb_path_query_first
jsonb_path_match









On 09.05.25 21:50, Robert Haas wrote:
> I always struggle a bit to remember our policy on these issues -- to
> the best of my knowledge, we haven't documented it anywhere, and I
> think we probably should. I believe the way it works is that whenever
> a function depends on the operating system's timestamp or locale
> definitions, we decide it has to be stable, not immutable. We don't
> expect those things to be updated very often, but we know sometimes
> they do get updated.

I don't understand how this discussion got to the conclusion that 
functions that depend on the locale cannot be immutable.  Note that the 
top-level functions lower, upper, and initcap themselves are immutable.




On Thu, May 22, 2025 at 4:56 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> I don't understand how this discussion got to the conclusion that
> functions that depend on the locale cannot be immutable.  Note that the
> top-level functions lower, upper, and initcap themselves are immutable.

Oh, well that was what Tom said last September and I just assumed he
was right about the policy. If not, well then that's different.

--
Robert Haas
EDB: http://www.enterprisedb.com




> On 22 May 2025, at 11:56 PM, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 09.05.25 21:50, Robert Haas wrote:
>> I always struggle a bit to remember our policy on these issues -- to
>> the best of my knowledge, we haven't documented it anywhere, and I
>> think we probably should. I believe the way it works is that whenever
>> a function depends on the operating system's timestamp or locale
>> definitions, we decide it has to be stable, not immutable. We don't
>> expect those things to be updated very often, but we know sometimes
>> they do get updated.
>
> I don't understand how this discussion got to the conclusion that functions that depend on the locale cannot be
immutable. Note that the top-level functions lower, upper, and initcap themselves are immutable. 

I assume you mean that they’re set at initdb time, so there’s no mutability concern?




Florents Tselai <florents.tselai@gmail.com> writes:
> On 22 May 2025, at 11:56 PM, Peter Eisentraut <peter@eisentraut.org> wrote:
>> I don't understand how this discussion got to the conclusion that functions that depend on the locale cannot be
immutable. Note that the top-level functions lower, upper, and initcap themselves are immutable. 

> I assume you mean that they’re set at initdb time, so there’s no mutability concern?

Yeah, I think Peter's right and I'm wrong.  Obviously this ties into
our philosophical debate about how immutable is immutable.  But as
long as the functions only depend on locale settings that are fixed
at database creation, I think it's okay to consider them immutable.

If you were, say, depending on LC_NUMERIC, it would clearly be unsafe
to consider that immutable, so I'm not quite sure if this is the end
of the discussion.  But for what's mentioned in the thread title,
I think we only care about LC_CTYPE.

            regards, tom lane



On May 23, 2025, at 13:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> I assume you mean that they’re set at initdb time, so there’s no mutability concern?
>
> Yeah, I think Peter's right and I'm wrong.  Obviously this ties into
> our philosophical debate about how immutable is immutable.  But as
> long as the functions only depend on locale settings that are fixed
> at database creation, I think it's okay to consider them immutable.
>
> If you were, say, depending on LC_NUMERIC, it would clearly be unsafe
> to consider that immutable, so I'm not quite sure if this is the end
> of the discussion.  But for what's mentioned in the thread title,
> I think we only care about LC_CTYPE.

Oh, so maybe all this is moot, and Florents can go ahead and add support for the functions to the non-_tz functions?

Should there be some sort of inventory of what functions can be used in what contexts?

D


Вложения


On 24 May 2025, at 7:08 PM, David E. Wheeler <david@justatheory.com> wrote:

On May 23, 2025, at 13:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I assume you mean that they’re set at initdb time, so there’s no mutability concern?

Yeah, I think Peter's right and I'm wrong.  Obviously this ties into
our philosophical debate about how immutable is immutable.  But as
long as the functions only depend on locale settings that are fixed
at database creation, I think it's okay to consider them immutable.

If you were, say, depending on LC_NUMERIC, it would clearly be unsafe
to consider that immutable, so I'm not quite sure if this is the end
of the discussion.  But for what's mentioned in the thread title,
I think we only care about LC_CTYPE.

Oh, so maybe all this is moot, and Florents can go ahead and add support for the functions to the non-_tz functions?


I think the patch is still in reasonably good shape and hasn’t changed much since September 24.
So yes, I’d hope there are still some valid points to consider or improve.
Otherwise, I’ll have only myself to blame for not pushing harder before the feature freeze. 😅
On May 24, 2025, at 12:51, Florents Tselai <florents.tselai@gmail.com> wrote:

> I think the patch is still in reasonably good shape and hasn’t changed much since September 24.So yes, I’d hope there
arestill some valid points to consider or improve. 

Okay, here’s a review.

Patch applies cleanly.

All tests pass.

I'm curious why you added the `arg0` and `arg1` fields to the `method_args` union. Is there some reason that the
existing`left` and `right` fields wouldn't work? Admittedly these are not formally binary operators, but I don't see
thatit matters much. 

The existing string() method operates on a "JSON boolean, number, string, or datetime"; should these functions also
operateon all those data types? 

The argument to the trim methods appears to be ignored:

```
postgres=# select jsonb_path_query('"zzzytest"', '$.ltrim("xyz")');
 jsonb_path_query
------------------
 "zzzytest"
```

I'm wondering if the issue is the use of the opt_datetime_template in the grammar?

```
    | '.' STR_LTRIM_P '(' opt_datetime_template ')'
        { $$ = makeItemUnary(jpiStrLtrimFunc, $4); }
    | '.' STR_RTRIM_P '(' opt_datetime_template ')'
        { $$ = makeItemUnary(jpiStrRtrimFunc, $4); }
    | '.' STR_BTRIM_P '(' opt_datetime_template ')'
        { $$ = makeItemUnary(jpiStrBtrimFunc, $4); }
```

I realize it resolves to a string, but for some reason it doesn't get picked up. But also, do you want to support
variablesfor either of these arguments? If so, maybe rename and use starts_with_initial: 

```
starts_with_initial:
    STRING_P                        { $$ = makeItemString(&$1); }
    | VARIABLE_P                    { $$ = makeItemVariable(&$1); }
    ;
```

split_part() does not support a negative n value:

```
postgres=# select split_part('abc,def,ghi,jkl', ',', -2) ;
 split_part
------------
 ghi

select jsonb_path_query('"abc,def,ghi,jkl"', '$.split_part(",", -2)');
ERROR:  syntax error at or near "-" of jsonpath input
LINE 1: select jsonb_path_query('"abc,def,ghi,jkl"', '$.split_part("...
```

Nor does a `+` work. I think you’d be better served using `csv_elem`, something like:


```
    | '.' STR_SPLIT_PART_P '(' STRING_P csv_elem ‘)’
```

I'm not sure how well these functions comply with the SQL spec. Does it have a provision for implementation-specific
methods?I *think* all existing methods are defined by the spec, but someone with access to its contents would have to
sayfor sure. And maybe we don't care, consider this a natural extension? 

I’ve attached a new patch with docs.

Best,

David





Вложения
On May 24, 2025, at 17:39, David E. Wheeler <david@justatheory.com> wrote:

> I’ve attached a new patch with docs.

Oh, and I forgot to mention, src/backend/utils/adt/jsonpath_scan.l looks like it has a ton of duplication in it.
Shouldn’tit just add the new keywords, something like: 

```
@@ -415,6 +415,11 @@ static const JsonPathKeyword keywords[] = {
    {4, false, WITH_P, "with"},
    {5, true, FALSE_P, "false"},
    {5, false, FLOOR_P, "floor"},
+   {5, false, STR_BTRIM_P, "btrim"},
+   {5, false, STR_LOWER_P, "lower"},
+   {5, false, STR_LTRIM_P, "ltrim"},
+   {5, false, STR_RTRIM_P, "rtrim"},
+   {5, false, STR_UPPER_P, "upper"},
    {6, false, BIGINT_P, "bigint"},
    {6, false, DOUBLE_P, "double"},
    {6, false, EXISTS_P, "exists"},
@@ -428,10 +433,13 @@ static const JsonPathKeyword keywords[] = {
    {7, false, INTEGER_P, "integer"},
    {7, false, TIME_TZ_P, "time_tz"},
    {7, false, UNKNOWN_P, "unknown"},
+   {7, false, STR_INITCAP_P, "initcap"},
+   {7, false, STR_REPLACEFUNC_P, "replace"},
    {8, false, DATETIME_P, "datetime"},
    {8, false, KEYVALUE_P, "keyvalue"},
    {9, false, TIMESTAMP_P, "timestamp"},
    {10, false, LIKE_REGEX_P, "like_regex"},
+   {10, false, STR_SPLIT_PART_P, "split_part"},
    {12, false, TIMESTAMP_TZ_P, "timestamp_tz"},
```

Best,

David


Вложения
On May 24, 2025, at 17:46, David E. Wheeler <david@justatheory.com> wrote:

> Oh, and I forgot to mention, src/backend/utils/adt/jsonpath_scan.l looks like it has a ton of duplication in it.
Shouldn’tit just add the new keywords, something like: 

And now I see my patch broke the grammar because I left some of my fiddling in there. Apologies. Here’s an updated
patchwith the updated keyword map, too. 

Best,

David


Вложения

> On 25 May 2025, at 1:01 AM, David E. Wheeler <david@justatheory.com> wrote:
>
> On May 24, 2025, at 17:55, David E. Wheeler <david@justatheory.com> wrote:
>
>> And now I see my patch broke the grammar because I left some of my fiddling in there. Apologies. Here’s an updated
patchwith the updated keyword map, too. 
>
> No, really :sigh:
>
> D
>
> <v5-0001-Add-additional-jsonpath-string-methods.patch>

The most important problem in jsonpath_scan.l now is the fact that I broke the alphabetical ordering of keywords in v2
,
and you followed that too.


> I'm curious why you added the `arg0` and `arg1` fields to the `method_args` union. Is there some reason that the
existing`left` and `right` fields wouldn't work?  

The left-right ended-up being more of a brain teaser to work with in jsonpath_exec.
Until before these methods, the opt_datetime_template was the only argument passed in existing jsonpath functions,
So initially I used that as a template to add to the scann-parser infra,
but then realized it may make morese sense to have a way to access indexed-args.
IIRC, with an eye in the future I found it much more convenient - less of the  to work with indexed-args.
I should have gone back and use them for *TRIM_P too
But you may be onto something with the split_part thing.

> The existing string() method operates on a "JSON boolean, number, string, or datetime"; should these functions also
operateon all those data types? 

You mean implicitely conversion to string first?
I don’t think so: I’d expect to work like ‘$…string().replace()…'

> I'm not sure how well these functions comply with the SQL spec.

The fact that Peter hasn’t raized this as an issue, makes me think it's not one




Hackers,

On May 26, 2025, at 18:00, David E. Wheeler <david@justatheory.com> wrote:

> Yes, I think it would be best if the grammar was a bit stricter --- and therefore more self-explanatory --- by making
theargs closer to what the functions actually expect. 

I chatted with Florents and went ahead and simplified the grammar and fixed the other issues I identified in my
originalreview. Note that there are two commits, now: 

`v6-0001-Rename-jsonpath-method-arg-tokens.patch` Renames some of the symbols in the jsonpath grammar so that they’re
lessgeneric (`csv*`) and more specific to their contents. This is with the expectation that they will be used by other
methodsin the next patch and in the future. I thought it best to separate this refactoring from the feature patch. 

`v6-0002-Add-additional-jsonpath-string-methods.patch` is that feature patch. The grammar now parses the exact number
andtypes of each method argument, eliminating the need for explicit error checking. It also uses the existing patterns
forhandling methods with two parameters, removing a bunch of duplicate code. 

Overall I think this is ready for committer review, although now that I’m not just reviewing but hacking on this thing,
maybesomeone else should review it first. 

Patches attached, GitHub PR here:

  https://github.com/theory/postgres/pull/12

Best,

David





Вложения
On Jun 3, 2025, at 15:10, David E. Wheeler <david@justatheory.com> wrote:

>> https://github.com/theory/postgres/pull/12
> 
> Found a little more unnecessary code to remove. Updated patches attached.

And these should fix the CI failure. I also ran pgindent.

Best,

David


Вложения
On Jun 4, 2025, at 11:27, David E. Wheeler <david@justatheory.com> wrote:

> And these should fix the CI failure. I also ran pgindent.

Here’s a quick rebase. I think it’s ready for committer review, but since I’ve poked at it quite a bit myself, I
updatedthe Commitfest item [1] to “Needs Review”. 

Best,

David

[1]: https://commitfest.postgresql.org/patch/5270/




Вложения

> On 14 Jun 2025, at 6:08 PM, David E. Wheeler <david@justatheory.com> wrote:
>
> On Jun 4, 2025, at 11:27, David E. Wheeler <david@justatheory.com> wrote:
>
>> And these should fix the CI failure. I also ran pgindent.
>
> Here’s a quick rebase. I think it’s ready for committer review, but since I’ve poked at it quite a bit myself, I
updatedthe Commitfest item [1] to “Needs Review”. 
>
> Best,
>
> David
>
> [1]: https://commitfest.postgresql.org/patch/5270/
>
> <v9-0001-Rename-jsonpath-method-arg-tokens.patch><v9-0002-Add-additional-jsonpath-string-methods.patch>
>

The basic problem I see with these latest revisions/refactorings is that they fail for pg_upgrade afaict.
Probably this means that some of the rearrangements on the parser/scanner are not that flexible.


On Jul 10, 2025, at 13:41, Florents Tselai <florents.tselai@gmail.com> wrote:

> The basic problem I see with these latest revisions/refactorings is that they fail for pg_upgrade afaict.
> Probably this means that some of the rearrangements on the parser/scanner are not that flexible.

Oh, is that what’s happening? What needs to happen to properly support pg_upgrade?

Best,

David


Вложения
On Jul 10, 2025, at 14:13, David E. Wheeler <david@justatheory.com> wrote:

> Oh, is that what’s happening? What needs to happen to properly support pg_upgrade?

Turns out there was an assertion failure that David Johnson spotted in the core dump of the test output and then in the
regresslog. Turns out I wasn’t using `--enable-assert` in my testing. With that I was able to replicate it and find the
coredump in the “Crash Reports” tab of the macOS Console.app with this line: 


{"imageOffset":8079100,"sourceLine":1265,"sourceFile":"jsonpath.c","symbol":"jspGetLeftArg","imageIndex":0,"symbolLocation":348},

When I switched to using jspGetLeftArg and jspGetRightArg in the last patch, I forgot to add the assertions you
originallyhad in your patch, Florents. Resolved in the attached, which now passes `make check-world` for me. 

Also available as a pull request[1].

Best,

David

[1] https://github.com/theory/postgres/pull/12/files




Вложения
On Jul 10, 2025, at 18:40, David E. Wheeler <david@justatheory.com> wrote:

> Resolved in the attached, which now passes `make check-world` for me.
>
> Also available as a pull request[1].

Now with the `ISO C90 forbids mixed declarations and code` warning cleared up.

Weird that there’s a failure on Bookworm with Meson [1] (pg_regress diffs [2]) but not Bookworm with Configure [3].
Collationissue, perhaps? 

Best,

David

[1]: https://cirrus-ci.com/task/5363472541679616
[2]:
https://api.cirrus-ci.com/v1/artifact/task/5363472541679616/testrun/build-32/testrun/regress/regress/regression.diffs
[3]: https://cirrus-ci.com/task/5926422495100928





Вложения
On Jul 10, 2025, at 19:23, David E. Wheeler <david@justatheory.com> wrote:

> Now with the `ISO C90 forbids mixed declarations and code` warning cleared up.
>
> Weird that there’s a failure on Bookworm with Meson [1] (pg_regress diffs [2]) but not Bookworm with Configure [3].
Collationissue, perhaps? 

David Johnson noticed that this build is 32-bit. I looked at the split_path function and after trying a couple of
things,realized that it was passing an int8 when the SQL function in Marlena.c passes an int4. This change got the test
passingin my clone (indentation reduced): 


```patch
--- a/src/backend/utils/adt/jsonpath_exec.c
+++ b/src/backend/utils/adt/jsonpath_exec.c
@@ -2959,7 +2959,7 @@ executeStringInternalMethod(JsonPathExecContext *cxt, JsonPathItem *jsp,
     C_COLLATION_OID,
     CStringGetTextDatum(tmp),
     CStringGetTextDatum(from_str),
-    DirectFunctionCall1(numeric_int8, NumericGetDatum(n))));
+    DirectFunctionCall1(numeric_int4, NumericGetDatum(n))));
                 break;
             }
         default:
```

v12 attached.


Best,

David




Вложения

On Fri, Jul 11, 2025 at 9:48 PM David E. Wheeler <david@justatheory.com> wrote:
On Jul 10, 2025, at 19:23, David E. Wheeler <david@justatheory.com> wrote:

> Now with the `ISO C90 forbids mixed declarations and code` warning cleared up.
>
> Weird that there’s a failure on Bookworm with Meson [1] (pg_regress diffs [2]) but not Bookworm with Configure [3]. Collation issue, perhaps?

David Johnson noticed that this build is 32-bit. I looked at the split_path function and after trying a couple of things, realized that it was passing an int8 when the SQL function in Marlena.c passes an int4. This change got the test passing in my clone (indentation reduced):

Occasionally I've noticed myself some inconsistencies wrt to compiler warnings between meson & make . 
But cirrus seems generally happy now https://cirrus-ci.com/build/4964687915253760 

To recap so far; 

- I like your changes and renames on the parser/lexer; it indeed looks much cleaner now and will help with future improvements.
- I also like the addition of executeStringInternalMethod ; it'll help us add more stuff in the future (reminder that for the original patch I implemented the methods I'd like more, but string operations are quite more).

- AFAICT no test cases / results have changed with your versions; is this correct ?

On Jul 12, 2025, at 00:07, Florents Tselai <florents.tselai@gmail.com> wrote:

> To recap so far;
>
> - I like your changes and renames on the parser/lexer; it indeed looks much cleaner now and will help with future
improvements.

Thanks!

> - I also like the addition of executeStringInternalMethod ; it'll help us add more stuff in the future (reminder that
forthe original patch I implemented the methods I'd like more, but string operations are quite more). 

Agreed.

> - AFAICT no test cases / results have changed with your versions; is this correct ?

I made some minor changes, notably to test alternate trim values and a negative position passed to split_part():

```patch
--- a/src/test/regress/sql/jsonb_jsonpath.sql
+++ b/src/test/regress/sql/jsonb_jsonpath.sql
@@ -627,7 +627,7 @@ rollback;
 select jsonb_path_query('"   hello   "', '$.ltrim(" ")');
 select jsonb_path_query('"   hello   "', '$.ltrim(" ")');
 select jsonb_path_query('"   hello   "', '$.ltrim()');
-select jsonb_path_query('"   hello   "', '$.ltrim()');
+select jsonb_path_query('"zzzytest"', '$.ltrim("xyz")');
 select jsonb_path_query('null', '$.ltrim()');
 select jsonb_path_query('null', '$.ltrim()', silent => true);
 select jsonb_path_query('[]', '$.ltrim()');
@@ -647,13 +647,13 @@ select jsonb_path_query_array('["  maybe  ", "  yes", "  no"]', '$[*].ltrim().ty
   -- test .rtrim()
 select jsonb_path_query('"   hello   "', '$.rtrim(" ")');
-select jsonb_path_query('"   hello   "', '$.rtrim(" ")');
+select jsonb_path_query('"testxxzx"', '$.rtrim("xyz")');
 select jsonb_path_query('"   hello   "', '$.rtrim()');
 select jsonb_path_query('"   hello   "', '$.rtrim()');
   -- test .btrim()
 select jsonb_path_query('"   hello   "', '$.btrim(" ")');
-select jsonb_path_query('"   hello   "', '$.btrim(" ")');
+select jsonb_path_query('"xyxtrimyyx"', '$.btrim("xyz")');
 select jsonb_path_query('"   hello   "', '$.btrim()');
 select jsonb_path_query('"   hello   "', '$.btrim()');
  @@ -723,6 +723,7 @@ select jsonb_path_query('"hello world"', '$.replace("hello","bye") starts with "
   -- Test .split_part()
 select jsonb_path_query('"abc~@~def~@~ghi"', '$.split_part("~@~", 2)');
+select jsonb_path_query('"abc,def,ghi,jkl"', '$.split_part(",", -2)');
   -- Test string methods play nicely together
 select jsonb_path_query('"hello world"', '$.replace("hello","bye").upper()');
```

Best,

David


Вложения