Обсуждение: Re: SQL:2023 JSON simplified accessor support
On 29.08.24 18:33, Alexandra Wang wrote: > I’ve implemented the member and array accessors and attached two > alternative patches: > > 1. v1-0001-Add-JSON-JSONB-simplified-accessor.patch: This patch > enables dot access to JSON object fields and subscript access to > indexed JSON array elements by converting "." and "[]" indirection > into a JSON_QUERY JsonFuncExpr node. > > 2. v2-0001-Transform-JSON-dot-access-to-arrow-operator.txt: This > alternative patch implements dot access to JSON object fields by > transforming the "." indirection into a "->" operator. > > The upside of the v1 patch is that it strictly aligns with the SQL > standard, which specifies that the simplified access is equivalent to: > > JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON > EMPTY NULL ON ERROR) > > However, the performance of JSON_QUERY might be suboptimal due to > function call overhead. Therefore, I implemented the v2 alternative > using the "->" operator. Using the operator approach would also allow taking advantage of optimizations such as <https://www.postgresql.org/message-id/flat/CAKU4AWoqAVya6PBhn%2BBCbFaBMt3z-2%3Di5fKO3bW%3D6HPhbid2Dw%40mail.gmail.com>. > There is some uncertainty about the semantics of conditional array > wrappers. Currently, there is at least one subtle difference between > the "->" operator and JSON_QUERY, as shown: That JSON_QUERY bug has been fixed. I suggest you rebase both of your patches over this, just to double check everything. But then I think you can drop the v1 patch and just submit a new version of v2. The patch should eventually contain some documentation. It might be good starting to look for a good spot where to put that documentation. It might be either near the json types documentation or near the general qualified identifier syntax, not sure.
On 2024-09-26 Th 11:45 AM, Alexandra Wang wrote: > Hi, > > I didn’t run pgindent earlier, so here’s the updated version with the > correct indentation. Hope this helps! This is a really nice feature, and provides a lot of expressive power for such a small piece of code. I notice this doesn't seem to work for domains over json and jsonb. andrew@~=# create domain json_d as json; CREATE DOMAIN andrew@~=# create table test_json_dot(id int, test_json json_d); CREATE TABLE andrew@~=# insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json; INSERT 0 1 | | andrew@~=# select (test_json_dot.test_json).b, json_query(test_json, 'lax $.b' WITH CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from test_json_dot; ERROR: column notation .b applied to type json_d, which is not a composite type LINE 1: select (test_json_dot.test_json).b, json_query(test_json, 'l... I'm not sure that's a terribly important use case, but we should probably make it work. If it's a domain we should get the basetype of the domain. There's some example code in src/backend/utils/adt/jsonfuncs.c cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Sep 26, 2024, at 16:45, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > I didn’t run pgindent earlier, so here’s the updated version with the > correct indentation. Hope this helps! Oh, nice! I don’t suppose the standard also has defined an operator equivalent to ->>, though, has it? I tend to want thetext output far more often than a JSON scalar. Best, David
On 2024-09-27 Fr 5:49 AM, David E. Wheeler wrote: > On Sep 26, 2024, at 16:45, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > >> I didn’t run pgindent earlier, so here’s the updated version with the >> correct indentation. Hope this helps! > Oh, nice! I don’t suppose the standard also has defined an operator equivalent to ->>, though, has it? I tend to wantthe text output far more often than a JSON scalar. > That would defeat being able to chain these. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Sep 27, 2024, at 12:07, Andrew Dunstan <andrew@dunslane.net> wrote: > That would defeat being able to chain these. Not if it’s a different operator. But I’m fine to just keep using ->> at the end of a chain. D
On 07.11.24 22:57, Alexandra Wang wrote: > The v5 patch includes the following updates: > > - Fixed the aforementioned issue and added more tests covering composite > types with domains, nested domains, and arrays of domains over > JSON/JSONB. > > - Refactored the logic for parsing JSON/JSONB object fields by moving it > from ParseFuncOrColumn() to transformIndirection() for improved > readability. The ParseFuncOrColumn() function is already handling both > single-argument function calls and composite types, and it has other > callers besides transformIndirection(). This patch implements array subscripting support for the json type, but it does it in a non-standard way, using ParseJsonSimplifiedAccessorArrayElement(). This would be better done by providing a typsubscript function for the json type. This is what jsonb already has, which is why your patch doesn't need to provide the array support for jsonb. I suggest you implement the typsubscript support for the json type (make it a separate patch but you can keep it in this thread IMO) and remove the custom code from this patch. A few comments on the tests: The tests look good to me. Good coverage of weirdly nested types. Results look correct. +drop table if exists test_json_dot; This can be omitted, since we know that the table doesn't exist yet. This code could be written in the more conventional insert ... values syntax: +insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json; +insert into test_json_dot select 1, '{"a": 2, "b": {"c": 42}}'::json; +insert into test_json_dot select 1, '{"a": 3, "b": {"c": "42"}, "d":[11, 12]}'::json; +insert into test_json_dot select 1, '{"a": 3, "b": {"c": "42"}, "d":[{"x": [11, 12]}, {"y": [21, 22]}]}'::json; Then the ::json casts can also go away. Also, using a different value for "id" for each row would be more useful, so that the subsequent tests could then be written like select id, (test_jsonb_dot.test_jsonb).b from test_jsonb_dot; so we can see which result corresponds to which input row. Also make id the primary key in this table. Also, let's keep the json and the jsonb variants aligned. There are some small differences, like the test_json_dot table having 4 rows but the test_jsonb_dot having 3 rows. And the array and wildcard tests in the opposite order. Not a big deal, but keeping these the same helps eyeballing the test files. Maybe add a comment somewhere in this file that you are running the json_query equivalents to cross-check the semantics of the dot syntax. Some documentation should be written. This looks like this right place to start: https://www.postgresql.org/docs/devel/sql-expressions.html#FIELD-SELECTION and them maybe some cross-linking between there and the sections on JSON types and operators.
Hi, On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov.n.a@gmail.com> wrote: > > Hi, hackers. > > I have implemented dot notation for jsonb using type subscripting back > in April 2023, but failed post it because I left Postgres Professional > company soon after and have not worked anywhere since, not even had > any interest in programming. > > But yesterday I accidentally decided to look what is going on at > commitfests and found this thread. I immediately started to rebase > code from PG16, fixed some bugs, and now I'm ready to present my > version of the patches which is much more complex. > > Unfortunately, I probably won't be able to devote that much time to > the patches as before. Thank you so much, Nikita, for revisiting this topic and sharing your v6 patches! Now that we have two solutions, I’d like to summarize our current options. In Postgres, there are currently three ways to access json/jsonb object fields and array elements: 1. '->' operator (Postgres-specific, predates SQL standard): postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd' -> 0; -- returns 1 2. jsonb subscripting (not available for the plain json type): postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['d'][0]; --returns 1 3. json_query() function: postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2, 3]}', 'lax $.d[0]'); --returns 1 A few weeks ago, I did the following performance benchmarking of the three approaches: -- setup: create table tbl(id int, col1 jsonb); insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2, {"j":"vj"}]}]}' from generate_series(1, 100000)i; -- jsonb_operator.sql SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl; -- jsonb_subscripting.sql SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl; -- jsonb_path_query.sql SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl; # pgbench on my local MacOS machine, using -O3 optimization: pgbench -n -f XXX.sql postgres -T100 Results (Latency | tps): "->" operator: 14ms | 68 jsonb subscripting: 17ms | 58 jsonb_path_query() function: 23ms | 43 So performance from best to worst: "->" operator > jsonb subscripting >> jsonb_path_query() function. I’m excited to see your implementation of dot notation for jsonb using type subscripting! This approach rounds out the three possible ways to implement JSON simplified accessors: ## v1: json_query() implementation Pros: - Fully adheres to the SQL standard. According to the SQL standard, if the JSON simplified accessor <JA> is not a JSON item method, it is equivalent to a <JSON query>: JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) (I’m skipping <JA> that includes a JSON item method, as it is currently outside the scope of both sets of patches.) - Easiest to implement Cons: - Slow due to function call overhead. ## v2-v5: "->" operator implementation We initially chose this approach for its performance benefits. However, while addressing Peter’s feedback on v5, I encountered the following issue: -- setup create table test_json_dot(id serial primary key, test_json json); insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2, "b": {"c": 42}}]'); -- problematic query: test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from test_json_dot; id | b | expected ----+---+----------------- 5 | | [42, {"c": 42}] (1 row) This issue arises from the semantic differences between the "->" operator and json_query’s "lax" mode. One possible workaround is to redefine the "->" operator and modify its implementation. However, since the "->" operator has been in use for a long time, such changes would break backward compatibility. ## v6: jsonb subscription implementation Nikita's patches pass all my functional test cases, including those that failed with the previous approach. Supported formats: - JSON member accessor - JSON wildcard member accessor (Not available in v5, so this is also a plus) - JSON array accessor Questions: 1. Since Nikita’s patches did not address the JSON data type, and JSON currently does not support subscripting, should we limit the initial feature set to JSONB dot-notation for now? In other words, if we aim to fully support JSON simplified accessors for the plain JSON type, should we handle support for plain JSON subscripting as a follow-up effort? 2. I have yet to have a more thorough review of Nikita’s patches. One area I am not familiar with is the hstore-related changes. How relevant is hstore to the JSON simplified accessor? Best, Alex
On 2024-11-21 Th 3:52 PM, Alexandra Wang wrote: > Hi, > > On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov.n.a@gmail.com> wrote: >> Hi, hackers. >> >> I have implemented dot notation for jsonb using type subscripting back >> in April 2023, but failed post it because I left Postgres Professional >> company soon after and have not worked anywhere since, not even had >> any interest in programming. >> >> But yesterday I accidentally decided to look what is going on at >> commitfests and found this thread. I immediately started to rebase >> code from PG16, fixed some bugs, and now I'm ready to present my >> version of the patches which is much more complex. >> >> Unfortunately, I probably won't be able to devote that much time to >> the patches as before. > Thank you so much, Nikita, for revisiting this topic and sharing your > v6 patches! > > Now that we have two solutions, I’d like to summarize our current > options. > > In Postgres, there are currently three ways to access json/jsonb > object fields and array elements: > > 1. '->' operator (Postgres-specific, predates SQL standard): > > postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd' > -> 0; -- returns 1 > > 2. jsonb subscripting (not available for the plain json type): > > postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, > 3]}'::jsonb)['d'][0]; --returns 1 > > 3. json_query() function: > > postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2, > 3]}', 'lax $.d[0]'); --returns 1 > > A few weeks ago, I did the following performance benchmarking of the > three approaches: > > -- setup: > create table tbl(id int, col1 jsonb); > insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2, > {"j":"vj"}]}]}' from generate_series(1, 100000)i; > > -- jsonb_operator.sql > SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl; > > -- jsonb_subscripting.sql > SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl; > > -- jsonb_path_query.sql > SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl; > > # pgbench on my local MacOS machine, using -O3 optimization: > pgbench -n -f XXX.sql postgres -T100 > > Results (Latency | tps): > > "->" operator: 14ms | 68 > jsonb subscripting: 17ms | 58 > jsonb_path_query() function: 23ms | 43 > > So performance from best to worst: > "->" operator > jsonb subscripting >> jsonb_path_query() function. > > I’m excited to see your implementation of dot notation for jsonb using > type subscripting! This approach rounds out the three possible ways to > implement JSON simplified accessors: > > ## v1: json_query() implementation > > Pros: > - Fully adheres to the SQL standard. > > According to the SQL standard, if the JSON simplified accessor <JA> is > not a JSON item method, it is equivalent to a <JSON query>: > > JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON > EMPTY NULL ON ERROR) > > (I’m skipping <JA> that includes a JSON item method, as it is > currently outside the scope of both sets of patches.) > > - Easiest to implement > > Cons: > - Slow due to function call overhead. > > ## v2-v5: "->" operator implementation > > We initially chose this approach for its performance benefits. > However, while addressing Peter’s feedback on v5, I encountered the > following issue: > > -- setup > create table test_json_dot(id serial primary key, test_json json); > insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2, > "b": {"c": 42}}]'); > > -- problematic query: > test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH > CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from > test_json_dot; > id | b | expected > ----+---+----------------- > 5 | | [42, {"c": 42}] > (1 row) > > This issue arises from the semantic differences between the "->" > operator and json_query’s "lax" mode. One possible workaround is to > redefine the "->" operator and modify its implementation. However, since > the "->" operator has been in use for a long time, such changes would > break backward compatibility. > > ## v6: jsonb subscription implementation > > Nikita's patches pass all my functional test cases, including those > that failed with the previous approach. > > Supported formats: > - JSON member accessor > - JSON wildcard member accessor (Not available in v5, so this is also a plus) > - JSON array accessor > > Questions: > > 1. Since Nikita’s patches did not address the JSON data type, and JSON > currently does not support subscripting, should we limit the initial > feature set to JSONB dot-notation for now? In other words, if we aim > to fully support JSON simplified accessors for the plain JSON type, > should we handle support for plain JSON subscripting as a follow-up > effort? > > 2. I have yet to have a more thorough review of Nikita’s patches. > One area I am not familiar with is the hstore-related changes. How > relevant is hstore to the JSON simplified accessor? > We can't change the way the "->" operator works, as there could well be uses of it in the field that rely on its current behaviour. But maybe we could invent a new operator which is compliant with the standard semantics for dot access, and call that. Then we'd get the best performance, and also we might be able to implement it for the plain JSON type. If that proves not possible we can think about not implementing for plain JSON, but I'd rather not go there until we have to. I don't think we should be including hstore changes here - we should just be aiming at implementing the standard for JSON access. hstore changes if any should be a separate feature. The aren't relevant to JSON access, although they might use some of the same infrastructure, depending on implementation. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On 21.11.24 23:46, Andrew Dunstan wrote: >> Questions: >> >> 1. Since Nikita’s patches did not address the JSON data type, and JSON >> currently does not support subscripting, should we limit the initial >> feature set to JSONB dot-notation for now? In other words, if we aim >> to fully support JSON simplified accessors for the plain JSON type, >> should we handle support for plain JSON subscripting as a follow-up >> effort? >> >> 2. I have yet to have a more thorough review of Nikita’s patches. >> One area I am not familiar with is the hstore-related changes. How >> relevant is hstore to the JSON simplified accessor? >> > > We can't change the way the "->" operator works, as there could well be > uses of it in the field that rely on its current behaviour. But maybe we > could invent a new operator which is compliant with the standard > semantics for dot access, and call that. Then we'd get the best > performance, and also we might be able to implement it for the plain > JSON type. If that proves not possible we can think about not > implementing for plain JSON, but I'd rather not go there until we have to. Yes, I think writing a custom operator that is similar to "->" but has the required semantics is the best way forward. (Maybe it can be just a function?) > I don't think we should be including hstore changes here - we should > just be aiming at implementing the standard for JSON access. hstore > changes if any should be a separate feature. The aren't relevant to JSON > access, although they might use some of the same infrastructure, > depending on implementation. In a future version, the operator/function mentioned above could be a catalogued property of a type, similar to typsubscript. Then you could also apply this to other types. But let's leave that for later. If I understand it correctly, Nikita's patch uses the typsubscript support function to handle both bracket subscripting and dot notation. I'm not sure if it's right to mix these two together. Maybe I didn't understand that correctly.
Hello hackers,
I’ve fixed the compilation failure for hstore and updated the patches.
In this version, I’ve further cleaned up the code and added more
comments. I hope this helps!
Summary of changes:
v8-0001 through v8-0005:
Refactoring and preparatory steps for the actual implementation.
v8-0006 (Implement read-only dot notation for JSONB):
I removed the vars field (introduced in v7) from JsonbSubWorkspace
after realizing that JsonPathVariable is not actually needed for
dot-notation.
v8-0007 (Allow wildcard member access for JSONB):
I'm aware that the #if 0 in check_indirection() is not ideal. I
haven’t removed it yet because I’m still reviewing other cases—beyond
our JSONB simplified accessor use case—where this check should remain
strict. I’ll post an additional patch to address this.
Looking forward to comments and feedback!
Thanks,
Alex
I’ve fixed the compilation failure for hstore and updated the patches.
In this version, I’ve further cleaned up the code and added more
comments. I hope this helps!
Summary of changes:
v8-0001 through v8-0005:
Refactoring and preparatory steps for the actual implementation.
v8-0006 (Implement read-only dot notation for JSONB):
I removed the vars field (introduced in v7) from JsonbSubWorkspace
after realizing that JsonPathVariable is not actually needed for
dot-notation.
v8-0007 (Allow wildcard member access for JSONB):
I'm aware that the #if 0 in check_indirection() is not ideal. I
haven’t removed it yet because I’m still reviewing other cases—beyond
our JSONB simplified accessor use case—where this check should remain
strict. I’ll post an additional patch to address this.
Looking forward to comments and feedback!
Thanks,
Alex
Вложения
- v8-0002-Pass-field-accessors-to-generic-subscripting.patch
- v8-0004-Extract-coerce_jsonpath_subscript.patch
- v8-0001-Allow-transformation-only-of-a-sublist-of-subscri.patch
- v8-0005-Eanble-String-node-as-field-accessors-in-generic-.patch
- v8-0006-Implement-read-only-dot-notation-for-jsonb.patch
- v8-0003-Export-jsonPathFromParseResult.patch
- v8-0007-Allow-wild-card-member-access-for-jsonb.patch
Hello hackers,
On Thu, Feb 27, 2025 at 9:46 AM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:
Summary of changes:
v8-0001 through v8-0005:
Refactoring and preparatory steps for the actual implementation.
v8-0006 (Implement read-only dot notation for JSONB):
I removed the vars field (introduced in v7) from JsonbSubWorkspace
after realizing that JsonPathVariable is not actually needed for
dot-notation.
v8-0007 (Allow wildcard member access for JSONB):
I'm aware that the #if 0 in check_indirection() is not ideal. I
haven’t removed it yet because I’m still reviewing other cases—beyond
our JSONB simplified accessor use case—where this check should remain
strict. I’ll post an additional patch to address this.
I made the following minor changes in v9:
- More detailed commit messages
- Additional tests
- Use "?column?" as the column name for trailing .*.
Other than that, the patches remain the same as the previous
version:
0001 - 0005: preparation steps for the actual implementation and do
not change or add new behavior.
0006: jsonb dot notation and sliced subscripting
0007: jsonb wildcard member access
version:
0001 - 0005: preparation steps for the actual implementation and do
not change or add new behavior.
0006: jsonb dot notation and sliced subscripting
0007: jsonb wildcard member access
Thanks,
Alex
Вложения
- v9-0003-Export-jsonPathFromParseResult.patch
- v9-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-Not.patch
- v9-0004-Extract-coerce_jsonpath_subscript.patch
- v9-0001-Allow-transformation-of-only-a-sublist-of-subscri.patch
- v9-0006-Implement-read-only-dot-notation-for-jsonb.patch
- v9-0007-Allow-wild-card-member-access-for-jsonb.patch
- v9-0005-Eanble-String-node-as-field-accessors-in-generic-.patch
Hi Alex, The code comments and the commit messages help a lot when reviewing! Thanks for the new version. The code LGTM and check-world is happy. I've also performed some tests and everything looks good! Just some minor points about this new version: ## v9-0005 Typo on commit message title ## v9-0006 > + * The following functions create various types of JsonPathParseItem nodes, > + * which are used to build JsonPath expressions for jsonb simplified accessor. > Just to avoid misinterpretation I think that we can replace "The following functions" with "The make_jsonpath_item_* functions" since we can have more functions in the future that are not fully related with these. Does that make sense? -- Matheus Alcantara
On Mon, Mar 3, 2025 at 4:16 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote: > > Hi Alex, > > The code comments and the commit messages help a lot when reviewing! Thanks for > the new version. > > The code LGTM and check-world is happy. I've also performed some tests and > everything looks good! > > Just some minor points about this new version: > > ## v9-0005 > > Typo on commit message title > > ## v9-0006 > > > + * The following functions create various types of JsonPathParseItem nodes, > > + * which are used to build JsonPath expressions for jsonb simplified accessor. > > > Just to avoid misinterpretation I think that we can replace "The following > functions" with "The make_jsonpath_item_* functions" since we can have more > functions in the future that are not fully related with these. Does that make > sense? > Sorry, I've forgotten to include a question. Do you have anything in mind about documentation changes for this patch? -- Matheus Alcantara
Hi Matheus,
On Mon, Mar 3, 2025 at 1:43 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote:
On Mon, Mar 3, 2025 at 4:16 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> Hi Alex,
>
> The code comments and the commit messages help a lot when reviewing! Thanks for
> the new version.
>
> The code LGTM and check-world is happy. I've also performed some tests and
> everything looks good!
> Just some minor points about this new version:
>
> ## v9-0005
>
> Typo on commit message title
> ## v9-0006
>
> > + * The following functions create various types of JsonPathParseItem nodes,
> > + * which are used to build JsonPath expressions for jsonb simplified accessor.
> >
> Just to avoid misinterpretation I think that we can replace "The following
> functions" with "The make_jsonpath_item_* functions" since we can have more
> functions in the future that are not fully related with these. Does that make
> sense?
Thank you so much for reviewing! I've attached v10, which addresses your feedback.
On Mon, Mar 3, 2025 at 1:43 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote:
Sorry, I've forgotten to include a question. Do you have anything in mind about
documentation changes for this patch?
For the documentation, I’m thinking of adding it under JSON Types [1].
I’d either add a new “Simple Dot-Notation” section after jsonb
subscripting [2] or replace it. Let me know what you think.
I’d either add a new “Simple Dot-Notation” section after jsonb
subscripting [2] or replace it. Let me know what you think.
Best,
Alex
Вложения
- v10-0004-Extract-coerce_jsonpath_subscript.patch
- v10-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch
- v10-0003-Export-jsonPathFromParseResult.patch
- v10-0005-Enable-String-node-as-field-accessors-in-generic.patch
- v10-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch
- v10-0006-Implement-read-only-dot-notation-for-jsonb.patch
- v10-0007-Allow-wild-card-member-access-for-jsonb.patch
On Mon, Mar 3, 2025 at 12:23 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:
I've attached v10, which addresses your feedback.
Hi Alex! Thanks for the patches.
In src/test/regress/sql/jsonb.sql, the section marked with "-- slices are not supported" should be relabeled. That comment predates these patches, and is now misleading.
A bit further down in expected/jsonb.out, there is an expected failure, but no SQL comment to indicate that it is expected:
+SELECT (t.jb).* FROM test_jsonb_dot_notation;
+ERROR: missing FROM-clause entry for table "t"
+LINE 1: SELECT (t.jb).* FROM test_jsonb_dot_notation;
+ERROR: missing FROM-clause entry for table "t"
+LINE 1: SELECT (t.jb).* FROM test_jsonb_dot_notation;
Perhaps a "-- fails" comment would clarify? Then, further down,
+SELECT (jb).a.**.x FROM test_jsonb_dot_notation; -- not supported
+ERROR: syntax error at or near "**"
+LINE 1: SELECT (jb).a.**.x FROM test_jsonb_dot_notation;
+ERROR: syntax error at or near "**"
+LINE 1: SELECT (jb).a.**.x FROM test_jsonb_dot_notation;
I wonder if it would be better to have the parser handle this case and raise a ERRCODE_FEATURE_NOT_SUPPORTED instead?
I got curious about the support for this new dot notation in the plpgsql parser and tried:
+DO $$
+DECLARE
+ a jsonb := '[1,2,3,4,5,6,7]'::jsonb;
+BEGIN
+ WHILE a IS NOT NULL
+ LOOP
+ RAISE NOTICE '%', a;
+ a := a[2:];
+ END LOOP;
+END
+$$ LANGUAGE plpgsql;
+NOTICE: [1, 2, 3, 4, 5, 6, 7]
+NOTICE: [3, 4, 5, 6, 7]
+NOTICE: [5, 6, 7]
+NOTICE: 7
+DECLARE
+ a jsonb := '[1,2,3,4,5,6,7]'::jsonb;
+BEGIN
+ WHILE a IS NOT NULL
+ LOOP
+ RAISE NOTICE '%', a;
+ a := a[2:];
+ END LOOP;
+END
+$$ LANGUAGE plpgsql;
+NOTICE: [1, 2, 3, 4, 5, 6, 7]
+NOTICE: [3, 4, 5, 6, 7]
+NOTICE: [5, 6, 7]
+NOTICE: 7
which looks good! But then I tried:
+DO $$
+DECLARE
+ a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}'::jsonb;
+BEGIN
+ WHILE a IS NOT NULL
+ LOOP
+ RAISE NOTICE '%', a;
+ a := COALESCE(a."NU", a[2]);
+ END LOOP;
+END
+$$ LANGUAGE plpgsql;
+NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}
+ERROR: missing FROM-clause entry for table "a"
+LINE 1: a := COALESCE(a."NU", a[2])
+ ^
+QUERY: a := COALESCE(a."NU", a[2])
+CONTEXT: PL/pgSQL function inline_code_block line 8 at assignment
+DECLARE
+ a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}'::jsonb;
+BEGIN
+ WHILE a IS NOT NULL
+ LOOP
+ RAISE NOTICE '%', a;
+ a := COALESCE(a."NU", a[2]);
+ END LOOP;
+END
+$$ LANGUAGE plpgsql;
+NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}
+ERROR: missing FROM-clause entry for table "a"
+LINE 1: a := COALESCE(a."NU", a[2])
+ ^
+QUERY: a := COALESCE(a."NU", a[2])
+CONTEXT: PL/pgSQL function inline_code_block line 8 at assignment
which suggests the plpgsql parser does not recognize a."NU" as we'd expect. Any thoughts on this?
I notice there are no changes in src/interfaces/ecpg/test, which concerns me. The sqljson.pgc and sqljson_jsontable.pgc files are already testing json handling in ecpg; perhaps just extend those a bit?
On Tue, Mar 4, 2025 at 6:05 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
But then I tried:+DO $$
+DECLARE
+ a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}'::jsonb;
+BEGIN
+ WHILE a IS NOT NULL
+ LOOP
+ RAISE NOTICE '%', a;
+ a := COALESCE(a."NU", a[2]);
+ END LOOP;
+END
+$$ LANGUAGE plpgsql;
+NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}
+ERROR: missing FROM-clause entry for table "a"
+LINE 1: a := COALESCE(a."NU", a[2])
+ ^
+QUERY: a := COALESCE(a."NU", a[2])
+CONTEXT: PL/pgSQL function inline_code_block line 8 at assignmentwhich suggests the plpgsql parser does not recognize a."NU" as we'd expect. Any thoughts on this?
I should mention that
+DO $$
+DECLARE+ a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}'::jsonb;
+BEGIN
+ WHILE a IS NOT NULL
+ LOOP
+ RAISE NOTICE '%', a;
+ a := COALESCE((a)."NU", (a)[2]);
+ END LOOP;
+END
+$$ LANGUAGE plpgsql;
+NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}
+NOTICE: [{"": [[3]]}, [6], [2], "bCi"]
+NOTICE: [2]
works fine. I guess that is good enough. Should we add these to the sql/jsonb.sql to document the expected behavior, both with the error when using plain "a" and with the correct output when using "(a)"? The reason I mention this is that the plpgsql parser might get changed at some point, and without a test case, we might not notice if this breaks.
On 2025-03-04 Tu 10:34 AM, Mark Dilger wrote:
On Tue, Mar 4, 2025 at 6:05 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:But then I tried:+DO $$
+DECLARE
+ a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}'::jsonb;
+BEGIN
+ WHILE a IS NOT NULL
+ LOOP
+ RAISE NOTICE '%', a;
+ a := COALESCE(a."NU", a[2]);
+ END LOOP;
+END
+$$ LANGUAGE plpgsql;
+NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}
+ERROR: missing FROM-clause entry for table "a"
+LINE 1: a := COALESCE(a."NU", a[2])
+ ^
+QUERY: a := COALESCE(a."NU", a[2])
+CONTEXT: PL/pgSQL function inline_code_block line 8 at assignmentwhich suggests the plpgsql parser does not recognize a."NU" as we'd expect. Any thoughts on this?I should mention that+DO $$+DECLARE
+ a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}'::jsonb;
+BEGIN
+ WHILE a IS NOT NULL
+ LOOP
+ RAISE NOTICE '%', a;
+ a := COALESCE((a)."NU", (a)[2]);
+ END LOOP;
+END
+$$ LANGUAGE plpgsql;
+NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}
+NOTICE: [{"": [[3]]}, [6], [2], "bCi"]
+NOTICE: [2]works fine. I guess that is good enough. Should we add these to the sql/jsonb.sql to document the expected behavior, both with the error when using plain "a" and with the correct output when using "(a)"? The reason I mention this is that the plpgsql parser might get changed at some point, and without a test case, we might not notice if this breaks.
Yes, I think so.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi Mark,
Thank you so much for reviewing! I have attached the new patches.
On Tue, Mar 4, 2025 at 8:05 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
On Mon, Mar 3, 2025 at 12:23 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:I've attached v10, which addresses your feedback.Hi Alex! Thanks for the patches.In src/test/regress/sql/jsonb.sql, the section marked with "-- slices are not supported" should be relabeled. That comment predates these patches, and is now misleading.A bit further down in expected/jsonb.out, there is an expected failure, but no SQL comment to indicate that it is expected:+SELECT (t.jb).* FROM test_jsonb_dot_notation;
+ERROR: missing FROM-clause entry for table "t"
+LINE 1: SELECT (t.jb).* FROM test_jsonb_dot_notation;Perhaps a "-- fails" comment would clarify? Then, further down,
Fixed.
+SELECT (jb).a.**.x FROM test_jsonb_dot_notation; -- not supported
+ERROR: syntax error at or near "**"
+LINE 1: SELECT (jb).a.**.x FROM test_jsonb_dot_notation;I wonder if it would be better to have the parser handle this case and raise a ERRCODE_FEATURE_NOT_SUPPORTED instead?
In 0008 I added a new token named "DOUBLE_ASTERISK" to the lexers to
represent "**". Hope this helps!
I got curious about the support for this new dot notation in the plpgsql parser and tried:+DO $$
+DECLARE
+ a jsonb := '[1,2,3,4,5,6,7]'::jsonb;
+BEGIN
+ WHILE a IS NOT NULL
+ LOOP
+ RAISE NOTICE '%', a;
+ a := a[2:];
+ END LOOP;
+END
+$$ LANGUAGE plpgsql;
+NOTICE: [1, 2, 3, 4, 5, 6, 7]
+NOTICE: [3, 4, 5, 6, 7]
+NOTICE: [5, 6, 7]
+NOTICE: 7which looks good! But then I tried:+DO $$
+DECLARE
+ a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}'::jsonb;
+BEGIN
+ WHILE a IS NOT NULL
+ LOOP
+ RAISE NOTICE '%', a;
+ a := COALESCE(a."NU", a[2]);
+ END LOOP;
+END
+$$ LANGUAGE plpgsql;
+NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}
+ERROR: missing FROM-clause entry for table "a"
+LINE 1: a := COALESCE(a."NU", a[2])
+ ^
+QUERY: a := COALESCE(a."NU", a[2])
+CONTEXT: PL/pgSQL function inline_code_block line 8 at assignmentwhich suggests the plpgsql parser does not recognize a."NU" as we'd expect. Any thoughts on this?
Thanks for the tests! I added them to the "jsonb" regress test.
I notice there are no changes in src/interfaces/ecpg/test, which concerns me. The sqljson.pgc and sqljson_jsontable.pgc files are already testing json handling in ecpg; perhaps just extend those a bit?
Thanks for bringing this up! I have added new tests in src/interfaces/ecpg/test/sql/sqljson.pgc.
Best,
Alex
Вложения
- v11-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch
- v11-0003-Export-jsonPathFromParseResult.patch
- v11-0004-Extract-coerce_jsonpath_subscript.patch
- v11-0005-Enable-String-node-as-field-accessors-in-generic.patch
- v11-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch
- v11-0007-Allow-wild-card-member-access-for-jsonb.patch
- v11-0006-Implement-read-only-dot-notation-for-jsonb.patch
- v11-0008-Add-as-a-new-token-in-scanners.patch
This patch set has expanded significantly in scope recently, which is probably the right thing, but that means there won't be enough time to review and finish it for PG18. So I'm moving this to the next commitfest now. On 13.03.25 15:02, Alexandra Wang wrote: > Hi Mark, > > Thank you so much for reviewing! I have attached the new patches. > > On Tue, Mar 4, 2025 at 8:05 AM Mark Dilger <mark.dilger@enterprisedb.com > <mailto:mark.dilger@enterprisedb.com>> wrote: > > > On Mon, Mar 3, 2025 at 12:23 PM Alexandra Wang > <alexandra.wang.oss@gmail.com <mailto:alexandra.wang.oss@gmail.com>> > wrote: > > I've attached v10, which addresses your feedback. > > > Hi Alex! Thanks for the patches. > > In src/test/regress/sql/jsonb.sql, the section marked with "-- > slices are not supported" should be relabeled. That comment > predates these patches, and is now misleading. > > A bit further down in expected/jsonb.out, there is an expected > failure, but no SQL comment to indicate that it is expected: > > +SELECT (t.jb).* FROM test_jsonb_dot_notation; > +ERROR: missing FROM-clause entry for table "t" > +LINE 1: SELECT (t.jb).* FROM test_jsonb_dot_notation; > > Perhaps a "-- fails" comment would clarify? Then, further down, > > > Fixed. > > +SELECT (jb).a.**.x FROM test_jsonb_dot_notation; -- not supported > +ERROR: syntax error at or near "**" > +LINE 1: SELECT (jb).a.**.x FROM test_jsonb_dot_notation; > > I wonder if it would be better to have the parser handle this case > and raise a ERRCODE_FEATURE_NOT_SUPPORTED instead? > > > In 0008 I added a new token named "DOUBLE_ASTERISK" to the lexers to > represent "**". Hope this helps! > > I got curious about the support for this new dot notation in the > plpgsql parser and tried: > > +DO $$ > +DECLARE > + a jsonb := '[1,2,3,4,5,6,7]'::jsonb; > +BEGIN > + WHILE a IS NOT NULL > + LOOP > + RAISE NOTICE '%', a; > + a := a[2:]; > + END LOOP; > +END > +$$ LANGUAGE plpgsql; > +NOTICE: [1, 2, 3, 4, 5, 6, 7] > +NOTICE: [3, 4, 5, 6, 7] > +NOTICE: [5, 6, 7] > +NOTICE: 7 > > which looks good! But then I tried: > > +DO $$ > +DECLARE > + a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": > [-6, -8]}'::jsonb; > +BEGIN > + WHILE a IS NOT NULL > + LOOP > + RAISE NOTICE '%', a; > + a := COALESCE(a."NU", a[2]); > + END LOOP; > +END > +$$ LANGUAGE plpgsql; > +NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]} > +ERROR: missing FROM-clause entry for table "a" > +LINE 1: a := COALESCE(a."NU", a[2]) > + ^ > +QUERY: a := COALESCE(a."NU", a[2]) > +CONTEXT: PL/pgSQL function inline_code_block line 8 at assignment > > which suggests the plpgsql parser does not recognize a."NU" as we'd > expect. Any thoughts on this? > > > Thanks for the tests! I added them to the "jsonb" regress test. > > I notice there are no changes in src/interfaces/ecpg/test, which > concerns me. The sqljson.pgc and sqljson_jsontable.pgc files are > already testing json handling in ecpg; perhaps just extend those a bit? > > Thanks for bringing this up! I have added new tests in src/interfaces/ > ecpg/test/sql/sqljson.pgc. > > > Best, > Alex
On 13/03/2025 15:02, Alexandra Wang wrote: > Hi Mark, > > Thank you so much for reviewing! I have attached the new patches. > Hi Alex, I am reviewing this from a feature perspective and not from a code perspective. On the whole, this looks good to me from a standards point of view. There are two things that stand out to me about this feature: 1) I am not seeing the ** syntax in the standard, so does it need to be signaled as not supported? Perhaps I am just overlooking something. 2) There is no support for <JSON item method>. Since this appears to be constructing a jsonpath query, could that not be added to the syntax and allow jsonpath to throw the error if the function doesn't exist? -- Vik Fearing
Hi Alex!
Glad you made so much effort to develop this patch set!
I think this is an important part of Json functionality.
I've looked into you patch and noticed change in behavior
in new test results:
postgres@postgres=# create table t(x int, y jsonb);
insert into t select 1, '{"a": 1, "b": 42}'::jsonb;
insert into t select 1, '{"a": 2, "b": {"c": 42}}'::jsonb;
insert into t select 1, '{"a": 3, "b": {"c": "42"}, "d":[11, 12]}'::jsonb;
CREATE TABLE
Time: 6.373 ms
INSERT 0 1
Time: 3.299 ms
INSERT 0 1
Time: 2.532 ms
INSERT 0 1
Time: 2.453 ms
insert into t select 1, '{"a": 1, "b": 42}'::jsonb;
insert into t select 1, '{"a": 2, "b": {"c": 42}}'::jsonb;
insert into t select 1, '{"a": 3, "b": {"c": "42"}, "d":[11, 12]}'::jsonb;
CREATE TABLE
Time: 6.373 ms
INSERT 0 1
Time: 3.299 ms
INSERT 0 1
Time: 2.532 ms
INSERT 0 1
Time: 2.453 ms
Original master:
postgres@postgres=# select (t.y).b.c.d.e from t;
ERROR: column notation .b applied to type jsonb, which is not a composite type
LINE 1: select (t.y).b.c.d.e from t;
^
Time: 0.553 ms
ERROR: column notation .b applied to type jsonb, which is not a composite type
LINE 1: select (t.y).b.c.d.e from t;
^
Time: 0.553 ms
Patched (with v11):
postgres@postgres=# select (t.y).b.c.d.e from t;
e
---
(3 rows)
e
---
(3 rows)
Is this correct?
--
Hi Vik,
On Thu, Mar 27, 2025 at 3:27 PM Vik Fearing <vik@postgresfriends.org> wrote:
I am reviewing this from a feature perspective and not from a code
perspective. On the whole, this looks good to me from a standards point
of view.
Thank you so much for reviewing!
On Thu, Mar 27, 2025 at 3:27 PM Vik Fearing <vik@postgresfriends.org> wrote:
There are two things that stand out to me about this feature:
1) I am not seeing the ** syntax in the standard, so does it need to be
signaled as not supported? Perhaps I am just overlooking something.
I think you’re referring to patch v11-0008, which adds the ** token in
the lexer. You’re right — the ** syntax is not mentioned at all in the
SQL standard. I added lexer support for ** based on earlier feedback
from Mark Dilger. See his comment below:
the lexer. You’re right — the ** syntax is not mentioned at all in the
SQL standard. I added lexer support for ** based on earlier feedback
from Mark Dilger. See his comment below:
On Thu, Mar 13, 2025 at 3:02 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:
On Tue, Mar 4, 2025 at 8:05 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:+SELECT (jb).a.**.x FROM test_jsonb_dot_notation; -- not supported
+ERROR: syntax error at or near "**"
+LINE 1: SELECT (jb).a.**.x FROM test_jsonb_dot_notation;I wonder if it would be better to have the parser handle this case and raise a ERRCODE_FEATURE_NOT_SUPPORTED instead?In 0008 I added a new token named "DOUBLE_ASTERISK" to the lexers torepresent "**". Hope this helps!
The ** tests are there because ** is valid in jsonpath, and can be
used in functions like json_query() and jsonb_path_query(). So I think
we should make it explicit to users that ** is not supported by dot
access.
As long as the tests make that clear, I don’t have a strong preference
whether we throw a syntax error in the lexer or a “feature not
supported” error in the parser. If we prefer a syntax error, I’m happy
to drop patch 0008.
used in functions like json_query() and jsonb_path_query(). So I think
we should make it explicit to users that ** is not supported by dot
access.
As long as the tests make that clear, I don’t have a strong preference
whether we throw a syntax error in the lexer or a “feature not
supported” error in the parser. If we prefer a syntax error, I’m happy
to drop patch 0008.
On Thu, Mar 27, 2025 at 3:27 PM Vik Fearing <vik@postgresfriends.org> wrote:
2) There is no support for <JSON item method>. Since this appears to be
constructing a jsonpath query, could that not be added to the syntax and
allow jsonpath to throw the error if the function doesn't exist?
You’re right — there’s currently no support for <JSON item method>. I
intentionally didn’t include it in this patch because it would again
expand the scope of work.
Supporting <JSON item method> seems non-trivial to me, because the
current design transforms expression nodes separately for each
indirection. I previously tried a patch that simply converted the full
chain of indirections into a JSON_QUERY() call in
transformIndirection(). The problem with that approach is that in
EXPLAIN output or view definitions, we’d see the rewritten
JSON_QUERY() instead of the original dot notation — which feels like a
leaky abstraction.
To support item methods, I think we’d need to add a new kind of
indirection to represent function calls. Then, we could either: a)
explicitly add each item method token in gram.y, similar to the list
of method keywords in jsonpath_gram.y; or b) allow a generic
method-call token in gram.y, and transform it into the corresponding
JsonPathItemType that jsonpath understands.
Either way, I’m happy to work on it, but would prefer to discuss and
implement it in a separate follow-up patch.
intentionally didn’t include it in this patch because it would again
expand the scope of work.
Supporting <JSON item method> seems non-trivial to me, because the
current design transforms expression nodes separately for each
indirection. I previously tried a patch that simply converted the full
chain of indirections into a JSON_QUERY() call in
transformIndirection(). The problem with that approach is that in
EXPLAIN output or view definitions, we’d see the rewritten
JSON_QUERY() instead of the original dot notation — which feels like a
leaky abstraction.
To support item methods, I think we’d need to add a new kind of
indirection to represent function calls. Then, we could either: a)
explicitly add each item method token in gram.y, similar to the list
of method keywords in jsonpath_gram.y; or b) allow a generic
method-call token in gram.y, and transform it into the corresponding
JsonPathItemType that jsonpath understands.
Either way, I’m happy to work on it, but would prefer to discuss and
implement it in a separate follow-up patch.
P.S. After reading your second question, your first one makes more
sense to me — I’ve already taken the easy route of not parsing item
methods and was planning to leave that work for a follow-up patch. So
maybe it’s more consistent to just let the syntax error for ** happen,
rather than explicitly throwing a “feature not supported” error?
The only difference, I think, is that we don’t want ** in simplified
accessor syntax, whereas we do want to support item methods in the
future.
Let me know what you think!
sense to me — I’ve already taken the easy route of not parsing item
methods and was planning to leave that work for a follow-up patch. So
maybe it’s more consistent to just let the syntax error for ** happen,
rather than explicitly throwing a “feature not supported” error?
The only difference, I think, is that we don’t want ** in simplified
accessor syntax, whereas we do want to support item methods in the
future.
Let me know what you think!
Best,
Alex
Hi Nikita,
Thank you so much for reviewing!
On Wed, Apr 23, 2025 at 6:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi Alex!Glad you made so much effort to develop this patch set!I think this is an important part of Json functionality.I've looked into you patch and noticed change in behaviorin new test results:postgres@postgres=# create table t(x int, y jsonb);
insert into t select 1, '{"a": 1, "b": 42}'::jsonb;
insert into t select 1, '{"a": 2, "b": {"c": 42}}'::jsonb;
insert into t select 1, '{"a": 3, "b": {"c": "42"}, "d":[11, 12]}'::jsonb;
CREATE TABLE
Time: 6.373 ms
INSERT 0 1
Time: 3.299 ms
INSERT 0 1
Time: 2.532 ms
INSERT 0 1
Time: 2.453 msOriginal master:postgres@postgres=# select (t.y).b.c.d.e from t;
ERROR: column notation .b applied to type jsonb, which is not a composite type
LINE 1: select (t.y).b.c.d.e from t;
^
Time: 0.553 msPatched (with v11):postgres@postgres=# select (t.y).b.c.d.e from t;
e
---
(3 rows)Is this correct?
This is correct.
With this patch, the query should return 3 empty rows. We expect
dot notation to behave the same as the json_query() below in lax mode
with NULL ON EMPTY.
dot notation to behave the same as the json_query() below in lax mode
with NULL ON EMPTY.
postgres=# select json_query(y, 'lax $.b.c.d.e' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from t;
json_query
------------
(3 rows)
json_query
------------
(3 rows)
Best,
Alex
hi. I have applied for 0001 to 0006. static void jsonb_subscript_transform(SubscriptingRef *sbsref, List **indirection, ParseState *pstate, bool isSlice, bool isAssignment) { List *upperIndexpr = NIL; ListCell *idx; sbsref->refrestype = JSONBOID; sbsref->reftypmod = -1; if (jsonb_check_jsonpath_needed(*indirection)) { sbsref->refjsonbpath = jsonb_subscript_make_jsonpath(pstate, indirection, &sbsref->refupperindexpr, &sbsref->reflowerindexpr); return; } foreach(idx, *indirection) { Node *i = lfirst(idx); A_Indices *ai; Node *subExpr; Assert(IsA(i, A_Indices)); ai = castNode(A_Indices, i); if (isSlice) { Node *expr = ai->uidx ? ai->uidx : ai->lidx; ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("jsonb subscript does not support slices"), parser_errposition(pstate, exprLocation(expr)))); } I am confused by the above error handling: errmsg("jsonb subscript does not support slices"). we can do select (jsonb '[1,2,3]')[0:1]; or SELECT (jb).a[2:3].b FROM test_jsonb_dot_notation; this is by definition, "slices"? Anyway, I doubt this error handling will ever be reachable. jsonb_check_jsonpath_needed checks whether the indirection contains is_slice, but jsonb_subscript_transform already takes isSlice as an argument. Maybe we can refactor it somehow. T_String is a primitive node type with no subnodes. typedef struct String { pg_node_attr(special_read_write) NodeTag type; char *sval; } String; then in src/backend/nodes/nodeFuncs.c: if (expr && !IsA(expr, String) && WALK(expr)) return true; we can change it to if (WALK(expr)) return true; but in function expression_tree_walker_impl we have to change it as case T_MergeSupportFunc: case T_String: /* primitive node types with no expression subnodes */ break;
hi. in src/backend/catalog/sql_features.txt should we mark any T860, T861, T862, T863, T864 items as YES? typedef struct SubscriptingRef { /* expressions that evaluate to upper container indexes */ List *refupperindexpr; } SubscriptingRef.refupperindexpr meaning changed, So the above comments also need to be changed? v11-0004-Extract-coerce_jsonpath_subscript.patch + /* + * We known from can_coerce_type that coercion will succeed, so + * coerce_type could be used. Note the implicit coercion context, which is + * required to handle subscripts of different types, similar to overloaded + * functions. + */ + subExpr = coerce_type(pstate, + subExpr, subExprType, + targetType, -1, + COERCION_IMPLICIT, + COERCE_IMPLICIT_CAST, + -1); + if (subExpr == NULL) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("jsonb subscript must have text type"), the targetType can be "integer", then the error message errmsg("jsonb subscript must have text type") would be wrong? Also this error handling is not necessary. since we can_coerce_type already tell us that coercion will succeed. Also, do we really put v11-0004 as a separate patch? in gram.y we have: if (!IsA($5, A_Const) || castNode(A_Const, $5)->val.node.type != T_String) ereport(ERROR, errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("only string constants are supported in JSON_TABLE path specification"), so simply, in make_jsonpath_item_expr we can expr = transformExpr(pstate, expr, pstate->p_expr_kind); if (!IsA(expr, Const) || ((Const *) expr)->consttype != INT4OID) ereport(ERROR, errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("only integer constants are supported in jsonb simplified accessor subscripting"), parser_errposition(pstate, exprLocation(expr))); because I think the current error message "got type: unknown" is not good. select ('123'::jsonb).a['1']; ERROR: jsonb simplified accessor supports subscripting in type: INT4, got type: unknown then we don't need two "ereport(ERROR" within make_jsonpath_item_expr we can also Assert (cnst->constisnull) is false. see gram.y:16976 I saw you introduce the word "AST", for example "Converts jsonpath AST into jsonpath value in binary." I am not sure that is fine. in jsonb_subscript_make_jsonpath we have: + foreach(lc, *indirection) + { + + if (IsA(accessor, String)) + .... + else if (IsA(accessor, A_Star)) + .... + else if (IsA(accessor, A_Indices)) + .... + else + break; Is the last else branch unreliable? since indirection only support for String, A_Star, A_Indices, we already have Assert in jsonb_check_jsonpath_needed to ensure that. + *indirection = list_delete_first_n(*indirection, pathlen); also this is not necessary, because pathlen will be the same length as list *indirection in current design. Please check the attached minor refactoring based on v11-0001 to v11-0006
Вложения
hi. in gram.y we have: indirection_el: '.' attr_name { $$ = (Node *) makeString($2); } we can be sure that dot notation, following dot is a plain string. then in jsonb_subscript_transform, we can transform the String Node to a TEXTOID Const. with that, most of the v11-0005-Enable-String-node-as-field-accessors-in-generic.patch would be unnecessary. Also in v11-0006-Implement-read-only-dot-notation-for-jsonb.patch all these with pattern ``if (IsA(expr, String)`` can be removed. in transformContainerSubscripts we have: sbsref = makeNode(SubscriptingRef); sbsref->refcontainertype = containerType; sbsref->refelemtype = elementType; sbsref->reftypmod = containerTypMod; sbsref->refexpr = (Expr *) containerBase; sbsref->refassgnexpr = NULL; /* caller will fill if it's an assignment */ sbsroutines->transform(sbsref, indirection, pstate, isSlice, isAssignment); then jsonb_subscript_transform we have sbsref->refjsonbpath = jsonb_subscript_make_jsonpath(pstate, indirection, &sbsref->refupperindexpr, &sbsref->reflowerindexpr); of course sbsref->refupperindexpr, sbsref->reflowerindexpr is NIL since we first called jsonb_subscript_make_jsonpath. so we can simplify the function signature as static void jsonb_subscript_make_jsonpath(pstate, indirection, sbsref) Within jsonb_subscript_make_jsonpath we are going to populate refupperindexpr, reflowerindexpr, refjsonbpath. The attached patch addresses both of these issues, along with additional related refactoring. It consolidates all the changes I think are appropriate, based on patches v1-0001 to v1-0006. This will include patches previously I sent in the earlier thread.
Вложения
On Thu, 13 Mar 2025 at 15:02, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > I have attached the new patches. Okay I finally found the time to look at this. I created a draft PR for pg_duckdb[1] to see if I would run into issues. There was only one real issue I found by doing this. The .* syntax is encoded as NULL in refupperindexpr, but that is currently already a valid representation of a slice operation without specifying upper or lower. The easiest way to reproduce the problem is: CREATE TABLE arr(a int[]); explain (verbose) SELECT a[:] FROM arr; QUERY PLAN ─────────────────────────────────────────────────────────────── Seq Scan on public.arr (cost=0.00..23.60 rows=1360 width=32) Output: a.* That last line should instead be Output: a[:] So I think we need to add a new Star node type, that represents the * literal after parsing. I haven't looked too closely at the code yet.
Hi Jian,
On Tue, Jun 24, 2025 at 3:30 PM jian he <jian.universality@gmail.com> wrote:
hi.
I have applied for 0001 to 0006.
Thank you so much for the very detailed and thorough review, and for
the patches!
the patches!
I've attached v12 that addresses your feedback:
v12-0001 to v12-0004 are refactoring patches that prepare for the
implementation.
implementation.
v12-0005 - implements jsonb dot-notation (e.g., (jb).a)
Changes from v11-0006 include:
a. subscripting with slicing and wildcard-related code has been
removed and will be added in the following commit
removed and will be added in the following commit
b. introduced a dedicated FieldAccessorExpr node to transform Strings
in dot-notation.
in dot-notation.
c. mixed syntax (such as (jb).a['b'].c is now allowed, with warnings.
d. miscellaneous bug fixes and code cleanup based on Jian’s feedback.
v12-0006 - implements jsonb subscripting with slicing.
This was split out from v11-0006 into a separate commit.
v12-0007 - implements wildcard member accessor.
I added the Star node that Jelte suggested for transforming the
wildcard member accessor.
wildcard member accessor.
Please find more detailed replies below:
On Tue, Jun 24, 2025 at 3:30 PM jian he <jian.universality@gmail.com> wrote:
static void
jsonb_subscript_transform(SubscriptingRef *sbsref,
List **indirection,
ParseState *pstate,
bool isSlice,
bool isAssignment)
{
List *upperIndexpr = NIL;
ListCell *idx;
sbsref->refrestype = JSONBOID;
sbsref->reftypmod = -1;
if (jsonb_check_jsonpath_needed(*indirection))
{
sbsref->refjsonbpath =
jsonb_subscript_make_jsonpath(pstate, indirection,
&sbsref->refupperindexpr,
&sbsref->reflowerindexpr);
return;
}
foreach(idx, *indirection)
{
Node *i = lfirst(idx);
A_Indices *ai;
Node *subExpr;
Assert(IsA(i, A_Indices));
ai = castNode(A_Indices, i);
if (isSlice)
{
Node *expr = ai->uidx ? ai->uidx : ai->lidx;
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("jsonb subscript does not support slices"),
parser_errposition(pstate, exprLocation(expr))));
}
I am confused by the above error handling:
errmsg("jsonb subscript does not support slices").
we can do
select (jsonb '[1,2,3]')[0:1];
or
SELECT (jb).a[2:3].b FROM test_jsonb_dot_notation;
this is by definition, "slices"?
Anyway, I doubt this error handling will ever be reachable.
jsonb_check_jsonpath_needed checks whether the indirection contains is_slice,
but jsonb_subscript_transform already takes isSlice as an argument.
Maybe we can refactor it somehow.
Good catch! I believe this error handling was originally copied
because the existing JSONB subscripting in Postgres doesn’t support
slices.
In v12, I’ve separated the slicing implementation into its own commit,
distinct from the one that implements dot-notation. Hope that helps!
distinct from the one that implements dot-notation. Hope that helps!
On Tue, Jun 24, 2025 at 3:30 PM jian he <jian.universality@gmail.com> wrote:
T_String is a primitive node type with no subnodes.
typedef struct String
{
pg_node_attr(special_read_write)
NodeTag type;
char *sval;
} String;
then in src/backend/nodes/nodeFuncs.c:
if (expr && !IsA(expr, String) && WALK(expr))
return true;
we can change it to
if (WALK(expr))
return true;
but in function expression_tree_walker_impl
we have to change it as
case T_MergeSupportFunc:
case T_String:
/* primitive node types with no expression subnodes */
break;
You’re right, the changes to the walker-related code weren’t very
clean. In v12, I introduced a new node, FieldAccessorExpr, to serve as
the expression node for dot-notation. This helped minimize changes to
the walker code and, in my opinion, results in a cleaner design than
using a more general Const(text) node.
clean. In v12, I introduced a new node, FieldAccessorExpr, to serve as
the expression node for dot-notation. This helped minimize changes to
the walker code and, in my opinion, results in a cleaner design than
using a more general Const(text) node.
On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote:
in src/backend/catalog/sql_features.txt
should we mark any T860, T861, T862, T863, T864
items as YES?
I’ve updated T860 and T861. I’m not entirely sure about the rest so
left them unchanged.
left them unchanged.
On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote:
typedef struct SubscriptingRef
{
/* expressions that evaluate to upper container indexes */
List *refupperindexpr;
}
SubscriptingRef.refupperindexpr meaning changed,
So the above comments also need to be changed?
Thanks. Done.
On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote:
v11-0004-Extract-coerce_jsonpath_subscript.patch
+ /*
+ * We known from can_coerce_type that coercion will succeed, so
+ * coerce_type could be used. Note the implicit coercion context, which is
+ * required to handle subscripts of different types, similar to overloaded
+ * functions.
+ */
+ subExpr = coerce_type(pstate,
+ subExpr, subExprType,
+ targetType, -1,
+ COERCION_IMPLICIT,
+ COERCE_IMPLICIT_CAST,
+ -1);
+ if (subExpr == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("jsonb subscript must have text type"),
the targetType can be "integer", then the error message
errmsg("jsonb subscript must have text type") would be wrong?
Also this error handling is not necessary.
since we can_coerce_type already tell us that coercion will succeed.
Also, do we really put v11-0004 as a separate patch?
Thanks for pointing this out! I’ve removed the unnecessary error
handling as you suggested. I kept v12-0004 as a separate patch to make
the review easier, but I’ll leave it up to the committer to decide
whether to squash it with the other commits.
handling as you suggested. I kept v12-0004 as a separate patch to make
the review easier, but I’ll leave it up to the committer to decide
whether to squash it with the other commits.
On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote:
in gram.y we have:
if (!IsA($5, A_Const) ||
castNode(A_Const, $5)->val.node.type != T_String)
ereport(ERROR,
errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("only string constants are
supported in JSON_TABLE path specification"),
so simply, in make_jsonpath_item_expr we can
expr = transformExpr(pstate, expr, pstate->p_expr_kind);
if (!IsA(expr, Const) || ((Const *) expr)->consttype != INT4OID)
ereport(ERROR,
errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("only integer constants are supported in jsonb
simplified accessor subscripting"),
parser_errposition(pstate, exprLocation(expr)));
because I think the current error message "got type: unknown" is not good.
select ('123'::jsonb).a['1'];
ERROR: jsonb simplified accessor supports subscripting in type: INT4,
got type: unknown
then we don't need two "ereport(ERROR" within make_jsonpath_item_expr
we can also Assert (cnst->constisnull) is false.
see gram.y:16976
Thanks for pointing this out! In v12, I added support for mixed
subscripting syntax, and updated the ERROR (or WARNING) messages as
follows:
subscripting syntax, and updated the ERROR (or WARNING) messages as
follows:
postgres=# SELECT (jb)['a'].b FROM test_jsonb_dot_notation; -- returns an array due to lax mode
b
------------
["c", "d"]
(1 row)
postgres=# SELECT (jb).a['b'] FROM test_jsonb_dot_notation; -- returns NULL due to strict mode with warnings
WARNING: 01000: mixed usage of jsonb simplified accessor syntax and jsonb subscripting.
LINE 1: SELECT (jb).a['b'] FROM test_jsonb_dot_notation;
^
HINT: use dot-notation for member access, or use non-null integer constants subscripting for array access.
LOCATION: jsonb_subscript_make_jsonpath, jsonbsubs.c:366
a
---
(1 row)
postgres=# select ('{"a": 1}'::jsonb)['a':'b']; -- fails
ERROR: 42804: only non-null integer constants are supported for jsonb simplified accessor subscripting
LINE 1: select ('{"a": 1}'::jsonb)['a':'b'];
^
HINT: use int data type for subscripting with slicing.
LOCATION: make_jsonpath_item_expr, jsonbsubs.c:218
b
------------
["c", "d"]
(1 row)
postgres=# SELECT (jb).a['b'] FROM test_jsonb_dot_notation; -- returns NULL due to strict mode with warnings
WARNING: 01000: mixed usage of jsonb simplified accessor syntax and jsonb subscripting.
LINE 1: SELECT (jb).a['b'] FROM test_jsonb_dot_notation;
^
HINT: use dot-notation for member access, or use non-null integer constants subscripting for array access.
LOCATION: jsonb_subscript_make_jsonpath, jsonbsubs.c:366
a
---
(1 row)
postgres=# select ('{"a": 1}'::jsonb)['a':'b']; -- fails
ERROR: 42804: only non-null integer constants are supported for jsonb simplified accessor subscripting
LINE 1: select ('{"a": 1}'::jsonb)['a':'b'];
^
HINT: use int data type for subscripting with slicing.
LOCATION: make_jsonpath_item_expr, jsonbsubs.c:218
So some of the ERRORs now disappear or downgrade to WARNINGs.
On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote:
I saw you introduce the word "AST", for example
"Converts jsonpath AST into jsonpath value in binary."
I am not sure that is fine.
Fixed.
On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote:
in jsonb_subscript_make_jsonpath we have:
+ foreach(lc, *indirection)
+ {
+
+ if (IsA(accessor, String))
+ ....
+ else if (IsA(accessor, A_Star))
+ ....
+ else if (IsA(accessor, A_Indices))
+ ....
+ else
+ break;
Is the last else branch unreliable? since indirection only support for
String, A_Star, A_Indices, we already have Assert in jsonb_check_jsonpath_needed
to ensure that.
+ *indirection = list_delete_first_n(*indirection, pathlen);
also this is not necessary,
because pathlen will be the same length as list *indirection in current design.
I kept this logic in v12, because in order to support the mixed usage
of dot-notation and existing jsonb subscripting (e.g., (jb).a['b'].c),
we need to switch between making jsonpath and transforming the upper
indexes for evaluation. So now, *indirection =
list_delete_first_n(*indirection, pathlen); is necessary, and pathlen
can differ.
of dot-notation and existing jsonb subscripting (e.g., (jb).a['b'].c),
we need to switch between making jsonpath and transforming the upper
indexes for evaluation. So now, *indirection =
list_delete_first_n(*indirection, pathlen); is necessary, and pathlen
can differ.
The else break; can be removed, but I choose to keep it for now and
added comments to clarify the behavior.
added comments to clarify the behavior.
On Wed, Jun 25, 2025 at 12:41 AM jian he <jian.universality@gmail.com> wrote:
On Wed, Jun 25, 2025 at 1:56 PM jian he <jian.universality@gmail.com> wrote:
>
> hi.
CREATE TABLE test_jsonb_dot_notation AS
SELECT '{"a": [1, 2, {"b": "c"}, {"b": "d", "e": "f", "x": {"y":
"yyy", "z": "zzz"}}], "b": [3, 4, {"b": "g", "x": {"y": "YYY", "z":
"ZZZ"}}]}'::jsonb jb;
CREATE VIEW v1 AS SELECT (jb).a[3].x.y FROM test_jsonb_dot_notation;
CREATE VIEW v2 AS SELECT (jb).a[3:].x.y[:-1] FROM test_jsonb_dot_notation;
CREATE VIEW v3 AS SELECT (jb).a[:3].x.y[-1:] FROM test_jsonb_dot_notation;
\sv v2
\sv v3
will trigger segment fault.
Great catch, thanks! Fixed and added your tests.
On Wed, Jun 25, 2025 at 12:41 AM jian he <jian.universality@gmail.com> wrote:
also do we need ban subscript slicing, when upper bound is less than
lower bound,
for example:
SELECT (jb).a[3:].x.y[0:'-1'::integer] AS y FROM test_jsonb_dot_notation;
I don't think we need to ban this case, since the SQL standard doesn't
ban it, and the (empty) result seems reasonable.
ban it, and the (empty) result seems reasonable.
On Thu, Jun 26, 2025 at 7:10 PM jian he <jian.universality@gmail.com> wrote:
in gram.y we have:
indirection_el:
'.' attr_name
{
$$ = (Node *) makeString($2);
}
we can be sure that dot notation, following dot is a plain string.
then in jsonb_subscript_transform, we can transform the String Node to
a TEXTOID Const.
with that, most of the
v11-0005-Enable-String-node-as-field-accessors-in-generic.patch
would be unnecessary.
Also in v11-0006-Implement-read-only-dot-notation-for-jsonb.patch
all these with pattern
``if (IsA(expr, String)``
can be removed.
Thanks for giving so much thought into this, I really appreciate it!
As I mentioned earlier, instead of using the Const(text) node, I added
a dedicated FieldAccessorExpr node for dot-notation. I think this
addresses the same concern.
As I mentioned earlier, instead of using the Const(text) node, I added
a dedicated FieldAccessorExpr node for dot-notation. I think this
addresses the same concern.
On Thu, Jun 26, 2025 at 7:10 PM jian he <jian.universality@gmail.com> wrote:
in transformContainerSubscripts we have:
sbsref = makeNode(SubscriptingRef);
sbsref->refcontainertype = containerType;
sbsref->refelemtype = elementType;
sbsref->reftypmod = containerTypMod;
sbsref->refexpr = (Expr *) containerBase;
sbsref->refassgnexpr = NULL; /* caller will fill if it's an assignment */
sbsroutines->transform(sbsref, indirection, pstate,
isSlice, isAssignment);
then jsonb_subscript_transform we have
sbsref->refjsonbpath =
jsonb_subscript_make_jsonpath(pstate, indirection,
&sbsref->refupperindexpr,
&sbsref->reflowerindexpr);
of course sbsref->refupperindexpr, sbsref->reflowerindexpr is NIL
since we first called jsonb_subscript_make_jsonpath.
so we can simplify the function signature as
static void jsonb_subscript_make_jsonpath(pstate, indirection, sbsref)
Within jsonb_subscript_make_jsonpath we are going to populate
refupperindexpr, reflowerindexpr, refjsonbpath.
You are right. I applied your suggestion.
On Thu, Jun 26, 2025 at 7:10 PM jian he <jian.universality@gmail.com> wrote:
The attached patch addresses both of these issues, along with additional related
refactoring. It consolidates all the changes I think are appropriate, based on
patches v1-0001 to v1-0006. This will include patches previously I sent in
the earlier thread.
Thanks again for the patch! It was really helpful! I didn't directly
apply it as I made a few different choices, but I think I have
addressed all the points you covered in it.
apply it as I made a few different choices, but I think I have
addressed all the points you covered in it.
Let me know your thoughts!
Best,
Alex
Вложения
- v12-0006-Implement-Jsonb-subscripting-with-slicing.patch
- v12-0004-Extract-coerce_jsonpath_subscript.patch
- v12-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch
- v12-0007-Implement-jsonb-wildcard-member-accessor.patch
- v12-0005-Implement-read-only-dot-notation-for-jsonb.patch
- v12-0003-Export-jsonPathFromParseResult.patch
- v12-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch
Hi Jelte,
On Sat, Jun 28, 2025 at 4:52 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
On Thu, 13 Mar 2025 at 15:02, Alexandra Wang
<alexandra.wang.oss@gmail.com> wrote:
> I have attached the new patches.
Okay I finally found the time to look at this. I created a draft PR
for pg_duckdb[1] to see if I would run into issues. There was only one
real issue I found by doing this. The .* syntax is encoded as NULL in
refupperindexpr, but that is currently already a valid representation
of a slice operation without specifying upper or lower. The easiest
way to reproduce the problem is:
CREATE TABLE arr(a int[]);
explain (verbose) SELECT a[:] FROM arr;
QUERY PLAN
───────────────────────────────────────────────────────────────
Seq Scan on public.arr (cost=0.00..23.60 rows=1360 width=32)
Output: a.*
That last line should instead be
Output: a[:]
So I think we need to add a new Star node type, that represents the *
literal after parsing.
I haven't looked too closely at the code yet.
Thank you so much for testing and reviewing, I really appreciate it! I
fixed the EXPLAIN in v12 by adding a Star node, as you suggested.
Looking forward to your thoughts when you have time for a further
review!
Best,
Alex
On Wed, Jul 9, 2025 at 4:02 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > Thanks again for the patch! It was really helpful! I didn't directly > apply it as I made a few different choices, but I think I have > addressed all the points you covered in it. > > Let me know your thoughts! > hi. in v12-0001 and v12-0002. in transformIndirection if (!newresult) { /* * generic subscripting failed; falling back to function call or * field selection for a composite type. */ Node *n; /* try to find function for field selection */ newresult = ParseFuncOrColumn(pstate, list_make1(n), list_make1(result), last_srf, NULL, false, location); } the above comments mentioning "function call" is wrong? you passed NULL for (FuncCall *fn) in ParseFuncOrColumn. and ParseFuncOrColumn comments says ```If fn is null, we're dealing with column syntax not function syntax.`` I think coerce_jsonpath_subscript can be further simplified. we already have message like: errhint("jsonb subscript must be coercible to either integer or text."), no need to pass the third argument a constant (INT4OID). also ``Oid targetType = UNKNOWNOID;`` set it as InvalidOid would be better. attached is a minor refactoring of coerce_jsonpath_subscript based on (v12-0001 to v12-0004). after applied v12-0001 to v12-0006 + /* emit warning conditionally to minimize duplicate warnings */ + if (list_length(*indirection) > 0) + ereport(WARNING, + errcode(ERRCODE_WARNING), + errmsg("mixed usage of jsonb simplified accessor syntax and jsonb subscripting."), + errhint("use dot-notation for member access, or use non-null integer constants subscripting for array access."), + parser_errposition(pstate, warning_location)); src7=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]; WARNING: mixed usage of jsonb simplified accessor syntax and jsonb subscripting. LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]... ^ HINT: use dot-notation for member access, or use non-null integer constants subscripting for array access. ERROR: subscript type bigint is not supported LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]... ^ HINT: jsonb subscript must be coercible to either integer or text. The above example looks very bad. location printed twice, hint message is different. two messages level (ERROR, WARNING). also "or use non-null integer constants subscripting for array access." seems wrong? as you can see the below hint message saying it could be text or integer. select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['1'::int8]; ERROR: subscript type bigint is not supported LINE 1: ...ect ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['1'::int8]... ^ HINT: jsonb subscript must be coercible to either integer or text. also select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)[NULL::int4]; return NULL, so "use non-null integer constants" is wrong.
Вложения
On Thu, Jul 10, 2025 at 4:53 PM jian he <jian.universality@gmail.com> wrote: > > src7=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]; > WARNING: mixed usage of jsonb simplified accessor syntax and jsonb > subscripting. > LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]... > ^ > HINT: use dot-notation for member access, or use non-null integer > constants subscripting for array access. > ERROR: subscript type bigint is not supported > LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]... > ^ > HINT: jsonb subscript must be coercible to either integer or text. > > The above example looks very bad. location printed twice, hint message > is different. > two messages level (ERROR, WARNING). > For plainSELECT statement, we have WARNING only in src/test/regress/expected/xml.out, src/test/regress/expected/xml_2.out for example: SELECT xpath('/*', '<relativens xmlns=''relative''/>'); WARNING: line 1: xmlns: URI relative is not absolute <relativens xmlns='relative'/> ^ xpath -------------------------------------- {"<relativens xmlns=\"relative\"/>"} (1 row) so i am not sure a plain SELECT statement issuing WARNING is appropriate. ------------------------------------------ in jsonb_subscript_make_jsonpath we have foreach(lc, *indirection) { if (IsA(accessor, String)) .... else if (IsA(accessor, A_Indices)) else /* * Unsupported node type for creating jsonpath. Instead of * throwing an ERROR, break here so that we create a jsonpath from * as many indirection elements as we can and let * transformIndirection() fallback to alternative logic to handle * the remaining indirection elements. */ break; } the above ELSE branch comments look suspicious to me. transformIndirection->transformContainerSubscripts->jsonb_subscript_transform->jsonb_subscript_make_jsonpath As you can see, transformIndirection have a long distance from jsonb_subscript_make_jsonpath, let transformIndirection handle remaining indirection elements seems not good. if you look at src/backend/parser/gram.y line 16990. transformIndirection(ParseState *pstate, A_Indirection *ind) ind->indirection can be be Node of String, A_Indices, A_Star also the above ELSE branch never reached in regress tests. ------------------------------------------ typedef struct FieldAccessorExpr { Expr xpr; char *fieldname; /* name of the JSONB object field accessed via * dot notation */ Oid faecollid pg_node_attr(query_jumble_ignore); int location; } FieldAccessorExpr; first field as NodeTag should be just fine? I am not sure the field "location" is needed now, if it is needed, it should be type as ParseLoc. we should add it to src/tools/pgindent/typedefs.list
On Thu, Jul 10, 2025 at 9:34 PM jian he <jian.universality@gmail.com> wrote: > > ------------------------------------------ > in jsonb_subscript_make_jsonpath we have > foreach(lc, *indirection) > { > if (IsA(accessor, String)) > .... > else if (IsA(accessor, A_Indices)) > else > /* > * Unsupported node type for creating jsonpath. Instead of > * throwing an ERROR, break here so that we create a jsonpath from > * as many indirection elements as we can and let > * transformIndirection() fallback to alternative logic to handle > * the remaining indirection elements. > */ > break; > } > the above ELSE branch comments look suspicious to me. > transformIndirection->transformContainerSubscripts->jsonb_subscript_transform->jsonb_subscript_make_jsonpath > As you can see, transformIndirection have a long distance from > jsonb_subscript_make_jsonpath, > let transformIndirection handle remaining indirection elements seems not good. > context v12-0001 to v12-0006. this ELSE branch comments is wrong, because + if (jsonb_check_jsonpath_needed(*indirection)) + { + jsonb_subscript_make_jsonpath(pstate, indirection, sbsref); + if (sbsref->refjsonbpath) + return; + } in jsonb_check_jsonpath_needed we already use Assert to confirm that list "indirection" is either String or A_Indices Node. in transformContainerSubscripts we have sbsroutines->transform(sbsref, indirection, pstate, isSlice, isAssignment); /* * Error out, if datatype failed to consume any indirection elements. */ if (list_length(*indirection) == indirection_length) { Node *ind = linitial(*indirection); if (noError) return NULL; if (IsA(ind, String)) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("type %s does not support dot notation", format_type_be(containerType)), parser_errposition(pstate, exprLocation(containerBase)))); else if (IsA(ind, A_Indices)) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("type %s does not support array subscripting", format_type_be(containerType)), parser_errposition(pstate, exprLocation(containerBase)))); else elog(ERROR, "invalid indirection operation: %d", nodeTag(ind)); } sbsroutines->transform currently will call array_subscript_transform, hstore_subscript_transform, jsonb_subscript_transform in jsonb_subscript_transform callee we unconditionally do: *indirection = list_delete_first_n(*indirection, pathlen); *indirection = list_delete_first_n(*indirection, list_length(upperIndexpr)); in array_subscript_transform, we do *indirection = list_delete_first_n(*indirection, ndim); That means, if sbsroutines->transform not error out and indirection is not NIL (which is unlikely) then sbsroutines->transform will consume some induction elements. instead of the above verbose ereport(ERROR, error handling, we can use Assert Assert(indirection_length > list_length(*indirection)); for the above comments, i did a refactoring based on v12 (0001 to 0006).
Вложения
Hi Jian,
Thanks for reviewing! I’ve attached v13, which addresses your
feedback.
feedback.
See my individual replies below, and let me know if you have any
questions!
questions!
On Thu, Jul 10, 2025 at 1:54 AM jian he <jian.universality@gmail.com> wrote:
in v12-0001 and v12-0002.
in transformIndirection
if (!newresult)
{
/*
* generic subscripting failed; falling back to function call or
* field selection for a composite type.
*/
Node *n;
/* try to find function for field selection */
newresult = ParseFuncOrColumn(pstate,
list_make1(n),
list_make1(result),
last_srf,
NULL,
false,
location);
}
the above comments mentioning "function call" is wrong?
you passed NULL for (FuncCall *fn) in ParseFuncOrColumn.
and ParseFuncOrColumn comments says
```If fn is null, we're dealing with column syntax not function syntax.``
You are right. Fixed.
On Thu, Jul 10, 2025 at 1:54 AM jian he <jian.universality@gmail.com> wrote:
I think coerce_jsonpath_subscript can be further simplified.
we already have message like:
errhint("jsonb subscript must be coercible to either integer or text."),
no need to pass the third argument a constant (INT4OID).
also
``Oid targetType = UNKNOWNOID;``
set it as InvalidOid would be better.
attached is a minor refactoring of coerce_jsonpath_subscript
based on (v12-0001 to v12-0004).
Thanks for the patch! I squashed it with v13-0004 and renamed the
function to coerce_jsonpath_subscript_to_int4_or_text() for clarity.
function to coerce_jsonpath_subscript_to_int4_or_text() for clarity.
On Thu, Jul 10, 2025 at 1:54 AM jian he <jian.universality@gmail.com> wrote:
after applied v12-0001 to v12-0006
+ /* emit warning conditionally to minimize duplicate warnings */
+ if (list_length(*indirection) > 0)
+ ereport(WARNING,
+ errcode(ERRCODE_WARNING),
+ errmsg("mixed usage of jsonb simplified accessor syntax and jsonb
subscripting."),
+ errhint("use dot-notation for member access, or use non-null integer
constants subscripting for array access."),
+ parser_errposition(pstate, warning_location));
src7=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8];
WARNING: mixed usage of jsonb simplified accessor syntax and jsonb
subscripting.
LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]...
^
HINT: use dot-notation for member access, or use non-null integer
constants subscripting for array access.
ERROR: subscript type bigint is not supported
LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]...
^
HINT: jsonb subscript must be coercible to either integer or text.
The above example looks very bad. location printed twice, hint message
is different.
two messages level (ERROR, WARNING).
also "or use non-null integer constants subscripting for array
access." seems wrong?
as you can see the below hint message saying it could be text or integer.
select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['1'::int8];
ERROR: subscript type bigint is not supported
LINE 1: ...ect ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['1'::int8]...
^
HINT: jsonb subscript must be coercible to either integer or text.
also select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)[NULL::int4];
return NULL, so "use non-null integer constants" is wrong.
I see your confusion about the WARNING/ERROR messages. My intent was
to encourage users to use consistent syntax flavors rather than mixing
the SQL standard JSON simplified accessor with the pre-standard
PostgreSQL jsonb subscripting. In the meantime, I want to support
mixed syntax flavors as much as possible. However, your feedback makes
it clear that users don’t need that level of details. So, in v13 I’ve
removed the WARNING message and instead added a comment explaining how
we support mixed syntaxes.
to encourage users to use consistent syntax flavors rather than mixing
the SQL standard JSON simplified accessor with the pre-standard
PostgreSQL jsonb subscripting. In the meantime, I want to support
mixed syntax flavors as much as possible. However, your feedback makes
it clear that users don’t need that level of details. So, in v13 I’ve
removed the WARNING message and instead added a comment explaining how
we support mixed syntaxes.
Here's the relevant diff between v12 and v13, hope it helps:
--- a/src/backend/utils/adt/jsonbsubs.c
+++ b/src/backend/utils/adt/jsonbsubs.c
@@ -247,7 +247,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
ListCell *lc;
Datum jsp;
int pathlen = 0;
- int warning_location = -1;
sbsref->refupperindexpr = NIL;
sbsref->reflowerindexpr = NIL;
@@ -311,10 +310,27 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
if (jpi_from == NULL)
{
/*
- * postpone emitting the warning until the end of this
- * function
+ * Break out of the loop if the subscript is not a
+ * non-null integer constant, so that we can fall back to
+ * jsonb subscripting logic.
+ *
+ * This is needed to handle cases with mixed usage of SQL
+ * standard json simplified accessor syntax and PostgreSQL
+ * jsonb subscripting syntax, e.g:
+ *
+ * select (jb).a['b'].c from jsonb_table;
+ *
+ * where dot-notation (.a and .c) is the SQL standard json
+ * simplified accessor syntax, and the ['b'] subscript is
+ * the PostgreSQL jsonb subscripting syntax, because 'b'
+ * is not a non-null constant integer and cannot be used
+ * for json array access.
+ *
+ * In this case, we cannot create a JsonPath item, so we
+ * break out of the loop and let
+ * jsonb_subscript_transform() handle this indirection as
+ * a PostgreSQL jsonb subscript.
*/
- warning_location = exprLocation(ai->uidx);
pfree(jpi->value.array.elems);
pfree(jpi);
break;
@@ -360,14 +376,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
*indirection = list_delete_first_n(*indirection, pathlen);
- /* emit warning conditionally to minimize duplicate warnings */
- if (list_length(*indirection) > 0)
- ereport(WARNING,
- errcode(ERRCODE_WARNING),
- errmsg("mixed usage of jsonb simplified accessor syntax and jsonb subscripting."),
- errhint("use dot-notation for member access, or use non-null integer constants subscripting for array access."),
- parser_errposition(pstate, warning_location));
-
jsp = jsonPathFromParseResult(&jpres, 0, NULL);
+++ b/src/backend/utils/adt/jsonbsubs.c
@@ -247,7 +247,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
ListCell *lc;
Datum jsp;
int pathlen = 0;
- int warning_location = -1;
sbsref->refupperindexpr = NIL;
sbsref->reflowerindexpr = NIL;
@@ -311,10 +310,27 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
if (jpi_from == NULL)
{
/*
- * postpone emitting the warning until the end of this
- * function
+ * Break out of the loop if the subscript is not a
+ * non-null integer constant, so that we can fall back to
+ * jsonb subscripting logic.
+ *
+ * This is needed to handle cases with mixed usage of SQL
+ * standard json simplified accessor syntax and PostgreSQL
+ * jsonb subscripting syntax, e.g:
+ *
+ * select (jb).a['b'].c from jsonb_table;
+ *
+ * where dot-notation (.a and .c) is the SQL standard json
+ * simplified accessor syntax, and the ['b'] subscript is
+ * the PostgreSQL jsonb subscripting syntax, because 'b'
+ * is not a non-null constant integer and cannot be used
+ * for json array access.
+ *
+ * In this case, we cannot create a JsonPath item, so we
+ * break out of the loop and let
+ * jsonb_subscript_transform() handle this indirection as
+ * a PostgreSQL jsonb subscript.
*/
- warning_location = exprLocation(ai->uidx);
pfree(jpi->value.array.elems);
pfree(jpi);
break;
@@ -360,14 +376,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
*indirection = list_delete_first_n(*indirection, pathlen);
- /* emit warning conditionally to minimize duplicate warnings */
- if (list_length(*indirection) > 0)
- ereport(WARNING,
- errcode(ERRCODE_WARNING),
- errmsg("mixed usage of jsonb simplified accessor syntax and jsonb subscripting."),
- errhint("use dot-notation for member access, or use non-null integer constants subscripting for array access."),
- parser_errposition(pstate, warning_location));
-
jsp = jsonPathFromParseResult(&jpres, 0, NULL);
On Thu, Jul 10, 2025 at 6:34 AM jian he <jian.universality@gmail.com> wrote:
typedef struct FieldAccessorExpr
{
Expr xpr;
char *fieldname; /* name of the JSONB object field accessed via
* dot notation */
Oid faecollid pg_node_attr(query_jumble_ignore);
int location;
} FieldAccessorExpr;
first field as NodeTag should be just fine?
I am not sure the field "location" is needed now, if it is needed, it should be
type as ParseLoc.
we should add it to src/tools/pgindent/typedefs.list
Good catch! I changed the first field from Expr to NodeTag, removed
the location field, and added the Node to typedefs.list.
the location field, and added the Node to typedefs.list.
On Thu, Jul 10, 2025 at 10:54 PM jian he <jian.universality@gmail.com> wrote:
On Thu, Jul 10, 2025 at 9:34 PM jian he <jian.universality@gmail.com> wrote:
>
> ------------------------------------------
> in jsonb_subscript_make_jsonpath we have
> foreach(lc, *indirection)
> {
> if (IsA(accessor, String))
> ....
> else if (IsA(accessor, A_Indices))
> else
> /*
> * Unsupported node type for creating jsonpath. Instead of
> * throwing an ERROR, break here so that we create a jsonpath from
> * as many indirection elements as we can and let
> * transformIndirection() fallback to alternative logic to handle
> * the remaining indirection elements.
> */
> break;
> }
> the above ELSE branch comments look suspicious to me.
I kept the "else" branch, but added an Assert with updated comments.
Here's the relevant diff between v12 and v13:
else
{
- * Unsupported node type for creating jsonpath. Instead of
- * throwing an ERROR, break here so that we create a jsonpath from
- * as many indirection elements as we can and let
- * transformIndirection() fallback to alternative logic to handle
- * the remaining indirection elements.
+ * Unexpected node type in indirection list. This should not
+ * happen with current grammar, but we handle it defensively by
+ * breaking out of the loop rather than crashing. In case of
+ * future grammar changes that might introduce new node types,
+ * this allows us to create a jsonpath from as many indirection
+ * elements as we can and let transformIndirection() fallback to
+ * alternative logic to handle the remaining indirection elements.
+ Assert(false); /* not reachable */
break;
}
{
- * Unsupported node type for creating jsonpath. Instead of
- * throwing an ERROR, break here so that we create a jsonpath from
- * as many indirection elements as we can and let
- * transformIndirection() fallback to alternative logic to handle
- * the remaining indirection elements.
+ * Unexpected node type in indirection list. This should not
+ * happen with current grammar, but we handle it defensively by
+ * breaking out of the loop rather than crashing. In case of
+ * future grammar changes that might introduce new node types,
+ * this allows us to create a jsonpath from as many indirection
+ * elements as we can and let transformIndirection() fallback to
+ * alternative logic to handle the remaining indirection elements.
+ Assert(false); /* not reachable */
break;
}
On Thu, Jul 10, 2025 at 10:54 PM jian he <jian.universality@gmail.com> wrote:
> transformIndirection->transformContainerSubscripts->jsonb_subscript_transform->jsonb_subscript_make_jsonpath
> As you can see, transformIndirection have a long distance from
> jsonb_subscript_make_jsonpath,
> let transformIndirection handle remaining indirection elements seems not good.
In 0001's commit message, we are given the following promise:
Date: Tue Jul 8 22:18:07 2025 -0700
Allow transformation of only a sublist of subscripts
This is a preparation step for allowing subscripting containers to
transform only a prefix of an indirection list and modify the list
in-place by removing the processed elements. Currently, all elements
are consumed, and the list is set to NIL after transformation.
In the following commit, subscripting containers will gain the
flexibility to stop transformation when encountering an unsupported
indirection and return the remaining indirections to the caller.
With this contract provided by the subscripting interface to the
container's transform function, I don't feel terribly inappropriate to
let jsonb's transform function: jsonb_subscript_transform(), to
transform indirections as much as it can, and leave the rest to
transformIndirection().
container's transform function, I don't feel terribly inappropriate to
let jsonb's transform function: jsonb_subscript_transform(), to
transform indirections as much as it can, and leave the rest to
transformIndirection().
0002 is a good example that shows why the else branch is useful: it
adds support for String as a subscript node of jsonb, while String is
still unsupported in other in-core container subscripting.
adds support for String as a subscript node of jsonb, while String is
still unsupported in other in-core container subscripting.
For example, array_subscript_transform() has:
if (!IsA(lfirst(idx), A_Indices))break;
The else branch in jsonb_subscript_make_jsonpath() just follows that
same convention.
same convention.
context v12-0001 to v12-0006.
this ELSE branch comments is wrong, because
+ if (jsonb_check_jsonpath_needed(*indirection))
+ {
+ jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);
+ if (sbsref->refjsonbpath)
+ return;
+ }
in jsonb_check_jsonpath_needed we already use Assert to confirm that
list "indirection"
is either String or A_Indices Node.
The Assert I added in v13, as well as the one you mentioned in
jsonb_check_jsonpath_needed(), are both only applicable in development
build and won’t help in production. So I’d prefer to break here and
let the callee handle any unexpected Node with an ERROR.
Let me know if this explanation makes sense. If not, and if you think
the current logic is overly complicated, we can discuss further.
jsonb_check_jsonpath_needed(), are both only applicable in development
build and won’t help in production. So I’d prefer to break here and
let the callee handle any unexpected Node with an ERROR.
Let me know if this explanation makes sense. If not, and if you think
the current logic is overly complicated, we can discuss further.
On Thu, Jul 10, 2025 at 10:54 PM jian he <jian.universality@gmail.com> wrote:
in transformContainerSubscripts we have
sbsroutines->transform(sbsref, indirection, pstate,
isSlice, isAssignment);
/*
* Error out, if datatype failed to consume any indirection elements.
*/
if (list_length(*indirection) == indirection_length)
{
Node *ind = linitial(*indirection);
if (noError)
return NULL;
if (IsA(ind, String))
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("type %s does not support dot notation",
format_type_be(containerType)),
parser_errposition(pstate, exprLocation(containerBase))));
else if (IsA(ind, A_Indices))
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("type %s does not support array subscripting",
format_type_be(containerType)),
parser_errposition(pstate, exprLocation(containerBase))));
else
elog(ERROR, "invalid indirection operation: %d", nodeTag(ind));
}
sbsroutines->transform currently will call
array_subscript_transform, hstore_subscript_transform, jsonb_subscript_transform
in jsonb_subscript_transform callee we unconditionally do:
*indirection = list_delete_first_n(*indirection, pathlen);
*indirection = list_delete_first_n(*indirection, list_length(upperIndexpr));
in array_subscript_transform, we do
*indirection = list_delete_first_n(*indirection, ndim);
That means, if sbsroutines->transform not error out and indirection is
not NIL (which is unlikely)
then sbsroutines->transform will consume some induction elements.
instead of the above verbose ereport(ERROR, error handling, we can use Assert
Assert(indirection_length > list_length(*indirection));
for the above comments, i did a refactoring based on v12 (0001 to 0006).
Thanks for the patch! I didn’t apply it for the same reason I
mentioned in the previous quote reply. In the case of an unexpected
Node type, I prefer raising an explicit ERROR rather than using an
Assert() that crashes in a dev build but silently continues in
production. One possible way to hit these ERRORs is by creating a
custom data type that supports subscripting, but only for a subset of
Node types that transformIndirection() allows.
mentioned in the previous quote reply. In the case of an unexpected
Node type, I prefer raising an explicit ERROR rather than using an
Assert() that crashes in a dev build but silently continues in
production. One possible way to hit these ERRORs is by creating a
custom data type that supports subscripting, but only for a subset of
Node types that transformIndirection() allows.
Best,
Alex
Вложения
- v13-0004-Extract-coerce_jsonpath_subscript.patch
- v13-0007-Implement-jsonb-wildcard-member-accessor.patch
- v13-0006-Implement-Jsonb-subscripting-with-slicing.patch
- v13-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch
- v13-0005-Implement-read-only-dot-notation-for-jsonb.patch
- v13-0003-Export-jsonPathFromParseResult.patch
- v13-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch
On Thu, Aug 21, 2025 at 12:54 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > Hi Jian, > > Thanks for reviewing! I’ve attached v13, which addresses your > feedback. > hi. in the context of v13-0001 and v13-0002. In transformIndirection: while (subscripts) { Node *newresult = (Node *) = transformContainerSubscripts(....&subscripts...) } In transformContainerSubscripts: sbsroutines->transform(sbsref, indirection, pstate, isSlice, isAssignment); /* * Error out, if datatype failed to consume any indirection elements. */ if (list_length(*indirection) == indirection_length) { } As you can see from the above code pattern, "sbsroutines->transform" is responsible for consuming the "indirection". If it doesn’t consume it, the function should either return NULL or raise an error. But what happens if the elements consumed by "sbsroutines->transform" are not contiguous? jsonb_subscript_transform below changes in v13-0002 + if (!IsA(lfirst(idx), A_Indices)) + break; may cause elements consumed not contiguous. diff --git a/src/backend/utils/adt/jsonbsubs.c b/src/backend/utils/adt/jsonbsubs.c index 8ad6aa1ad4f..a0d38a0fd80 100644 --- a/src/backend/utils/adt/jsonbsubs.c +++ b/src/backend/utils/adt/jsonbsubs.c @@ -55,9 +55,14 @@ jsonb_subscript_transform(SubscriptingRef *sbsref, */ foreach(idx, *indirection) { - A_Indices *ai = lfirst_node(A_Indices, idx); + A_Indices *ai; Node *subExpr; + if (!IsA(lfirst(idx), A_Indices)) + break; + + ai = lfirst_node(A_Indices, idx); + if (isSlice) { Node *expr = ai->uidx ? ai->uidx : ai->lidx; @@ -160,7 +165,9 @@ jsonb_subscript_transform(SubscriptingRef *sbsref, sbsref->refrestype = JSONBOID; sbsref->reftypmod = -1; - *indirection = NIL; + /* Remove processed elements */ + if (upperIndexpr) + *indirection = list_delete_first_n(*indirection, list_length(upperIndexpr)); } If the branch ``if (!IsA(lfirst(idx), A_Indices))`` is ever reached, then ``*indirection = list_delete_first_n(*indirection, list_length(upperIndexpr));`` might produce an incorrect result? However, it appears that this branch is currently unreachable, at least regress tests not coverage for + if (!IsA(lfirst(idx), A_Indices)) + break; Later patch we may have resolved this issue, but in the context of 0001 and 0002, this seem inconsistent?
Hi Jian,
On Fri, Aug 22, 2025 at 1:20 AM jian he <jian.universality@gmail.com> wrote:
On Thu, Aug 21, 2025 at 12:54 PM Alexandra Wang
<alexandra.wang.oss@gmail.com> wrote:
>
> Hi Jian,
>
> Thanks for reviewing! I’ve attached v13, which addresses your
> feedback.
>
hi.
in the context of v13-0001 and v13-0002.
In transformIndirection:
while (subscripts)
{
Node *newresult = (Node *) =
transformContainerSubscripts(....&subscripts...)
}
In transformContainerSubscripts:
sbsroutines->transform(sbsref, indirection, pstate,
isSlice, isAssignment);
/*
* Error out, if datatype failed to consume any indirection elements.
*/
if (list_length(*indirection) == indirection_length)
{
}
As you can see from the above code pattern, "sbsroutines->transform" is
responsible for consuming the "indirection".
If it doesn’t consume it, the function should either return NULL or raise an
error.
But what happens if the elements consumed by "sbsroutines->transform"
are not contiguous?
jsonb_subscript_transform below changes in v13-0002
+ if (!IsA(lfirst(idx), A_Indices))
+ break;
may cause elements consumed not contiguous.
diff --git a/src/backend/utils/adt/jsonbsubs.c
b/src/backend/utils/adt/jsonbsubs.c
index 8ad6aa1ad4f..a0d38a0fd80 100644
--- a/src/backend/utils/adt/jsonbsubs.c
+++ b/src/backend/utils/adt/jsonbsubs.c
@@ -55,9 +55,14 @@ jsonb_subscript_transform(SubscriptingRef *sbsref,
*/
foreach(idx, *indirection)
{
- A_Indices *ai = lfirst_node(A_Indices, idx);
+ A_Indices *ai;
Node *subExpr;
+ if (!IsA(lfirst(idx), A_Indices))
+ break;
+
+ ai = lfirst_node(A_Indices, idx);
+
if (isSlice)
{
Node *expr = ai->uidx ? ai->uidx : ai->lidx;
@@ -160,7 +165,9 @@ jsonb_subscript_transform(SubscriptingRef *sbsref,
sbsref->refrestype = JSONBOID;
sbsref->reftypmod = -1;
- *indirection = NIL;
+ /* Remove processed elements */
+ if (upperIndexpr)
+ *indirection = list_delete_first_n(*indirection, list_length(upperIndexpr));
}
If the branch ``if (!IsA(lfirst(idx), A_Indices))`` is ever reached, then
``*indirection = list_delete_first_n(*indirection, list_length(upperIndexpr));``
might produce an incorrect result?
However, it appears that this branch is currently unreachable,
at least regress tests not coverage for
+ if (!IsA(lfirst(idx), A_Indices))
+ break;
Later patch we may have resolved this issue, but in the context of
0001 and 0002,
this seem inconsistent?
I don’t understand the question. In the case of an unsupported Node
type (not an Indices in patch 0001 or 0002), we break out of the loop
to stop transforming the remaining subscripts. So there won’t be any
‘not contiguous’ indirection elements.
type (not an Indices in patch 0001 or 0002), we break out of the loop
to stop transforming the remaining subscripts. So there won’t be any
‘not contiguous’ indirection elements.
Best,
Alex
On Sat, Aug 23, 2025 at 3:34 AM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > I don’t understand the question. In the case of an unsupported Node > type (not an Indices in patch 0001 or 0002), we break out of the loop > to stop transforming the remaining subscripts. So there won’t be any > ‘not contiguous’ indirection elements. > Sorry, my last message is wrong. this time, I applied v13-0001 to v13-0005. and I found some minor issues..... v13-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch + /* + * Error out, if datatype failed to consume any indirection elements. + */ + if (list_length(*indirection) == indirection_length) + { + Node *ind = linitial(*indirection); + + if (noError) + return NULL; + + if (IsA(ind, String)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("type %s does not support dot notation", + format_type_be(containerType)), + parser_errposition(pstate, exprLocation(containerBase)))); + else if (IsA(ind, A_Indices)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("type %s does not support array subscripting", + format_type_be(containerType)), + parser_errposition(pstate, exprLocation(containerBase)))); + else + elog(ERROR, "invalid indirection operation: %d", nodeTag(ind)); + } these error message still not reached after I apply v13-0005-Implement-read-only-dot-notation-for-jsonb.patch maybe we can simply do + if (noError) + return NULL; + + ereport(ERROR, + errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("type %s does not support subscripting", + format_type_be(containerType)), + parser_errposition(pstate, exprLocation(containerBase))); ? for V13-0004. in master we have if (subExpr == NULL) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("jsonb subscript must have text type"), parser_errposition(pstate, exprLocation(subExpr)))); but this part is removed from v13-0004-Extract-coerce_jsonpath_subscript ? coerce_jsonpath_subscript_to_int4_or_text maybe we can just rename to coerce_jsonpath_subscript, then add some comments explaining why currently only support result node coerce to text or int4 data type. for v13-0005-Implement-read-only-dot-notation-for-jsonb.patch + i = 0; + foreach(lc, sbsref->refupperindexpr) { + Expr *e = (Expr *) lfirst(lc); + + /* When slicing, individual subscript bounds can be omitted */ + if (!e) { + sbsrefstate->upperprovided[i] = false; + sbsrefstate->upperindexnull[i] = true; + } else { + sbsrefstate->upperprovided[i] = true; + /* Each subscript is evaluated into appropriate array entry */ + ExecInitExprRec(e, state, + &sbsrefstate->upperindex[i], + &sbsrefstate->upperindexnull[i]); + } + i++; } curly braces should be put into the next new line. there are two + if (!sbsref->refjsonbpath) +{ +} maybe we can put these two together. +SELECT (jb)['b']['x'].z FROM test_jsonb_dot_notation; -- returns NULL with warnings + z +--- + +(1 row) there is no warning, "returns NULL with warnings" should be removed. +-- clean up +DROP TABLE test_jsonb_dot_notation; \ No newline at end of file to remove "\ No newline at end of file" you need to add a newline to jsonb.sql, you may see other regress sql test files. + foreach(lc, *indirection) + if() + { + } + else + { + /* + * Unexpected node type in indirection list. This should not + * happen with current grammar, but we handle it defensively by + * breaking out of the loop rather than crashing. In case of + * future grammar changes that might introduce new node types, + * this allows us to create a jsonpath from as many indirection + * elements as we can and let transformIndirection() fallback to + * alternative logic to handle the remaining indirection elements. + */ + Assert(false); /* not reachable */ + break; + } + + /* append path item */ + path->next = jpi; + path = jpi; + pathlen++; If the above else branch is reached, it will crash due to `Assert(false);`, which contradicts the preceding comments. to make the comments accurate, we need remove " Assert(false); /* not reachable */" ? /* - * Transform and convert the subscript expressions. Jsonb subscripting - * does not support slices, look only at the upper index. + * We would only reach here if json simplified accessor is not needed, or + * if jsonb_subscript_make_jsonpath() didn't consume any indirection + * element — either way, the first indirection element could not be + * converted into a JsonPath component. This happens when it's a non-slice + * A_Indices with a non-integer upper index. The code below falls back to + * traditional jsonb subscripting for such cases. */ the above comments, "non-integer" part is wrong? For example, the below two SELECTs both use "traditional" jsonb subscripting logic. CREATE TABLE ts1 AS SELECT '[1, [2]]' ::jsonb jb; SELECT (jb)[1] FROM ts1; SELECT (jb)['a'] FROM ts1;
Hi Jian,
I’ve attached v14, which includes only indentation and comment changes
from v13.
from v13.
On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote:
On Sat, Aug 23, 2025 at 3:34 AM Alexandra Wang
<alexandra.wang.oss@gmail.com> wrote:
>
> I don’t understand the question. In the case of an unsupported Node
> type (not an Indices in patch 0001 or 0002), we break out of the loop
> to stop transforming the remaining subscripts. So there won’t be any
> ‘not contiguous’ indirection elements.
>
Sorry, my last message is wrong.
this time, I applied v13-0001 to v13-0005.
and I found some minor issues.....
No problem. Thanks for reviewing.
On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote:
v13-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch
+ /*
+ * Error out, if datatype failed to consume any indirection elements.
+ */
+ if (list_length(*indirection) == indirection_length)
+ {
+ Node *ind = linitial(*indirection);
+
+ if (noError)
+ return NULL;
+
+ if (IsA(ind, String))
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("type %s does not support dot notation",
+ format_type_be(containerType)),
+ parser_errposition(pstate, exprLocation(containerBase))));
+ else if (IsA(ind, A_Indices))
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("type %s does not support array subscripting",
+ format_type_be(containerType)),
+ parser_errposition(pstate, exprLocation(containerBase))));
+ else
+ elog(ERROR, "invalid indirection operation: %d", nodeTag(ind));
+ }
these error message still not reached after I apply
v13-0005-Implement-read-only-dot-notation-for-jsonb.patch
maybe we can simply do
+ if (noError)
+ return NULL;
+
+ ereport(ERROR,
+ errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("type %s does not support subscripting",
+ format_type_be(containerType)),
+ parser_errposition(pstate, exprLocation(containerBase)));
?
The regression tests do not currently trigger these ERROR cases
because Assignment for dot-notation has not yet been implemented. At
present, the noError argument of transformContainerSubscripts() is set
to true for SELECT and false for UPDATE. As a result, the
ereport(ERROR) calls are never reached in SELECT queries. This patch
series also does not aim to implement dot-notation for UPDATE
statements that assign to a jsonb field. We have therefore made only
minimal adjustments in transformAssignmentIndirection(). Since the
assignment code path is separate, it likewise does not reach the
ereport(ERROR) cases with in-core data types. Once dot-notation
support for Assignment is added, those ERROR messages will become
relevant.
because Assignment for dot-notation has not yet been implemented. At
present, the noError argument of transformContainerSubscripts() is set
to true for SELECT and false for UPDATE. As a result, the
ereport(ERROR) calls are never reached in SELECT queries. This patch
series also does not aim to implement dot-notation for UPDATE
statements that assign to a jsonb field. We have therefore made only
minimal adjustments in transformAssignmentIndirection(). Since the
assignment code path is separate, it likewise does not reach the
ereport(ERROR) cases with in-core data types. Once dot-notation
support for Assignment is added, those ERROR messages will become
relevant.
That said, the ERROR messages are not "dead code." They can still be
triggered when working with custom data types. For example, you can
define a custom array-like type by copying the in-core array type and
slightly modifying its *_subscript_transform() function. For now, you
can simply update array_subscript_transform() in-place with the
following diff:
triggered when working with custom data types. For example, you can
define a custom array-like type by copying the in-core array type and
slightly modifying its *_subscript_transform() function. For now, you
can simply update array_subscript_transform() in-place with the
following diff:
--- a/src/backend/utils/adt/arraysubs.c
+++ b/src/backend/utils/adt/arraysubs.c
@@ -82,6 +82,8 @@ array_subscript_transform(SubscriptingRef *sbsref,
ai = lfirst_node(A_Indices, idx);
+ if (isSlice)
+ break;
if (isSlice)
{
if (ai->lidx)
+++ b/src/backend/utils/adt/arraysubs.c
@@ -82,6 +82,8 @@ array_subscript_transform(SubscriptingRef *sbsref,
ai = lfirst_node(A_Indices, idx);
+ if (isSlice)
+ break;
if (isSlice)
{
if (ai->lidx)
This way, we have a *_subscript_transform function that you can
register for a custom array-like data type that does not handle
slicing. Then you can hit the error message like this:
register for a custom array-like data type that does not handle
slicing. Then you can hit the error message like this:
test=# update t set arr[1:] = 1;
ERROR: 42804: type integer[] does not support array subscripting
LINE 1: update t set arr[1:] = 1;
^
LOCATION: transformContainerSubscripts, parse_node.c:345
ERROR: 42804: type integer[] does not support array subscripting
LINE 1: update t set arr[1:] = 1;
^
LOCATION: transformContainerSubscripts, parse_node.c:345
test=# \d t
Table "public.t"
Column | Type | Collation | Nullable | Default
--------+-----------+-----------+----------+---------
arr | integer[] | | |
Table "public.t"
Column | Type | Collation | Nullable | Default
--------+-----------+-----------+----------+---------
arr | integer[] | | |
particular (since one might expect something about slicing), but it
still demonstrates how this code path is relevant. In the meantime, I
don’t think it’s worth adding a new data type to exercise this error
message, given that we are likely to make changes to the assignment
code path for in-core types.
On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote:
for V13-0004.
in master we have
if (subExpr == NULL)
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("jsonb subscript must have text type"),
parser_errposition(pstate, exprLocation(subExpr))));
but this part is removed from v13-0004-Extract-coerce_jsonpath_subscript
?
I removed this one because you asked me to (see our past conversation
for v11).
On Wed, Jul 9, 2025 at 1:01 AM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:
On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote:v11-0004-Extract-coerce_jsonpath_subscript.patch
+ /*
+ * We known from can_coerce_type that coercion will succeed, so
+ * coerce_type could be used. Note the implicit coercion context, which is
+ * required to handle subscripts of different types, similar to overloaded
+ * functions.
+ */
+ subExpr = coerce_type(pstate,
+ subExpr, subExprType,
+ targetType, -1,
+ COERCION_IMPLICIT,
+ COERCE_IMPLICIT_CAST,
+ -1);
+ if (subExpr == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("jsonb subscript must have text type"),
the targetType can be "integer", then the error message
errmsg("jsonb subscript must have text type") would be wrong?
Also this error handling is not necessary.
since we can_coerce_type already tell us that coercion will succeed.
Also, do we really put v11-0004 as a separate patch?Thanks for pointing this out! I’ve removed the unnecessary error
handling as you suggested. I kept v12-0004 as a separate patch to make
the review easier, but I’ll leave it up to the committer to decide
whether to squash it with the other commits.
one option so we don’t keep going back and forth.
On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote:
coerce_jsonpath_subscript_to_int4_or_text
maybe we can just rename to
coerce_jsonpath_subscript,
then add some comments explaining why currently only support result node coerce
to text or int4 data type.
What is the reason for renaming this function to a less explicit name?
It was previously called coerce_jsonpath_subscript() because we passed
INT4OID as an argument to specify the target type. I followed your
earlier review comment (for v12) to remove that argument and hard-code
INT4OID within the function. Given that change, I think it’s clearer
to keep a more explicit name that describes what the function
achieves.
It was previously called coerce_jsonpath_subscript() because we passed
INT4OID as an argument to specify the target type. I followed your
earlier review comment (for v12) to remove that argument and hard-code
INT4OID within the function. Given that change, I think it’s clearer
to keep a more explicit name that describes what the function
achieves.
On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote:
for v13-0005-Implement-read-only-dot-notation-for-jsonb.patch
+ i = 0;
+ foreach(lc, sbsref->refupperindexpr) {
+ Expr *e = (Expr *) lfirst(lc);
+
+ /* When slicing, individual subscript bounds can be omitted */
+ if (!e) {
+ sbsrefstate->upperprovided[i] = false;
+ sbsrefstate->upperindexnull[i] = true;
+ } else {
+ sbsrefstate->upperprovided[i] = true;
+ /* Each subscript is evaluated into appropriate array entry */
+ ExecInitExprRec(e, state,
+ &sbsrefstate->upperindex[i],
+ &sbsrefstate->upperindexnull[i]);
+ }
+ i++;
}
curly braces should be put into the next new line.
Fixed. Thank you!
On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote:
there are two
+ if (!sbsref->refjsonbpath)
+{
+}
maybe we can put these two together.
reduce readability in my opinion.
On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote:
+SELECT (jb)['b']['x'].z FROM test_jsonb_dot_notation; -- returns NULL
with warnings
+ z
+---
+
+(1 row)
there is no warning, "returns NULL with warnings" should be removed.
Fixed. Thank you!
On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote:
+-- clean up
+DROP TABLE test_jsonb_dot_notation;
\ No newline at end of file
to remove "\ No newline at end of file"
you need to add a newline to jsonb.sql, you may see other regress sql
test files.
Fixed. Thank you!
On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote:
+ foreach(lc, *indirection)
+ if()
+ {
+ }
+ else
+ {
+ /*
+ * Unexpected node type in indirection list. This should not
+ * happen with current grammar, but we handle it defensively by
+ * breaking out of the loop rather than crashing. In case of
+ * future grammar changes that might introduce new node types,
+ * this allows us to create a jsonpath from as many indirection
+ * elements as we can and let transformIndirection() fallback to
+ * alternative logic to handle the remaining indirection elements.
+ */
+ Assert(false); /* not reachable */
+ break;
+ }
+
+ /* append path item */
+ path->next = jpi;
+ path = jpi;
+ pathlen++;
If the above else branch is reached, it will crash due to `Assert(false);`,
which contradicts the preceding comments.
to make the comments accurate, we need remove
" Assert(false); /* not reachable */"
?
The Assert is intentional. We want to fail fast for the “not
reachable” case in the development build, while breaking out of the
loop in the production build.
reachable” case in the development build, while breaking out of the
loop in the production build.
On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote:
/*
- * Transform and convert the subscript expressions. Jsonb subscripting
- * does not support slices, look only at the upper index.
+ * We would only reach here if json simplified accessor is not needed, or
+ * if jsonb_subscript_make_jsonpath() didn't consume any indirection
+ * element — either way, the first indirection element could not be
+ * converted into a JsonPath component. This happens when it's a non-slice
+ * A_Indices with a non-integer upper index. The code below falls back to
+ * traditional jsonb subscripting for such cases.
*/
the above comments, "non-integer" part is wrong?
For example, the below two SELECTs both use "traditional" jsonb
subscripting logic.
CREATE TABLE ts1 AS SELECT '[1, [2]]' ::jsonb jb;
SELECT (jb)[1] FROM ts1;
SELECT (jb)['a'] FROM ts1;
I updated the comment. In the updated comment, I also chose to use
“pre-standard” json subscripting instead of “traditional” json
subscripting. Hope that helps!
“pre-standard” json subscripting instead of “traditional” json
subscripting. Hope that helps!
Best,
Alex
Вложения
- v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch
- v14-0003-Export-jsonPathFromParseResult.patch
- v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch
- v14-0005-Implement-read-only-dot-notation-for-jsonb.patch
- v14-0007-Implement-jsonb-wildcard-member-accessor.patch
- v14-0006-Implement-Jsonb-subscripting-with-slicing.patch
- v14-0004-Extract-coerce_jsonpath_subscript.patch
On Tue, Aug 26, 2025 at 11:53 AM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > Hi Jian, > > I’ve attached v14, which includes only indentation and comment changes > from v13. > hi. still reviewing v14-0001 to v14-0005. I am confused by the comments in jsonb_subscript_transform "" * (b) jsonb_subscript_make_jsonpath() examined the first indirection * element but could not turn it into a JsonPath component (for example, * ['a']). """ CREATE TABLE ts2 AS SELECT '{"a": "b"}' ::jsonb jb; SELECT (jb)['a'] FROM ts2; SELECT (jb).a['a'] FROM ts2; in these two cases, ['a'], it won't reach jsonb_subscript_make_jsonpath, because jsonb_check_jsonpath_needed will return false. maybe I am missing something, for the above point b, can you use some SQL example to explain it? make_jsonpath_item_expr(ParseState *pstate, Node *expr, List **exprs) if expr satisfies the condition, then append the transformed expr to List exprs if not returns NULL. The list append logic is within make_jsonpath_item_int. seems not that intuitive (another level of indirection) maybe we don't need make_jsonpath_item_int. Another reason would be in make_jsonpath_item_int it's not easy found out that exprs actually refers to SubscriptingRef->refupperindexpr also v14-0006-Implement-Jsonb-subscripting-with-slicing.patch also doesn't use make_jsonpath_item_int that much frequently. -------------------------------- in v14-0005-Implement-read-only-dot-notation-for-jsonb.patch + * In addition to building the JsonPath expression, this function populates + * the following fields of the given SubscriptingRef: + * - refjsonbpath: the generated JsonPath + * - refupperindexpr: upper index expressions (object keys or array indexes) + * - reflowerindexpr: lower index expressions, remains NIL as slices are not yet supported. "reflowerindexpr" changed in v14-0006, so "reflowerindexpr" comments in v14-0006 need to change. but "reflowerindexpr" we didn't touch in v14-0005, so the "reflowerindexpr" comments look weird. +SELECT (jb).a['b'] FROM test_jsonb_dot_notation; -- returns NULL due to strict mode we can not specify strict just like, ``select 'strict $'::jsonpath;`` all the patches in here are default to lax mode. so I am confused by this comment. ("strict mode").
On Aug 26, 2025, at 11:52, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:<v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch><v14-0003-Export-jsonPathFromParseResult.patch><v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v14-0005-Implement-read-only-dot-notation-for-jsonb.patch><v14-0007-Implement-jsonb-wildcard-member-accessor.patch><v14-0006-Implement-Jsonb-subscripting-with-slicing.patch><v14-0004-Extract-coerce_jsonpath_subscript.patch>Best,Alex
I just started with 0001, I just got a comment:
diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h
index f7d07c84542..58a4b9df157 100644
--- a/src/include/parser/parse_node.h
+++ b/src/include/parser/parse_node.h
@@ -361,7 +361,7 @@ extern SubscriptingRef *transformContainerSubscripts(ParseState *pstate,
Node *containerBase,
Oid containerType,
int32 containerTypMod,
- List *indirection,
+ List **indirection,
bool isAssignment);
Here we change “indirection” from * to **, I guess the reason is to indicate if the indirection list is fully processed. In case of fully processed, set it to NULL, then caller gets NULL via **.
As “indirection” is of type “List *”, can we just set “indirection->length = 0”? I checked pg_list.h, there is not a function or macro to cleanup a list (free elements and reset length to 0). There is a “list_free(List *list)”, but it will free “list” itself as well. Maybe we can add a new function, say “list_reset(List *list)” or “list_cleanup(List *list)”. This way will keep the function interface unchanged, and less changes would be involved.
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index d66276801c6..e1565e11d09 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -466,14 +466,13 @@ transformIndirection(ParseState *pstate, A_Indirection *ind)
Assert(IsA(n, String));
/* process subscripts before this field selection */
- if (subscripts)
+ while (subscripts)
result = (Node *) transformContainerSubscripts(pstate,
Changing “if” to “while” in 0001 is a little confusing. Because the impression is that, transform will stop when it cannot handle next indirection, so that transform again will also fail. Maybe my impression is wrong, can you add some comments here to explain why change “if” to “while”.
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
On Aug 26, 2025, at 11:52, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:<v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch><v14-0003-Export-jsonPathFromParseResult.patch><v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v14-0005-Implement-read-only-dot-notation-for-jsonb.patch><v14-0007-Implement-jsonb-wildcard-member-accessor.patch><v14-0006-Implement-Jsonb-subscripting-with-slicing.patch><v14-0004-Extract-coerce_jsonpath_subscript.patch>Best,Alex
I found a bug.
```
INSERT INTO test_jsonb_types (data) VALUES
('[1, 2, "three"]'),
('{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}’);
```
If I use a index following a slice, it doesn’t work:
```
evantest=# select data[0] from test_jsonb_types;
data
------
1
(2 rows)
evantest=# select data[0:2][1] from test_jsonb_types; # This should return “2"
data
------
(2 rows)
evantest=# select (t.data)['con']['a'][0:1] from test_jsonb_types t; # returned the slice properly
data
-----------------------------------------------------
[{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]
(2 rows)
evantest=# select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; # also returned the slice, which is wrong
data
-----------------------------------------------------
[{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]
(2 rows)
```
We should consider a slice as a container, so the fix is simple. My quick unpolished fix is:
```
chaol@ChaodeMacBook-Air postgresql % git diff
diff --git a/src/backend/utils/adt/jsonbsubs.c b/src/backend/utils/adt/jsonbsubs.c
index cb72d12ca3f..8845dcf239a 100644
--- a/src/backend/utils/adt/jsonbsubs.c
+++ b/src/backend/utils/adt/jsonbsubs.c
@@ -247,6 +247,7 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
ListCell *lc;
Datum jsp;
int pathlen = 0;
+ bool isSlice = false;
sbsref->refupperindexpr = NIL;
sbsref->reflowerindexpr = NIL;
@@ -285,6 +286,7 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
if (ai->is_slice)
{
+ isSlice = true;
while (list_length(sbsref->reflowerindexpr) < list_length(sbsref->refupperindexpr))
sbsref->reflowerindexpr = lappend(sbsref->reflowerindexpr, NULL);
@@ -369,6 +371,9 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
path->next = jpi;
path = jpi;
pathlen++;
+
+ if (isSlice)
+ break;
}
if (pathlen == 0)
```
After the fix, let’s test again:
```
evantest=# select data[0:2][1] from test_jsonb_types; # good result
data
------
2
(2 rows)
evantest=# select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; # good result
data
-------------------------
{"b": {"c": {"d": 99}}}
(2 rows)
```
Regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
<v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch><v14-0003-Export-jsonPathFromParseResult.patch><v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v14-0005-Implement-read-only-dot-notation-for-jsonb.patch><v14-0007-Implement-jsonb-wildcard-member-accessor.patch><v14-0006-Implement-Jsonb-subscripting-with-slicing.patch><v14-0004-Extract-coerce_jsonpath_subscript.patch>Best,Alex
I am trying to split different topics to different email to keep every issue to be focused.
I also have a suggestion.
If I do:
```
— s1
select (t.data)['con']['a'][1]['b']['c']['d'] from test_jsonb_types t;
—s2
select (t.data).con.a[1].b['c'].d from test_jsonb_types t;
```
The two statements are actually identical. But they generate quite different rewritten query trees. S1’s rewritten tree is much simpler than s2’s. However, their plan trees are the same.
I think we can convert string indirections to indices indirection first before transform indirections, so that the two SQLs will generate the same rewritten query tree, which would simplify the code logic of rewriting query tree and the planner.
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
On Aug 26, 2025, at 11:52, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:
<v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch><v14-0003-Export-jsonPathFromParseResult.patch><v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v14-0005-Implement-read-only-dot-notation-for-jsonb.patch><v14-0007-Implement-jsonb-wildcard-member-accessor.patch><v14-0006-Implement-Jsonb-subscripting-with-slicing.patch><v14-0004-Extract-coerce_jsonpath_subscript.patch>
A few other comments:
-static Node *
-transformIndirection(ParseState *pstate, A_Indirection *ind)
+Node *
+transformIndirection(ParseState *pstate, A_Indirection *ind,
+ bool *trailing_star_expansion)
{
Node *last_srf = pstate->p_last_srf;
Node *result = transformExprRecurse(pstate, ind->arg);
@@ -454,12 +454,7 @@ transformIndirection(ParseState *pstate, A_Indirection *ind)
if (IsA(n, A_Indices))
subscripts = lappend(subscripts, n);
else if (IsA(n, A_Star))
- {
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("row expansion via \"*\" is not supported here"),
- parser_errposition(pstate, location)));
- }
+ subscripts = lappend(subscripts, n);
else
{
Assert(IsA(n, String));
Now, the 3 clauses are quite similar, the if-elseif-else can be simplified as:
If (!IsA(n, A_Indices) && !Is_A(n, A_Start))
Assert(IsA(n, String));
Subscripts = lappend(subscripts, n)
+static bool
+jsonb_check_jsonpath_needed(List *indirection)
+{
+ ListCell *lc;
+
+ foreach(lc, indirection)
+ {
+ Node *accessor = lfirst(lc);
+
+ if (IsA(accessor, String))
+ return true;
Like I mentioned in my previous comment, if we convert String (dot notation) to Indices in rewritten phase, then jsonpath might be no longer needed, and the code logic is much simplified.
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -244,7 +244,7 @@ transformContainerSubscripts(ParseState *pstate,
Node *containerBase,
Oid containerType,
int32 containerTypMod,
- List *indirection,
+ List **indirection,
bool isAssignment)
{
SubscriptingRef *sbsref;
@@ -280,7 +280,7 @@ transformContainerSubscripts(ParseState *pstate,
* element. If any of the items are slice specifiers (lower:upper), then
* the subscript expression means a container slice operation.
*/
- foreach(idx, indirection)
+ foreach(idx, *indirection)
{
A_Indices *ai = lfirst_node(A_Indices, idx);
Should this foreach be pull up to out of transformContainerSubscripts()? Originally transformContainerSubscripts() runs only once, but now it’s placed in a while loop, and every time this foreach needs to go through the entire indirection list, which are duplicated computation.
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
Hi Chao,
Thanks for reviewing!
On Thu, Aug 28, 2025 at 8:29 PM Chao Li <li.evan.chao@gmail.com> wrote:
On Aug 26, 2025, at 11:52, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:<v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch><v14-0003-Export-jsonPathFromParseResult.patch><v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v14-0005-Implement-read-only-dot-notation-for-jsonb.patch><v14-0007-Implement-jsonb-wildcard-member-accessor.patch><v14-0006-Implement-Jsonb-subscripting-with-slicing.patch><v14-0004-Extract-coerce_jsonpath_subscript.patch>Best,AlexI found a bug.
```INSERT INTO test_jsonb_types (data) VALUES('[1, 2, "three"]'),('{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}’);```If I use a index following a slice, it doesn’t work:```evantest=# select data[0] from test_jsonb_types;data------1(2 rows)evantest=# select data[0:2][1] from test_jsonb_types; # This should return “2"data------(2 rows)evantest=# select (t.data)['con']['a'][0:1] from test_jsonb_types t; # returned the slice properlydata-----------------------------------------------------[{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}](2 rows)evantest=# select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; # also returned the slice, which is wrongdata-----------------------------------------------------[{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}](2 rows)```We should consider a slice as a container, so the fix is simple. My quick unpolished fix is:```chaol@ChaodeMacBook-Air postgresql % git diffdiff --git a/src/backend/utils/adt/jsonbsubs.c b/src/backend/utils/adt/jsonbsubs.cindex cb72d12ca3f..8845dcf239a 100644--- a/src/backend/utils/adt/jsonbsubs.c+++ b/src/backend/utils/adt/jsonbsubs.c@@ -247,6 +247,7 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, SubscriptiListCell *lc;Datum jsp;int pathlen = 0;+ bool isSlice = false;sbsref->refupperindexpr = NIL;sbsref->reflowerindexpr = NIL;@@ -285,6 +286,7 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscriptiif (ai->is_slice){+ isSlice = true;while (list_length(sbsref->reflowerindexpr) < list_length(sbsref->refupperindexpr))sbsref->reflowerindexpr = lappend(sbsref->reflowerindexpr, NULL);@@ -369,6 +371,9 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscriptipath->next = jpi;path = jpi;pathlen++;++ if (isSlice)+ break;}if (pathlen == 0)```After the fix, let’s test again:```evantest=# select data[0:2][1] from test_jsonb_types; # good resultdata------2(2 rows)evantest=# select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; # good resultdata-------------------------{"b": {"c": {"d": 99}}}(2 rows)```
TL;DR: It is a feature, not a bug.
See longer explanation below:
This behavior aligns with the SQL:2023 standard. While the result you
expected is more intuitive in my opinion, it is incorrect according to
the spec.
expected is more intuitive in my opinion, it is incorrect according to
the spec.
As I mentioned in the commit message of patch v14-0005:
The SQL standard states that simplified access is equivalent to:
JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)
where:
VEP = <value expression primary>
JC = <JSON simplified accessor op chain>
JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)
where:
VEP = <value expression primary>
JC = <JSON simplified accessor op chain>
The most relevant detail that I want to highlight is "WITH CONDITIONAL
ARRAY WRAPPER". The documentation[1] says:
ARRAY WRAPPER". The documentation[1] says:
> If the path expression may return multiple values, it might be
> necessary to wrap those values using the WITH WRAPPER clause to make
> it a valid JSON string, because the default behavior is to not wrap
> them, as if WITHOUT WRAPPER were specified. The WITH WRAPPER clause is
> by default taken to mean WITH UNCONDITIONAL WRAPPER, which means that
> even a single result value will be wrapped. To apply the wrapper only
> when multiple values are present, specify WITH CONDITIONAL WRAPPER.
> Getting multiple values in result will be treated as an error if
> WITHOUT WRAPPER is specified.
> necessary to wrap those values using the WITH WRAPPER clause to make
> it a valid JSON string, because the default behavior is to not wrap
> them, as if WITHOUT WRAPPER were specified. The WITH WRAPPER clause is
> by default taken to mean WITH UNCONDITIONAL WRAPPER, which means that
> even a single result value will be wrapped. To apply the wrapper only
> when multiple values are present, specify WITH CONDITIONAL WRAPPER.
> Getting multiple values in result will be treated as an error if
> WITHOUT WRAPPER is specified.
So, for your test queries:
select data[0:2] from test_jsonb_types;
select data[0:2][1] from test_jsonb_types;
select (t.data)['con']['a'][0:1] from test_jsonb_types t;
select (t.data)['con']['a'][0:1][0] from test_jsonb_types t;
We have these equivalents using json_query():
select json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
select json_query(data, 'lax $[0 to 2][1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
select json_query(data, 'lax $.con.a[0 to 1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; -- **[NOTE]**
select json_query(data, 'lax $.con.a[0 to 1][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; -- **[NOTE]**
**[NOTE]**: .a and ['a'] (as well as .con and ['con']) are not
syntactically equivalent, as the dot-notation .a is in "lax" mode,
whereas the pre-standard subscript ['a'] is in "strict" mode. I will
discuss this more in a separate reply to your other comment. However,
for the specific data we inserted in your example table, they happen
to return the same results. Since our focus here is not dot-notation,
syntactically equivalent, as the dot-notation .a is in "lax" mode,
whereas the pre-standard subscript ['a'] is in "strict" mode. I will
discuss this more in a separate reply to your other comment. However,
for the specific data we inserted in your example table, they happen
to return the same results. Since our focus here is not dot-notation,
we won’t go further into it here.
You can verify correctness with:
test=# select data[0:2] = json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
?column?
----------
t
t
(2 rows)
?column?
----------
t
t
(2 rows)
test=# select (data[0:2][1] is NULL) AND (json_query(data, 'lax $[0 to 2][1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) is NULL) from test_jsonb_types;
?column?
----------
t
t
(2 rows)
?column?
----------
t
t
(2 rows)
test=# select (((t.data)['con']['a'][0:1] IS NULL) AND (json_query(data, 'lax $.con.a[0 to 1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) IS NULL)) OR ((t.data)['con']['a'][0:1] = json_query(data, 'lax $.con.a[0 to 1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)) from test_jsonb_types t;
?column?
----------
t
t
(2 rows)
?column?
----------
t
t
(2 rows)
test=# select (((t.data)['con']['a'][0:1][0] IS NULL) AND (json_query(data, 'lax $.con.a[0 to 1][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) IS NULL)) OR ((t.data)['con']['a'][0:1][0] = json_query(data, 'lax $.con.a[0 to 1][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)) from test_jsonb_types t;
?column?
----------
t
t
(2 rows)
?column?
----------
t
t
(2 rows)
All tests show "t", confirming the current results are all correct.
I think the root of your confusion is the meaning of CONDITIONAL ARRAY
WRAPPER. So let’s try more examples:
-- keep your previous setup
drop table test_jsonb_types;
create table test_jsonb_types (data jsonb);
INSERT INTO test_jsonb_types (data) VALUES ('[1, 2, "three"]'), ('{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}');
Let's start with:
select data[0:2] from test_jsonb_types;
It is equivalent to:
select json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
They all have the following output:
test=# select json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
---------------------------------------------------------------------
[1, 2, "three"]
{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}
(2 rows)
json_query
---------------------------------------------------------------------
[1, 2, "three"]
{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}
(2 rows)
To find out what WITH CONDITIONAL ARRAY WRAPPER does, let's toggle it
to WITHOUT ARRAY WRAPPER:
to WITHOUT ARRAY WRAPPER:
test=# select json_query(data, 'lax $[0 to 2]' WITHOUT ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
---------------------------------------------------------------------
{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}
(2 rows)
json_query
---------------------------------------------------------------------
{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}
(2 rows)
The first row return NULL, because we've specified NULL ON ERROR, so
let's toggle that as well to ERROR ON ERROR:
let's toggle that as well to ERROR ON ERROR:
test=# select json_query(data, 'lax $[0 to 2]' WITHOUT ARRAY WRAPPER NULL ON EMPTY ERROR ON ERROR) from test_jsonb_types;
ERROR: 22034: JSON path expression in JSON_QUERY must return single item when no wrapper is requested
HINT: Use the WITH WRAPPER clause to wrap SQL/JSON items into an array.
LOCATION: JsonPathQuery, jsonpath_exec.c:3987
ERROR: 22034: JSON path expression in JSON_QUERY must return single item when no wrapper is requested
HINT: Use the WITH WRAPPER clause to wrap SQL/JSON items into an array.
LOCATION: JsonPathQuery, jsonpath_exec.c:3987
As shown, without ARRAY WRAPPER, the query produces a sequence of JSON
value items, not a JSON array.
value items, not a JSON array.
Note that array wrapping is only applied to the final result of a
jsonpath, not to each intermediate result in the chain. See the
following example:
jsonpath, not to each intermediate result in the chain. See the
following example:
-- new setup
truncate test_jsonb_types;
INSERT INTO test_jsonb_types (data) VALUES ('[1, 2, "three"]'), ('[1, [2, 22], "three"]');
test=# select json_query(data, 'lax $[0 to 2][1]' WITHOUT ARRAY WRAPPER NULL ON EMPTY ERROR ON ERROR) from test_jsonb_types;
json_query
------------
22
(2 rows)
json_query
------------
22
(2 rows)
In this case, you can see more clearly that "[0 to 2]" fetches three
individual jsonb array elements, and "[1]" treats each of the three
jsonb values as independent jsonb arrays, reading the first element of
each. This is different from wrapping the intermediate result into an
array and accessing the first element of that wrapped array.
individual jsonb array elements, and "[1]" treats each of the three
jsonb values as independent jsonb arrays, reading the first element of
each. This is different from wrapping the intermediate result into an
array and accessing the first element of that wrapped array.
Another thing I want to point out is that there is a trivial case for
"lax" mode when accessing a jsonb object (not a jsonb array) using a
JSON array accessor "[0]". For example:
"lax" mode when accessing a jsonb object (not a jsonb array) using a
JSON array accessor "[0]". For example:
-- another setup, still use your data
truncate test_jsonb_types;
insert into test_jsonb_types VALUES ('{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}');
test=# select (data).con[0] from test_jsonb_types;
con
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
con
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
This is equivalent to:
test=# select json_query(data, 'lax $.con[0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
json_query
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
which is different from the result if we use "strict" mode:
test=# select json_query(data, 'strict $.con[0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
------------
(1 row)
json_query
------------
(1 row)
According to SQL:2023:
— If an operation requires an SQL/JSON array but the operand is not an SQL/JSON array, then the operand is first “wrapped” in an SQL/JSON array prior to performing the operation.
— If an operation requires something other than an SQL/JSON array, but the operand is an SQL/JSON array, then the operand is “unwrapped” by converting its elements into an SQL/JSON sequence prior to performing the operation.
— After applying the preceding resolutions to structural errors, if there is still a structural error , the result is an empty SQL/JSON sequence.
Because "lax" mode wraps a jsonb object into an array of a single
element, accessing it with [0] will always return the same jsonb
object. In fact, you can access it with a chain of [0]s and still get
the same jsonb object:
test=# select json_query(data, 'lax $.con[0][0][0][0][0][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
test=# select (data).con[0][0][0][0][0][0] from test_jsonb_types;
con
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
json_query
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
test=# select (data).con[0][0][0][0][0][0] from test_jsonb_types;
con
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
I hope this long explanation helps!
Best,
Alex
Hi Chao,
On Thu, Aug 28, 2025 at 8:42 PM Chao Li <li.evan.chao@gmail.com> wrote:
I am trying to split different topics to different email to keep every issue to be focused.
Sure!
On Thu, Aug 28, 2025 at 8:42 PM Chao Li <li.evan.chao@gmail.com> wrote:
I also have a suggestion.If I do:```— s1select (t.data)['con']['a'][1]['b']['c']['d'] from test_jsonb_types t;—s2select (t.data).con.a[1].b['c'].d from test_jsonb_types t;```The two statements are actually identical. But they generate quite different rewritten query trees. S1’s rewritten tree is much simpler than s2’s. However, their plan trees are the same.
The above two statements are NOT identical. Specifically, dot-notation
(e.g., .con) and pre-standard jsonb subscripting (e.g., ['con']) are
NOT semantically the same.
(e.g., .con) and pre-standard jsonb subscripting (e.g., ['con']) are
NOT semantically the same.
Here's an example:
-- setup
create table t (jb jsonb);
insert into t SELECT '{"con": 1}'::jsonb;
insert into t SELECT '[{"con": 1}, {"con": {"a": 2}}]'::jsonb;
-- queries
test=# select (t.jb).con from t;
con
---------------
1
[1, {"a": 2}]
(2 rows)
con
---------------
1
[1, {"a": 2}]
(2 rows)
test=# select (t.jb)['con'] from t;
jb
----
1
(2 rows)
jb
----
1
(2 rows)
As you can see, dot-notation returns different results from jsonb
subscripting.
subscripting.
As I mentioned in the previous reply:
The SQL standard states that simplified access is equivalent to:
JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR
)
where:
VEP = <value expression primary>
JC = <JSON simplified accessor op chain>
And
In lax mode:
— If an operation requires an SQL/JSON array but the operand is not an SQL/JSON array, then the operand is first “wrapped” in an SQL/JSON array prior to performing the operation.
— If an operation requires something other than an SQL/JSON array, but the operand is an SQL/JSON array, then the operand is “unwrapped” by converting its elements into an SQL/JSON sequence prior to performing the operation.
— After applying the preceding resolutions to structural errors, if there is still a structural error , the result is an empty SQL/JSON sequence.
The example query demonstrates the second point above. The
dot-notation attempts to access a member field (."con") of a JSON
object, while the operand is a JSON array ([{"con": 1}, {"con": {"a":
2}}]). In "lax" mode, the operand is "unwrapped" into a JSON sequence
(two elements: {"con": 1} and {"con": {"a": 2}}), and the member field
access is performed on each element. The multiple results are then
wrapped into a JSON array ([1, {"a": 2}]) due to WITH CONDITIONAL
ARRAY WRAPPER. I’ve already explained what "ARRAY WRAPPER" does in my
previous reply, so I won't repeat it here.
dot-notation attempts to access a member field (."con") of a JSON
object, while the operand is a JSON array ([{"con": 1}, {"con": {"a":
2}}]). In "lax" mode, the operand is "unwrapped" into a JSON sequence
(two elements: {"con": 1} and {"con": {"a": 2}}), and the member field
access is performed on each element. The multiple results are then
wrapped into a JSON array ([1, {"a": 2}]) due to WITH CONDITIONAL
ARRAY WRAPPER. I’ve already explained what "ARRAY WRAPPER" does in my
previous reply, so I won't repeat it here.
Best,
Alex
Hi Alex,
Thank you so much for such a detailed explanation.
Regards,
On Aug 30, 2025, at 08:53, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:I hope this long explanation helps!
I used to only use the “->” and “->>” accessors for jsonb statements. After some research, I think my misunderstanding was about the slice operator [x:y]. I thought it could return a new array, however it actually returns a sequence.
I revisited the patch diffs again, and got a few other comments. Let me raise them in a separate email.
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
On Aug 27, 2025, at 15:26, Chao Li <li.evan.chao@gmail.com> wrote:diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.hindex f7d07c84542..58a4b9df157 100644--- a/src/include/parser/parse_node.h+++ b/src/include/parser/parse_node.h@@ -361,7 +361,7 @@ extern SubscriptingRef *transformContainerSubscripts(ParseState *pstate,Node *containerBase,Oid containerType,int32 containerTypMod,- List *indirection,+ List **indirection,bool isAssignment);Here we change “indirection” from * to **, I guess the reason is to indicate if the indirection list is fully processed. In case of fully processed, set it to NULL, then caller gets NULL via **.As “indirection” is of type “List *”, can we just set “indirection->length = 0”? I checked pg_list.h, there is not a function or macro to cleanup a list (free elements and reset length to 0). There is a “list_free(List *list)”, but it will free “list” itself as well. Maybe we can add a new function, say “list_reset(List *list)” or “list_cleanup(List *list)”. This way will keep the function interface unchanged, and less changes would be involved.
After revisiting the changes, I take back this comment. Given the nature that List are always used in pointer form, my comment would just make things much more complicated.
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
On Aug 29, 2025, at 15:22, Chao Li <li.evan.chao@gmail.com> wrote:+static bool+jsonb_check_jsonpath_needed(List *indirection)+{+ ListCell *lc;++ foreach(lc, indirection)+ {+ Node *accessor = lfirst(lc);++ if (IsA(accessor, String))+ return true;Like I mentioned in my previous comment, if we convert String (dot notation) to Indices in rewritten phase, then jsonpath might be no longer needed, and the code logic is much simplified.
Taking back this comment. As Alex explained in the other email, dot notation and indices accessor are different.
--- a/src/backend/parser/parse_node.c+++ b/src/backend/parser/parse_node.c@@ -244,7 +244,7 @@ transformContainerSubscripts(ParseState *pstate,Node *containerBase,Oid containerType,int32 containerTypMod,- List *indirection,+ List **indirection,bool isAssignment){SubscriptingRef *sbsref;@@ -280,7 +280,7 @@ transformContainerSubscripts(ParseState *pstate,* element. If any of the items are slice specifiers (lower:upper), then* the subscript expression means a container slice operation.*/- foreach(idx, indirection)+ foreach(idx, *indirection){A_Indices *ai = lfirst_node(A_Indices, idx);Should this foreach be pull up to out of transformContainerSubscripts()? Originally transformContainerSubscripts() runs only once, but now it’s placed in a while loop, and every time this foreach needs to go through the entire indirection list, which are duplicated computation.
After revisiting the code, I think this loop of checking is_slice should be removed here.
It seems only arrsubs cares about this is_slice flag.
hstore_subs can only take a single indirection, so it can do a simple check like:
```
ai = initial_node(A_Indices, *indirection);
If (list_length(*indirection) != 1 || ai->is_slice)
{
ereport(…)
}
```jsonbsubs doesn’t need to check is_slice flag as well, I will explain that in my next email tougher with my new comments.
So that, “bool isSlace” can be removed from SubscriptTransform, and let every individual subs check is_slice.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
On Sep 2, 2025, at 10:59, Chao Li <li.evan.chao@gmail.com> wrote:jsonbsubs doesn’t need to check is_slice flag as well, I will explain that in my next email tougher with my new comments.
Now jsonb_check_json_needed() loops over all indirections:
static bool
jsonb_check_jsonpath_needed(List *indirection)
{
ListCell *lc;
foreach(lc, indirection)
{
Node *accessor = lfirst(lc);
if (IsA(accessor, String))
return true;
else
{
Assert(IsA(accessor, A_Indices));
if (castNode(A_Indices, accessor)->is_slice)
return true;
}
}
return false;
}
Let’s consider this statement: select data[‘a’][‘b’].c from jsonb_table.
For this indirection list, first two elements are non-slice indices, and the third is a string accessor. So json_check_jsonpath_needed() will return true.
Then:
static void
jsonb_subscript_transform(SubscriptingRef *sbsref,
List **indirection,
ParseState *pstate,
bool isSlice,
bool isAssignment)
{
List *upperIndexpr = NIL;
ListCell *idx;
/* Determine the result type of the subscripting operation; always jsonb */
sbsref->refrestype = JSONBOID;
sbsref->reftypmod = -1;
if (isSlice || jsonb_check_jsonpath_needed(*indirection))
{
jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);
if (sbsref->refjsonbpath)
return;
}
It will fall into the call to json_subscript_make_jsonpath(), and it cannot process [‘a’], so sbsref->refjsonbpath will be NULL, then it will fall through down to the logic of processing [‘a’].
That’s actually a waste of looping over the entire indirection list and making an unnecessary call to jsonb_subscript_make_jsonpath().
In json_check_jsonpath_needed(), we can only check the first item. Also, as jsonb_check_jsonpath_needed() already has the logic of checking is_slice, here in jsonb_subscript_transform(), we don’t need to check “isSlice” in the “if” condition at all.
I made the change locally, and “make check” passed.
My dirty change is like:
```
@@ -118,11 +118,11 @@ coerce_jsonpath_subscript_to_int4_or_text(ParseState *pstate, Node *subExpr)
static bool
jsonb_check_jsonpath_needed(List *indirection)
{
- ListCell *lc;
+ //ListCell *lc;
- foreach(lc, indirection)
- {
- Node *accessor = lfirst(lc);
+ //foreach(lc, indirection)
+ //{
+ Node *accessor = lfirst(list_head(indirection));
if (IsA(accessor, String) || IsA(accessor, A_Star))
return true;
@@ -133,7 +133,8 @@ jsonb_check_jsonpath_needed(List *indirection)
if (castNode(A_Indices, accessor)->is_slice)
return true;
}
- }
+ //break;
+ //}
return false;
}
@@ -401,7 +435,7 @@ jsonb_subscript_transform(SubscriptingRef *sbsref,
sbsref->refrestype = JSONBOID;
sbsref->reftypmod = -1;
- if (isSlice || jsonb_check_jsonpath_needed(*indirection))
+ if (jsonb_check_jsonpath_needed(*indirection))
{
jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);
if (sbsref->refjsonbpath)
```
The other comment is that jsonb_subscript_make_jsonpath() can be optimized a little bit. When indices and dot notations are mixed, it will always hit the clause “if (api_from == NULL)”. In this case, memory of jpi has been allocated, and with the “if” we need to free the memory. So, we can pull up calculating jpi_from, if it is NULL, then we don’t need to allocate and free memory. My dirty change is like:
```
else if (IsA(accessor, A_Indices))
{
+ JsonPathParseItem *jpi_from = NULL;
A_Indices *ai = castNode(A_Indices, accessor);
+ if (!ai->is_slice)
+ {
+ Assert(ai->uidx && !ai->lidx);
+ jpi_from = make_jsonpath_item_expr(pstate, ai->uidx, &sbsref->refupperindexpr, true);
+ if (jpi_from == NULL)
+ {
+ /*
+ * Break out of the loop if the subscript is not a
+ * non-null integer constant, so that we can fall back to
+ * jsonb subscripting logic.
+ *
+ * This is needed to handle cases with mixed usage of SQL
+ * standard json simplified accessor syntax and PostgreSQL
+ * jsonb subscripting syntax, e.g:
+ *
+ * select (jb).a['b'].c from jsonb_table;
+ *
+ * where dot-notation (.a and .c) is the SQL standard json
+ * simplified accessor syntax, and the ['b'] subscript is
+ * the PostgreSQL jsonb subscripting syntax, because 'b'
+ * is not a non-null constant integer and cannot be used
+ * for json array access.
+ *
+ * In this case, we cannot create a JsonPath item, so we
+ * break out of the loop and let
+ * jsonb_subscript_transform() handle this indirection as
+ * a PostgreSQL jsonb subscript.
+ */
+ break;
+ }
+ }
+
jpi = make_jsonpath_item(jpiIndexArray);
jpi->value.array.nelems = 1;
jpi->value.array.elems = palloc(sizeof(jpi->value.array.elems[0]));
@@ -303,12 +337,12 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
}
else
{
- JsonPathParseItem *jpi_from = NULL;
+ //JsonPathParseItem *jpi_from = NULL;
- Assert(ai->uidx && !ai->lidx);
- jpi_from = make_jsonpath_item_expr(pstate, ai->uidx, &sbsref->refupperindexpr, true);
- if (jpi_from == NULL)
- {
+ //Assert(ai->uidx && !ai->lidx);
+ //jpi_from = make_jsonpath_item_expr(pstate, ai->uidx, &sbsref->refupperindexpr, true);
+ //if (jpi_from == NULL)
+ //{
/*
* Break out of the loop if the subscript is not a
* non-null integer constant, so that we can fall back to
@@ -331,10 +365,10 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
* jsonb_subscript_transform() handle this indirection as
* a PostgreSQL jsonb subscript.
*/
- pfree(jpi->value.array.elems);
- pfree(jpi);
- break;
- }
+ // pfree(jpi->value.array.elems);
+ // pfree(jpi);
+ // break;
+ //}
jpi->value.array.elems[0].from = jpi_from;
jpi->value.array.elems[0].to = NULL;
```
With this change, “make check” also passed.
The third comment is about test cases. I only see tests for indices following dot notation, like data.a[‘b’], but not the reverse. Let’s add a few test cases for dot notation following indices, like data[‘a’].b.
The last comment is about error message:
```
evantest=# select data.a from test_jsonb_types;
ERROR: missing FROM-clause entry for table "data"
LINE 1: select data.a from test_jsonb_types;
```
“Missing FROM-clause entry” is quite confusing and not providing much useful hint to resolve the problem. When I first saw this error, I was stuck. Only after read through the commit comments, I figured out the problem. For end users, we should provide some more meaningful error messages.
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
On Monday, September 1, 2025, Chao Li <li.evan.chao@gmail.com> wrote:
The last comment is about error message:```evantest=# select data.a from test_jsonb_types;ERROR: missing FROM-clause entry for table "data"LINE 1: select data.a from test_jsonb_types;```“Missing FROM-clause entry” is quite confusing and not providing much useful hint to resolve the problem. When I first saw this error, I was stuck. Only after read through the commit comments, I figured out the problem. For end users, we should provide some more meaningful error messages.
I don’t think it’s fair to blame this patch set for this. If you want to reference a component of a composite column you need to write ([tbl.]col).field otherwise the parser gets confused because it wants the thing prior to “field” to be a table name if you don’t use the parentheses. This comes up all the time if one uses composite typed columns. I’ll agree this isn’t the greatest error message but I’d suggest starting a new thread if you want to see about improving it.
David J.
On Sep 2, 2025, at 11:53, David G. Johnston <david.g.johnston@gmail.com> wrote:On Monday, September 1, 2025, Chao Li <li.evan.chao@gmail.com> wrote:The last comment is about error message:```evantest=# select data.a from test_jsonb_types;ERROR: missing FROM-clause entry for table "data"LINE 1: select data.a from test_jsonb_types;```“Missing FROM-clause entry” is quite confusing and not providing much useful hint to resolve the problem. When I first saw this error, I was stuck. Only after read through the commit comments, I figured out the problem. For end users, we should provide some more meaningful error messages.I don’t think it’s fair to blame this patch set for this. If you want to reference a component of a composite column you need to write ([tbl.]col).field otherwise the parser gets confused because it wants the thing prior to “field” to be a table name if you don’t use the parentheses. This comes up all the time if one uses composite typed columns. I’ll agree this isn’t the greatest error message but I’d suggest starting a new thread if you want to see about improving it.David J.
Sure, we can discuss the topic separately.
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
Hi Chao,
Thanks for reviewing! I've attached v15, which addresses your
feedback.
feedback.
I didn't make all the changes you suggested, please see my individual
comments below and in the next email.
comments below and in the next email.
On Mon, Sep 1, 2025 at 7:59 PM Chao Li <li.evan.chao@gmail.com> wrote:
--- a/src/backend/parser/parse_node.c+++ b/src/backend/parser/parse_node.c@@ -244,7 +244,7 @@ transformContainerSubscripts(ParseState *pstate,Node *containerBase,Oid containerType,int32 containerTypMod,- List *indirection,+ List **indirection,bool isAssignment){SubscriptingRef *sbsref;@@ -280,7 +280,7 @@ transformContainerSubscripts(ParseState *pstate,* element. If any of the items are slice specifiers (lower:upper), then* the subscript expression means a container slice operation.*/- foreach(idx, indirection)+ foreach(idx, *indirection){A_Indices *ai = lfirst_node(A_Indices, idx);Should this foreach be pull up to out of transformContainerSubscripts()? Originally transformContainerSubscripts() runs only once, but now it’s placed in a while loop, and every time this foreach needs to go through the entire indirection list, which are duplicated computation.After revisiting the code, I think this loop of checking is_slice should be removed here.It seems only arrsubs cares about this is_slice flag.hstore_subs can only take a single indirection, so it can do a simple check like:```ai = initial_node(A_Indices, *indirection);If (list_length(*indirection) != 1 || ai->is_slice){ereport(…)}```jsonbsubs doesn’t need to check is_slice flag as well, I will explain that in my next email tougher with my new comments.So that, “bool isSlace” can be removed from SubscriptTransform, and let every individual subs check is_slice.
I don't like the idea of removing the "bool isSlice" argument from the
*SubscriptTransform() function pointer. This patch series should make
only minimal changes to the subscripting API. While it's true that
each implementation can easily determine the boolean value of isSlice
itself, I don't see a clear benefit to changing the interface. As you
noted, arrsubs cares about isSlice; and users can also define custom
data types that support subscripting, so the interface is not limited
to built-in types.
*SubscriptTransform() function pointer. This patch series should make
only minimal changes to the subscripting API. While it's true that
each implementation can easily determine the boolean value of isSlice
itself, I don't see a clear benefit to changing the interface. As you
noted, arrsubs cares about isSlice; and users can also define custom
data types that support subscripting, so the interface is not limited
to built-in types.
As the comment for *SubscriptTransform() explains:
(Of course, if the transform method
* does not care to support slicing, it can just throw an error if isSlice.)
It's possible that in the end the "isSlice" argument isn't needed at
all, but I don’t think this patch set is the right place for that
refactor.
all, but I don’t think this patch set is the right place for that
refactor.
Best,
Alex
Вложения
- v15-0007-Implement-jsonb-wildcard-member-accessor.patch
- v15-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch
- v15-0004-Extract-coerce_jsonpath_subscript.patch
- v15-0006-Implement-Jsonb-subscripting-with-slicing.patch
- v15-0005-Implement-read-only-dot-notation-for-jsonb.patch
- v15-0003-Export-jsonPathFromParseResult.patch
- v15-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch
Hi Chao,
Continue from my previous email.
On Mon, Sep 1, 2025 at 8:32 PM Chao Li <li.evan.chao@gmail.com> wrote:
On Sep 2, 2025, at 10:59, Chao Li <li.evan.chao@gmail.com> wrote:jsonbsubs doesn’t need to check is_slice flag as well, I will explain that in my next email tougher with my new comments.Now jsonb_check_json_needed() loops over all indirections:static booljsonb_check_jsonpath_needed(List *indirection){ListCell *lc;foreach(lc, indirection){Node *accessor = lfirst(lc);if (IsA(accessor, String))return true;else{Assert(IsA(accessor, A_Indices));if (castNode(A_Indices, accessor)->is_slice)return true;}}return false;}Let’s consider this statement: select data[‘a’][‘b’].c from jsonb_table.For this indirection list, first two elements are non-slice indices, and the third is a string accessor. So json_check_jsonpath_needed() will return true.Then:static voidjsonb_subscript_transform(SubscriptingRef *sbsref,List **indirection,ParseState *pstate,bool isSlice,bool isAssignment){List *upperIndexpr = NIL;ListCell *idx;/* Determine the result type of the subscripting operation; always jsonb */sbsref->refrestype = JSONBOID;sbsref->reftypmod = -1;if (isSlice || jsonb_check_jsonpath_needed(*indirection)){jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);if (sbsref->refjsonbpath)return;}It will fall into the call to json_subscript_make_jsonpath(), and it cannot process [‘a’], so sbsref->refjsonbpath will be NULL, then it will fall through down to the logic of processing [‘a’].That’s actually a waste of looping over the entire indirection list and making an unnecessary call to jsonb_subscript_make_jsonpath().In json_check_jsonpath_needed(), we can only check the first item. Also, as jsonb_check_jsonpath_needed() already has the logic of checking is_slice, here in jsonb_subscript_transform(), we don’t need to check “isSlice” in the “if” condition at all.I made the change locally, and “make check” passed.My dirty change is like:```@@ -118,11 +118,11 @@ coerce_jsonpath_subscript_to_int4_or_text(ParseState *pstate, Node *subExpr)static booljsonb_check_jsonpath_needed(List *indirection){- ListCell *lc;+ //ListCell *lc;- foreach(lc, indirection)- {- Node *accessor = lfirst(lc);+ //foreach(lc, indirection)+ //{+ Node *accessor = lfirst(list_head(indirection));if (IsA(accessor, String) || IsA(accessor, A_Star))return true;@@ -133,7 +133,8 @@ jsonb_check_jsonpath_needed(List *indirection)if (castNode(A_Indices, accessor)->is_slice)return true;}- }+ //break;+ //}return false;}@@ -401,7 +435,7 @@ jsonb_subscript_transform(SubscriptingRef *sbsref,sbsref->refrestype = JSONBOID;sbsref->reftypmod = -1;- if (isSlice || jsonb_check_jsonpath_needed(*indirection))+ if (jsonb_check_jsonpath_needed(*indirection)){jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);if (sbsref->refjsonbpath)```
This change would give an incorrect result for an accessor like
"[0].a" when the jsonb column data is a jsonb object instead of a
jsonb array. I've added two new test cases to cover this scenario:
"[0].a" when the jsonb column data is a jsonb object instead of a
jsonb array. I've added two new test cases to cover this scenario:
In jsonb.sql, the newly added tests are:
select (jb)[0].a from test_jsonb_dot_notation; -- returns same result as (jb).aselect (jb)[1].a from test_jsonb_dot_notation; -- returns NULL
In v5, I included isSlice as an argument of
jsonb_check_jsonpath_needed(), so we no longer need to check the
is_slice field of each indirection element. That should make the check
more efficient. In jsonb_subscript_make_jsonpath(), I also moved the
check for cases like ['a'] earlier, as you suggested. These changes
should eliminate some unnecessary comparisons and allocate then
immediately free.
jsonb_check_jsonpath_needed(), so we no longer need to check the
is_slice field of each indirection element. That should make the check
more efficient. In jsonb_subscript_make_jsonpath(), I also moved the
check for cases like ['a'] earlier, as you suggested. These changes
should eliminate some unnecessary comparisons and allocate then
immediately free.
I also want to point out that we should not recommend users mix the
standard simplified accessor syntax with the pre-standard jsonb
subscripting, because:
standard simplified accessor syntax with the pre-standard jsonb
subscripting, because:
1. It is confusing. I would assume that a user who uses dot-notation
understands its definition. As we discussed earlier, (jb).a and
(jb)['a'] can return different results: the former is in "lax" mode
and with a conditional array wrapper, while the latter has neither. If
a user queries something like (jb)['a'].b['c'], I think most likely
she wants either (jb)['a']['b']['c'] or (jb).a.b.c.
understands its definition. As we discussed earlier, (jb).a and
(jb)['a'] can return different results: the former is in "lax" mode
and with a conditional array wrapper, while the latter has neither. If
a user queries something like (jb)['a'].b['c'], I think most likely
she wants either (jb)['a']['b']['c'] or (jb).a.b.c.
2. As you said, it hurts performance. When a pre-standard non-integer
subscript appears before a dot-notation, we bail out of creating a
jsonpath. Alternatively, if a user really wants (jb)['a'].b['c'], it
could be explicitly written as (((jb)['a']).b)['c']. This avoids
unnecessary looping and bailout.
subscript appears before a dot-notation, we bail out of creating a
jsonpath. Alternatively, if a user really wants (jb)['a'].b['c'], it
could be explicitly written as (((jb)['a']).b)['c']. This avoids
unnecessary looping and bailout.
In my earlier revisions (up to v12), I actually emitted warnings when
users mixed syntaxes. Starting from v13, I removed the warning based
on Jian’s feedback. Here's the context:
users mixed syntaxes. Starting from v13, I removed the warning based
on Jian’s feedback. Here's the context:
On Wed, Aug 20, 2025 at 9:53 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:
On Thu, Jul 10, 2025 at 1:54 AM jian he <jian.universality@gmail.com> wrote:after applied v12-0001 to v12-0006
+ /* emit warning conditionally to minimize duplicate warnings */
+ if (list_length(*indirection) > 0)
+ ereport(WARNING,
+ errcode(ERRCODE_WARNING),
+ errmsg("mixed usage of jsonb simplified accessor syntax and jsonb
subscripting."),
+ errhint("use dot-notation for member access, or use non-null integer
constants subscripting for array access."),
+ parser_errposition(pstate, warning_location));
src7=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8];
WARNING: mixed usage of jsonb simplified accessor syntax and jsonb
subscripting.
LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]...
^
HINT: use dot-notation for member access, or use non-null integer
constants subscripting for array access.
ERROR: subscript type bigint is not supported
LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]...
^
HINT: jsonb subscript must be coercible to either integer or text.
The above example looks very bad. location printed twice, hint message
is different.
two messages level (ERROR, WARNING).
... omitting some irelavent comments...
I see your confusion about the WARNING/ERROR messages. My intent was
to encourage users to use consistent syntax flavors rather than mixing
the SQL standard JSON simplified accessor with the pre-standard
PostgreSQL jsonb subscripting. In the meantime, I want to support
mixed syntax flavors as much as possible. However, your feedback makes
it clear that users don’t need that level of details. So, in v13 I’ve
removed the WARNING message and instead added a comment explaining how
we support mixed syntaxes.Here's the relevant diff between v12 and v13, hope it helps:--- a/src/backend/utils/adt/jsonbsubs.c
+++ b/src/backend/utils/adt/jsonbsubs.c
@@ -247,7 +247,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
ListCell *lc;
Datum jsp;
int pathlen = 0;
- int warning_location = -1;
sbsref->refupperindexpr = NIL;
sbsref->reflowerindexpr = NIL;
@@ -311,10 +310,27 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
if (jpi_from == NULL)
{
/*
- * postpone emitting the warning until the end of this
- * function
+ * Break out of the loop if the subscript is not a
+ * non-null integer constant, so that we can fall back to
+ * jsonb subscripting logic.
+ *
+ * This is needed to handle cases with mixed usage of SQL
+ * standard json simplified accessor syntax and PostgreSQL
+ * jsonb subscripting syntax, e.g:
+ *
+ * select (jb).a['b'].c from jsonb_table;
+ *
+ * where dot-notation (.a and .c) is the SQL standard json
+ * simplified accessor syntax, and the ['b'] subscript is
+ * the PostgreSQL jsonb subscripting syntax, because 'b'
+ * is not a non-null constant integer and cannot be used
+ * for json array access.
+ *
+ * In this case, we cannot create a JsonPath item, so we
+ * break out of the loop and let
+ * jsonb_subscript_transform() handle this indirection as
+ * a PostgreSQL jsonb subscript.
*/
- warning_location = exprLocation(ai->uidx);
pfree(jpi->value.array.elems);
pfree(jpi);
break;
@@ -360,14 +376,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
*indirection = list_delete_first_n(*indirection, pathlen);
- /* emit warning conditionally to minimize duplicate warnings */
- if (list_length(*indirection) > 0)
- ereport(WARNING,
- errcode(ERRCODE_WARNING),
- errmsg("mixed usage of jsonb simplified accessor syntax and jsonb subscripting."),
- errhint("use dot-notation for member access, or use non-null integer constants subscripting for array access."),
- parser_errposition(pstate, warning_location));
-
jsp = jsonPathFromParseResult(&jpres, 0, NULL);
I think we should definitely add detailed documentation about mixed
syntax. However, I'm not convinced it's worth putting more effort into
optimizing performance for those cases. Alternatively, we could fail
outright in those cases, but I don't think that would be the best user
experience. Thoughts?
syntax. However, I'm not convinced it's worth putting more effort into
optimizing performance for those cases. Alternatively, we could fail
outright in those cases, but I don't think that would be the best user
experience. Thoughts?
On Mon, Sep 1, 2025 at 8:32 PM Chao Li <li.evan.chao@gmail.com> wrote:
The other comment is that jsonb_subscript_make_jsonpath() can be optimized a little bit. When indices and dot notations are mixed, it will always hit the clause “if (api_from == NULL)”. In this case, memory of jpi has been allocated, and with the “if” we need to free the memory. So, we can pull up calculating jpi_from, if it is NULL, then we don’t need to allocate and free memory. My dirty change is like:```else if (IsA(accessor, A_Indices)){+ JsonPathParseItem *jpi_from = NULL;A_Indices *ai = castNode(A_Indices, accessor);+ if (!ai->is_slice)+ {+ Assert(ai->uidx && !ai->lidx);+ jpi_from = make_jsonpath_item_expr(pstate, ai->uidx, &sbsref->refupperindexpr, true);+ if (jpi_from == NULL)+ {+ /*+ * Break out of the loop if the subscript is not a+ * non-null integer constant, so that we can fall back to+ * jsonb subscripting logic.+ *+ * This is needed to handle cases with mixed usage of SQL+ * standard json simplified accessor syntax and PostgreSQL+ * jsonb subscripting syntax, e.g:+ *+ * select (jb).a['b'].c from jsonb_table;+ *+ * where dot-notation (.a and .c) is the SQL standard json+ * simplified accessor syntax, and the ['b'] subscript is+ * the PostgreSQL jsonb subscripting syntax, because 'b'+ * is not a non-null constant integer and cannot be used+ * for json array access.+ *+ * In this case, we cannot create a JsonPath item, so we+ * break out of the loop and let+ * jsonb_subscript_transform() handle this indirection as+ * a PostgreSQL jsonb subscript.+ */+ break;+ }+ }+jpi = make_jsonpath_item(jpiIndexArray);jpi->value.array.nelems = 1;jpi->value.array.elems = palloc(sizeof(jpi->value.array.elems[0]));@@ -303,12 +337,12 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti}else{- JsonPathParseItem *jpi_from = NULL;+ //JsonPathParseItem *jpi_from = NULL;- Assert(ai->uidx && !ai->lidx);- jpi_from = make_jsonpath_item_expr(pstate, ai->uidx, &sbsref->refupperindexpr, true);- if (jpi_from == NULL)- {+ //Assert(ai->uidx && !ai->lidx);+ //jpi_from = make_jsonpath_item_expr(pstate, ai->uidx, &sbsref->refupperindexpr, true);+ //if (jpi_from == NULL)+ //{/** Break out of the loop if the subscript is not a* non-null integer constant, so that we can fall back to@@ -331,10 +365,10 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti* jsonb_subscript_transform() handle this indirection as* a PostgreSQL jsonb subscript.*/- pfree(jpi->value.array.elems);- pfree(jpi);- break;- }+ // pfree(jpi->value.array.elems);+ // pfree(jpi);+ // break;+ //}jpi->value.array.elems[0].from = jpi_from;jpi->value.array.elems[0].to = NULL;```With this change, “make check” also passed.
Fixed. Thank you!
On Mon, Sep 1, 2025 at 8:32 PM Chao Li <li.evan.chao@gmail.com> wrote:
The third comment is about test cases. I only see tests for indices following dot notation, like data.a[‘b’], but not the reverse. Let’s add a few test cases for dot notation following indices, like data[‘a’].b.
I believe both cases are already covered. Please take a look at
src/test/regress/jsonb.sql; there's a "-- mixed syntax" section.
src/test/regress/jsonb.sql; there's a "-- mixed syntax" section.
Best,
Alex
Hi Alex,
Thanks for addressing my comments. I have a few follow up comments:
On Sep 3, 2025, at 10:16, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:I don't like the idea of removing the "bool isSlice" argument from the
*SubscriptTransform() function pointer. This patch series should make
only minimal changes to the subscripting API. While it's true that
each implementation can easily determine the boolean value of isSlice
itself, I don't see a clear benefit to changing the interface. As you
noted, arrsubs cares about isSlice; and users can also define custom
data types that support subscripting, so the interface is not limited
to built-in types.As the comment for *SubscriptTransform() explains:(Of course, if the transform method
* does not care to support slicing, it can just throw an error if isSlice.)It's possible that in the end the "isSlice" argument isn't needed at
all, but I don’t think this patch set is the right place for that
refactor.
I agree we should keep the change minimum and tightly related to the core feature.
My original suggestion was to move
/*
* Detect whether any of the indirection items are slice specifiers.
*
* A list containing only simple subscripts refers to a single container
* element. If any of the items are slice specifiers (lower:upper), then
* the subscript expression means a container slice operation.
*/
foreach(idx, *indirection)
{
Node *ai = lfirst(idx);
if (IsA(ai, A_Indices) && castNode(A_Indices, ai)->is_slice)
{
isSlice = true;
break;
}
}
To out of transformContainerSubscripts(). Because the function was called only once under “if”, now this patch change it to be called under “while”, which brings two issues:
* Original it was O(n) as it was under “if”, now it is O(n2) as it is under “while”.
* Logically it feels wrong now. Because this loops over the entire indirection list to check is_slice, while the subsequent sbsroutines->transform() may only process a portion (prefix) of indirection list. Say, the 5th element is slice, but the first sbsroutines-transform() call will only process the first 3 elements of indirection list, then pass true to the first transform() call sounds wrong.
if we pull the loop out of transformContainerSubscripts(), then we have to add a new parameter “bool is_slice” to it. But after some researching, I found that making that change is more complicated than removing “is_slice” parameter from SubscriptTransform(). That’s why I ended up suggesting removing “is_slice” from SubscriptTransform().
Does that sound reasonable?
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
On Sep 3, 2025, at 10:20, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:This change would give an incorrect result for an accessor like
"[0].a" when the jsonb column data is a jsonb object instead of a
jsonb array. I've added two new test cases to cover this scenario:In jsonb.sql, the newly added tests are:select (jb)[0].a from test_jsonb_dot_notation; -- returns same result as (jb).a
select (jb)[1].a from test_jsonb_dot_notation; -- returns NULL
I think the latest patch is still wrong. Ultimately, dot notation “.a", index “[0] and slice "[1:4]” rely on jsonpath, and subscript [“a”] relies on the rest logic in jsonb_subscript_transform() in “foreach”.
Now “jsonb_check_jsonpath_needed()” checks only dot nation and slice, but not checking index case. So that reason why your case “select (jb)[0].a” works is because the second indirection is a dot nation. However, “select (jb)[0][‘a’]” will not work. See my test:
```
evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name;
name
---------
"Alice"
(1 row)
evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name'];
jsonb
-------
(1 row)
```
In my test, (jsonb)[0][’name’] should also return “Alice”.
So, end up, “jsonb_check_jsonpath_needed()” only does incomplete and inaccurate checks, only jsonb_subscript_make_jsonpath() can make an accurate decision and return a null jsonpath upon subscript “[‘a’]”.
As a result, “json_check_jsonpath_needed()” feels not needed at all. In jsonb_subscript_transform(), just go ahead to run jsonb_subscript_make_jsonpath() first, if returned jsonpath is NULL, then run rest of logic.
With my dirty change of removing json_check_jsonpath_needed:
```
chaol@ChaodeMacBook-Air postgresql % git diff
diff --git a/src/backend/utils/adt/jsonbsubs.c b/src/backend/utils/adt/jsonbsubs.c
index 374040b3b4e..d9faab5c799 100644
--- a/src/backend/utils/adt/jsonbsubs.c
+++ b/src/backend/utils/adt/jsonbsubs.c
@@ -416,12 +416,12 @@ jsonb_subscript_transform(SubscriptingRef *sbsref,
sbsref->refrestype = JSONBOID;
sbsref->reftypmod = -1;
- if (jsonb_check_jsonpath_needed(*indirection, isSlice))
- {
+ //if (jsonb_check_jsonpath_needed(*indirection, isSlice))
+ //{
jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);
if (sbsref->refjsonbpath)
return;
- }
+ //}
/*
* We reach here only in two cases: (a) the JSON simplified accessor is
```
You can see:
```
evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name'];
jsonb
---------
"Alice"
(1 row)
evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name;
name
---------
"Alice"
(1 row)
```
Then I found “make check” failed. For example the first broken test case:
```
@@ -4998,7 +4998,7 @@
select ('123'::jsonb)[0];
jsonb
-------
-
+ 123
(1 row)
```
The test expected an empty result, which implies “strict” mode.
But the problem is, which mode should be the default? JSON_QUERY() uses “lax” as default mode. And from Peter Eisentraut’s blog: https://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-finished-here-is-whats-new
```
The semantics of this are defined in terms of
JSON_QUERY
and JSON_VALUE
constructions (which have been available since SQL:2016), so this really just syntactic sugar.```
Also feels like “lax” should be the default mode. If that is true, then my dirty change of removing “json_check_jsonpath_need()” works properly.
The current logic with this patch sounds strange. Because “json_check_jsonpath_need()” iterate through unprocessed indirections to decide if jsonpath is needed (lax mode). With this logic:
1) if index [0] directly following dot notation, like (data).a[0], it’s lax mode
2) if index [0] directly following subscript [‘a’], like (data)[‘a’][0], it’s strict mode
3) if index [0] directly following the data column, then if there is a dot nation in indirection list, use lax mode, otherwise strict mode. For the failed test case, as there is no more indirection following [0], so it expected strict mode.
I wonder where this behavior is defined?
With my change, 1) and 2) are the same, for 3), if index [0] directly following the data column, regardless what indirections are followed, it’s by default lax mode.
So, I think this is a design decision. Maybe I missed something from your previous design, but I don’t find anything about that from the commit comments. I feel this would be better aligned with 1) and 2).
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
On Sep 3, 2025, at 10:16, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:<v15-0007-Implement-jsonb-wildcard-member-accessor.patch><v15-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v15-0004-Extract-coerce_jsonpath_subscript.patch><v15-0006-Implement-Jsonb-subscripting-with-slicing.patch><v15-0005-Implement-read-only-dot-notation-for-jsonb.patch><v15-0003-Export-jsonPathFromParseResult.patch><v15-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch>
I have a few more other small comments:
1 - 0002
```
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 6e8fd42c612..ff104c95311 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -441,8 +441,9 @@ transformIndirection(ParseState *pstate, A_Indirection *ind)
ListCell *i;
/*
- * We have to split any field-selection operations apart from
- * subscripting. Adjacent A_Indices nodes have to be treated as a single
+ * Combine field names and subscripts into a single indirection list, as
+ * some subscripting containers, such as jsonb, support field access using
+ * dot notation. Adjacent A_Indices nodes have to be treated as a single
* multidimensional subscript operation.
*/
foreach(i, ind->indirection)
@@ -460,19 +461,43 @@ transformIndirection(ParseState *pstate, A_Indirection *ind)
}
else
{
- Node *newresult;
-
Assert(IsA(n, String));
+ subscripts = lappend(subscripts, n);
+ }
+ }
```
I raised this comment in my previous email, I guess you missed this one.
With this change, the 3 clauses are quite similar, the if-elseif-else can be simplified as:
If (!IsA(n, A_Indices) && !Is_A(n, A_Start))
Assert(IsA(n, String));
subscripts = lappend(subscripts, n)
2 - 0001
```
diff --git a/src/include/nodes/subscripting.h b/src/include/nodes/subscripting.h
index 234e8ad8012..5d576af346f 100644
--- a/src/include/nodes/subscripting.h
+++ b/src/include/nodes/subscripting.h
@@ -71,6 +71,11 @@ struct SubscriptExecSteps;
* does not care to support slicing, it can just throw an error if isSlice.)
* See array_subscript_transform() for sample code.
*
+ * The transform method receives a pointer to a list of raw indirections.
+ * This allows the method to parse a sublist of the indirections (typically
+ * the prefix) and modify the original list in place, enabling the caller to
+ * either process the remaining indirections differently or raise an error.
+ *
* The transform method is also responsible for identifying the result type
```
This is nit comment about the wording. “This allows the method to parse a sublist..." sounds like more from the patch perspective. It’s more suitable for git commit comments, but code comment, I feel it’s better to be more general, for example:
* The transform method receives a pointer to a list of raw indirections.
* It can parse a sublist (typically the prefix) of these indirections and
* modify the original list in place, allowing the caller to either handle
* the remaining indirections differently or raise an error.
* It can parse a sublist (typically the prefix) of these indirections and
* modify the original list in place, allowing the caller to either handle
* the remaining indirections differently or raise an error.
3 - 0005
```
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index f1569879b52..eac31b097e4 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3320,50 +3320,54 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref,
state->steps_len - 1);
}
+ /* When slicing, individual subscript bounds can be omitted */
+ if (!e)
+ {
+ sbsrefstate->upperprovided[i] = false;
+ sbsrefstate->upperindexnull[i] = true;
+ }
+ else {
+ sbsrefstate->upperprovided[i] = true;
```
This is also a nit comment about code format. “{" should be put to the next line of “else” according to other existing code.
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
Hi Chao,
Thank you so much for your review!
I’ve attached v16.
My current goal is to get consensus on 0001 - 0005 and move them toward
commitment. I’d also appreciate feedback on 0006 - 0007 as follow-up.
In v16, I addressed your feedback regarding the isSlice argument in
SubscriptTransform() -- I removed it. I’ve also addressed your three
other comments, sorry for missing them earlier!
I did not change anything about the jb[0] array accessor for jsonb
objects, will discuss that in a follow-up email.
I’ve attached v16.
My current goal is to get consensus on 0001 - 0005 and move them toward
commitment. I’d also appreciate feedback on 0006 - 0007 as follow-up.
In v16, I addressed your feedback regarding the isSlice argument in
SubscriptTransform() -- I removed it. I’ve also addressed your three
other comments, sorry for missing them earlier!
I did not change anything about the jb[0] array accessor for jsonb
objects, will discuss that in a follow-up email.
On Tue, Sep 2, 2025 at 8:56 PM Chao Li <li.evan.chao@gmail.com> wrote:
On Sep 3, 2025, at 10:16, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:I don't like the idea of removing the "bool isSlice" argument from the
*SubscriptTransform() function pointer. This patch series should make
only minimal changes to the subscripting API. While it's true that
each implementation can easily determine the boolean value of isSlice
itself, I don't see a clear benefit to changing the interface. As you
noted, arrsubs cares about isSlice; and users can also define custom
data types that support subscripting, so the interface is not limited
to built-in types.As the comment for *SubscriptTransform() explains:(Of course, if the transform method
* does not care to support slicing, it can just throw an error if isSlice.)It's possible that in the end the "isSlice" argument isn't needed at
all, but I don’t think this patch set is the right place for that
refactor.I agree we should keep the change minimum and tightly related to the core feature.My original suggestion was to move/** Detect whether any of the indirection items are slice specifiers.** A list containing only simple subscripts refers to a single container* element. If any of the items are slice specifiers (lower:upper), then* the subscript expression means a container slice operation.*/foreach(idx, *indirection){Node *ai = lfirst(idx);if (IsA(ai, A_Indices) && castNode(A_Indices, ai)->is_slice){isSlice = true;break;}}To out of transformContainerSubscripts(). Because the function was called only once under “if”, now this patch change it to be called under “while”, which brings two issues:* Original it was O(n) as it was under “if”, now it is O(n2) as it is under “while”.* Logically it feels wrong now. Because this loops over the entire indirection list to check is_slice, while the subsequent sbsroutines->transform() may only process a portion (prefix) of indirection list. Say, the 5th element is slice, but the first sbsroutines-transform() call will only process the first 3 elements of indirection list, then pass true to the first transform() call sounds wrong.if we pull the loop out of transformContainerSubscripts(), then we have to add a new parameter “bool is_slice” to it. But after some researching, I found that making that change is more complicated than removing “is_slice” parameter from SubscriptTransform(). That’s why I ended up suggesting removing “is_slice” from SubscriptTransform().Does that sound reasonable?
Yes, that makes total sense! The logical defect you found can indeed
cause bugs in arraysubs. In 0002, I’ve added a test in arrays.sql to
demonstrate the potential bug and updated the code accordingly:
cause bugs in arraysubs. In 0002, I’ve added a test in arrays.sql to
demonstrate the potential bug and updated the code accordingly:
This change also updates the SubscriptTransform() API to remove the
"bool isSlice" argument. Each container’s transform function now
determines slice-ness itself, since it may only process a prefix of
the indirection list. For example, array_subscript_transform() stops
at dot notation, so "isSlice" should not depend on slice specifiers
that appear afterward.
"bool isSlice" argument. Each container’s transform function now
determines slice-ness itself, since it may only process a prefix of
the indirection list. For example, array_subscript_transform() stops
at dot notation, so "isSlice" should not depend on slice specifiers
that appear afterward.
Thanks,
Alex
Вложения
- v16-0007-Implement-jsonb-wildcard-member-accessor.patch
- v16-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch
- v16-0004-Extract-coerce_jsonpath_subscript.patch
- v16-0006-Implement-Jsonb-subscripting-with-slicing.patch
- v16-0005-Implement-read-only-dot-notation-for-jsonb.patch
- v16-0003-Export-jsonPathFromParseResult.patch
- v16-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch
Hi Chao,
On Tue, Sep 2, 2025 at 11:57 PM Chao Li <li.evan.chao@gmail.com> wrote:
On Sep 3, 2025, at 10:20, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:This change would give an incorrect result for an accessor like
"[0].a" when the jsonb column data is a jsonb object instead of a
jsonb array. I've added two new test cases to cover this scenario:In jsonb.sql, the newly added tests are:select (jb)[0].a from test_jsonb_dot_notation; -- returns same result as (jb).a
select (jb)[1].a from test_jsonb_dot_notation; -- returns NULLI think the latest patch is still wrong. Ultimately, dot notation “.a", index “[0] and slice "[1:4]” rely on jsonpath, and subscript [“a”] relies on the rest logic in jsonb_subscript_transform() in “foreach”.Now “jsonb_check_jsonpath_needed()” checks only dot nation and slice, but not checking index case. So that reason why your case “select (jb)[0].a” works is because the second indirection is a dot nation. However, “select (jb)[0][‘a’]” will not work. See my test:```evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name;name---------"Alice"(1 row)evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name'];jsonb-------(1 row)```In my test, (jsonb)[0][’name’] should also return “Alice”.So, end up, “jsonb_check_jsonpath_needed()” only does incomplete and inaccurate checks, only jsonb_subscript_make_jsonpath() can make an accurate decision and return a null jsonpath upon subscript “[‘a’]”.As a result, “json_check_jsonpath_needed()” feels not needed at all. In jsonb_subscript_transform(), just go ahead to run jsonb_subscript_make_jsonpath() first, if returned jsonpath is NULL, then run rest of logic.With my dirty change of removing json_check_jsonpath_needed:```chaol@ChaodeMacBook-Air postgresql % git diffdiff --git a/src/backend/utils/adt/jsonbsubs.c b/src/backend/utils/adt/jsonbsubs.cindex 374040b3b4e..d9faab5c799 100644--- a/src/backend/utils/adt/jsonbsubs.c+++ b/src/backend/utils/adt/jsonbsubs.c@@ -416,12 +416,12 @@ jsonb_subscript_transform(SubscriptingRef *sbsref,sbsref->refrestype = JSONBOID;sbsref->reftypmod = -1;- if (jsonb_check_jsonpath_needed(*indirection, isSlice))- {+ //if (jsonb_check_jsonpath_needed(*indirection, isSlice))+ //{jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);if (sbsref->refjsonbpath)return;- }+ //}/** We reach here only in two cases: (a) the JSON simplified accessor is```You can see:```evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name'];jsonb---------"Alice"(1 row)evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name;name---------"Alice"(1 row)```Then I found “make check” failed. For example the first broken test case:```@@ -4998,7 +4998,7 @@select ('123'::jsonb)[0];jsonb--------+ 123(1 row)```The test expected an empty result, which implies “strict” mode.But the problem is, which mode should be the default? JSON_QUERY() uses “lax” as default mode. And from Peter Eisentraut’s blog: https://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-finished-here-is-whats-new```The semantics of this are defined in terms ofJSON_QUERY
andJSON_VALUE
constructions (which have been available since SQL:2016), so this really just syntactic sugar.```Also feels like “lax” should be the default mode. If that is true, then my dirty change of removing “json_check_jsonpath_need()” works properly.The current logic with this patch sounds strange. Because “json_check_jsonpath_need()” iterate through unprocessed indirections to decide if jsonpath is needed (lax mode). With this logic:1) if index [0] directly following dot notation, like (data).a[0], it’s lax mode2) if index [0] directly following subscript [‘a’], like (data)[‘a’][0], it’s strict mode3) if index [0] directly following the data column, then if there is a dot nation in indirection list, use lax mode, otherwise strict mode. For the failed test case, as there is no more indirection following [0], so it expected strict mode.I wonder where this behavior is defined?With my change, 1) and 2) are the same, for 3), if index [0] directly following the data column, regardless what indirections are followed, it’s by default lax mode.So, I think this is a design decision. Maybe I missed something from your previous design, but I don’t find anything about that from the commit comments. I feel this would be better aligned with 1) and 2).
Thanks for investigating this, and for the great questions!
Here’s a bit of background. The pre-standard jsonb subscripting
feature [1] was introduced in Postgres 14. Specifically, support for
queries like (jb)[0] and (jb)[0]['a'] was added by this commit:
feature [1] was introduced in Postgres 14. Specifically, support for
queries like (jb)[0] and (jb)[0]['a'] was added by this commit:
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun Jan 31 23:50:40 2021 +0300
Implementation of subscripting for jsonb
Without my patch, from version 14 through the current master branch,
all supported versions of Postgres return the following results:
all supported versions of Postgres return the following results:
test=# select ('123'::jsonb)[0];
jsonb
-------
(1 row)
test=# select ('{"name": "Alice", "age": 30}'::jsonb)[0];
jsonb
-------
(1 row)
test=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['age'];
jsonb
-------
(1 row)
jsonb
-------
(1 row)
test=# select ('{"name": "Alice", "age": 30}'::jsonb)[0];
jsonb
-------
(1 row)
test=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['age'];
jsonb
-------
(1 row)
The simplified accessor syntax defined in the SQL standard includes
dot-notation access as well as integer-based subscripts. It does not
include text-based subscripts such as ['a']. Therefore, the only
overlap between the pre-standard jsonb subscripting and the standard
simplified accessor is non-slicing integer-based subscripts. (Since
the pre-standard jsonb subscripting never supported slicing, we don’t
need to consider that case either when comparing the two syntaxes.)
dot-notation access as well as integer-based subscripts. It does not
include text-based subscripts such as ['a']. Therefore, the only
overlap between the pre-standard jsonb subscripting and the standard
simplified accessor is non-slicing integer-based subscripts. (Since
the pre-standard jsonb subscripting never supported slicing, we don’t
need to consider that case either when comparing the two syntaxes.)
When we compare the two syntaxes for non-slicing integer-based
subscripts, the only discrepancy is that for a jsonb object jb,
(jb)[0] in the pre-standard syntax returns NULL, while (jb)[0] in the
standard syntax returns (jb) itself. As long as the integer subscript
is non-zero, both syntaxes return the same results.
subscripts, the only discrepancy is that for a jsonb object jb,
(jb)[0] in the pre-standard syntax returns NULL, while (jb)[0] in the
standard syntax returns (jb) itself. As long as the integer subscript
is non-zero, both syntaxes return the same results.
I think this (jb)[0] case is rather trivial. However, since it's been
behaving the pre-standard way since PG14, I hesitate to change the
existing behavior as part of my patches, and I feel it could be a
bikeshedding kind of discussion in a separate thread.
behaving the pre-standard way since PG14, I hesitate to change the
existing behavior as part of my patches, and I feel it could be a
bikeshedding kind of discussion in a separate thread.
The main goal of my patch is to add dot-notation. For now, my
intention is for cases like (jb)[0].a to work in the standard way,
because if a user chooses dot-notation instead of text-based
subscripting, they are likely expecting the standard behavior.
intention is for cases like (jb)[0].a to work in the standard way,
because if a user chooses dot-notation instead of text-based
subscripting, they are likely expecting the standard behavior.
Thoughts?
<v16-0007-Implement-jsonb-wildcard-member-accessor.patch><v16-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v16-0004-Extract-coerce_jsonpath_subscript.patch><v16-0006-Implement-Jsonb-subscripting-with-slicing.patch><v16-0005-Implement-read-only-dot-notation-for-jsonb.patch><v16-0003-Export-jsonPathFromParseResult.patch><v16-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch>
A few more comment for v16.
1 - 0002
```
--- a/contrib/hstore/hstore_subs.c
+++ b/contrib/hstore/hstore_subs.c
- if (isSlice || list_length(*indirection) != 1)
+ if (ai->is_slice || list_length(*indirection) != 1)
```
We should put list_length() check before ai->is_slace. Because when indirection length is greater than 1, it take a single check; other it may need to check the both conditions.
2 - 0002 - also in hstore_subs.c
``` *indirection = NIL;
```
We should do “list_delete_first_n(indirection, 1)”.
3 - 0003
```
+Datum
+jsonPathFromParseResult(JsonPathParseResult *jsonpath, int estimated_len,
+ struct Node *escontext)
+{
```
`jsonpath` is not changed in this function, so it should be defined as a const: “const JsonPathParseResult *jsonpath”.
4 - 0004
```
+ Oid targets[2] = {INT4OID, TEXTOID};
+
+ /*
+ * Jsonb can handle multiple subscript types, but cases when a
+ * subscript could be coerced to multiple target types must be
+ * avoided, similar to overloaded functions. It could be possibly
+ * extend with jsonpath in the future.
+ */
+ for (int i = 0; i < 2; i++)
```
This is a nit optimization. We can do:
Oid targets[] = {INT4OID, TEXTOID};
For (int I = 0; I < sizeof(targets) / sizeof(targets[0]); I ++)
This way lets the compiler to decide length of “targets”. So that avoids hard code “2” in “for”. If we need to add more elements to “targets”, we can just add it without updating “for”.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
On Sep 10, 2025, at 07:53, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:I think this (jb)[0] case is rather trivial. However, since it's been
behaving the pre-standard way since PG14, I hesitate to change the
existing behavior as part of my patches, and I feel it could be a
bikeshedding kind of discussion in a separate thread.
I am fine to retain the PG14 behavior. But the confusion part is:
```
evantest=# select ('{"name": "Alice", "birthday": {"year": 2000, "month": 8}}'::jsonb)[0]['birthday']['year'];
jsonb
-------
(1 row)
evantest=# select ('{"name": "Alice", "birthday": {"year": 2000, "month": 8}}'::jsonb)[0]['birthday'].year;
year
------
2000
(1 row)
```
As long as the expression containing a dot notation, “[0]” behaves differently from PG14.
I agree we should not recommend the mixed usages of `[`a`]` and `.a`, but we haven’t prevented from that. Given they are different meanings, `[‘a’]` implies strict mode and `.a` implies lax mode, for a complex JSON object, users may intentionally mix use both, which sounds reasonable.
I don’t see you have updated any documentation yet. I don’t want to block you because of this issue. As long as you state the behavior clearly in the document, I am happy to let it go.
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
Hi Chao,
On Tue, Sep 9, 2025 at 8:52 PM Chao Li <li.evan.chao@gmail.com> wrote:
<v16-0007-Implement-jsonb-wildcard-member-accessor.patch><v16-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v16-0004-Extract-coerce_jsonpath_subscript.patch><v16-0006-Implement-Jsonb-subscripting-with-slicing.patch><v16-0005-Implement-read-only-dot-notation-for-jsonb.patch><v16-0003-Export-jsonPathFromParseResult.patch><v16-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch>A few more comment for v16.1 - 0002```--- a/contrib/hstore/hstore_subs.c+++ b/contrib/hstore/hstore_subs.c- if (isSlice || list_length(*indirection) != 1)+ if (ai->is_slice || list_length(*indirection) != 1)```We should put list_length() check before ai->is_slace. Because when indirection length is greater than 1, it take a single check; other it may need to check the both conditions.2 - 0002 - also in hstore_subs.c```*indirection = NIL;```We should do “list_delete_first_n(indirection, 1)”.3 - 0003```+Datum+jsonPathFromParseResult(JsonPathParseResult *jsonpath, int estimated_len,+ struct Node *escontext)+{````jsonpath` is not changed in this function, so it should be defined as a const: “const JsonPathParseResult *jsonpath”.4 - 0004```+ Oid targets[2] = {INT4OID, TEXTOID};++ /*+ * Jsonb can handle multiple subscript types, but cases when a+ * subscript could be coerced to multiple target types must be+ * avoided, similar to overloaded functions. It could be possibly+ * extend with jsonpath in the future.+ */+ for (int i = 0; i < 2; i++)```This is a nit optimization. We can do:Oid targets[] = {INT4OID, TEXTOID};For (int I = 0; I < sizeof(targets) / sizeof(targets[0]); I ++)This way lets the compiler to decide length of “targets”. So that avoids hard code “2” in “for”. If we need to add more elements to “targets”, we can just add it without updating “for”.
Thanks for the feedback! Here’s v17. I’ve addressed 1 - 3 exactly as
you suggested. For 4, I made the targets array const and used the
lengthof macro to determine the array length.
you suggested. For 4, I made the targets array const and used the
lengthof macro to determine the array length.
Best,
Alex
Вложения
- v17-0007-Implement-jsonb-wildcard-member-accessor.patch
- v17-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch
- v17-0006-Implement-Jsonb-subscripting-with-slicing.patch
- v17-0004-Extract-coerce_jsonpath_subscript.patch
- v17-0005-Implement-read-only-dot-notation-for-jsonb.patch
- v17-0003-Export-jsonPathFromParseResult.patch
- v17-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch
Hi Chao,
On Tue, Sep 9, 2025 at 9:03 PM Chao Li <li.evan.chao@gmail.com> wrote:
I don’t see you have updated any documentation yet. I don’t want to block you because of this issue. As long as you state the behavior clearly in the document, I am happy to let it go.
Thanks for the reminder for documentation! I've attached v18. v18 adds
documentation in 0005. I've also updated the commit messages for 0003,
0004, and 0007 to include reviewers' names.
For everyone: my immediate goal is to move 0001 - 0005 forward, as I
feel more confident about them now. In the meantime, I'd also
appreciate code review for 0006 - 0007.
documentation in 0005. I've also updated the commit messages for 0003,
0004, and 0007 to include reviewers' names.
For everyone: my immediate goal is to move 0001 - 0005 forward, as I
feel more confident about them now. In the meantime, I'd also
appreciate code review for 0006 - 0007.
Best,
Alex
Вложения
- v18-0006-Implement-Jsonb-subscripting-with-slicing.patch
- v18-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch
- v18-0004-Extract-coerce_jsonpath_subscript.patch
- v18-0007-Implement-jsonb-wildcard-member-accessor.patch
- v18-0005-Implement-read-only-dot-notation-for-jsonb.patch
- v18-0003-Export-jsonPathFromParseResult.patch
- v18-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch
Hi,
I’ve rebased the patch set, no changes since v18.
I’ve rebased the patch set, no changes since v18.
Best,
Alex
Вложения
- v19-0005-Implement-read-only-dot-notation-for-jsonb.patch
- v19-0006-Implement-Jsonb-subscripting-with-slicing.patch
- v19-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch
- v19-0004-Extract-coerce_jsonpath_subscript.patch
- v19-0007-Implement-jsonb-wildcard-member-accessor.patch
- v19-0003-Export-jsonPathFromParseResult.patch
- v19-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch
Hi there,
In the meanwhile, these functions still return correct results through
standard syntax:
I've attached v20. It has the following changes:
1. New 0001: It adds test coverage for single-argument functions and
casting for jsonb expressions. This ensures that the relevant behavior
changes become visible in 0005 when field access via dot-notation is
introduced.
casting for jsonb expressions. This ensures that the relevant behavior
changes become visible in 0005 when field access via dot-notation is
introduced.
Specifically, once member access through dot-notation is enabled for
jsonb, we can no longer write single-argument functions (including
casts) in dot form for jsonb. For example:
jsonb, we can no longer write single-argument functions (including
casts) in dot form for jsonb. For example:
Before 0005:
select ('{"a":1}'::jsonb).jsonb_typeof;
jsonb_typeof
--------------
object
--------------
object
(1 row)
select ('[{"name": "alice"}, {"name": "bob"}]'::jsonb).name;
name
--------------------------------------
name
--------------------------------------
[{"name": "alice"}, {"name": "bob"}]
(1 row)
After 0005:
select ('{"a":1}'::jsonb).jsonb_typeof;
jsonb_typeof
--------------
(1 row)
--------------
(1 row)
select ('[{"name": "alice"}, {"name": "bob"}]'::jsonb).name;
name
------------------
["alice", "bob"]
(1 row)
name
------------------
["alice", "bob"]
(1 row)
standard syntax:
test=# select jsonb_typeof(('{"a":1}'::jsonb));
jsonb_typeof
--------------
object
(1 row)
--------------
object
(1 row)
test=# select ('[{"name": "alice"}, {"name": "bob"}]'::jsonb)::name;
name
--------------------------------------
[{"name": "alice"}, {"name": "bob"}]
(1 row)
name
--------------------------------------
[{"name": "alice"}, {"name": "bob"}]
(1 row)
I don't consider this behavior change a major issue, because the
dot-form for single-argument functions is not standard SQL and seems
to be PostgreSQL-specific. Still, it's worth highlighting here so
users aren't surprised.
2. Refactored 0002: It combines and refactors v19-0001 and v19-0002.
Instead of changing the existing transform() callback in
SubscriptRoutines, it now introduces an additional callback,
transform_partial(). This alternative transform method, used by jsonb,
is more flexible: it accepts a wider range of indirection node types
and can transform only a prefix of the indirection list. This avoids
breaking compatibility for arrays, hstore, and external data types
that supports subscripting.
3. 0003 and 0004 stay unchanged. They are both small and can be squashed
into 0005. I leave them as-is for now for easier review.
4. Added two additional tests in 0005 for assignments using jsonb
dot-notation, showing explicitly that assignment is not yet supported.
5. Removed 0006 (array slicing) and 0007 (wildcard) from the previous
versions, as they need additional work. My immediate goal is to first
reach consensus on the dot-notation implementation.
dot-form for single-argument functions is not standard SQL and seems
to be PostgreSQL-specific. Still, it's worth highlighting here so
users aren't surprised.
2. Refactored 0002: It combines and refactors v19-0001 and v19-0002.
Instead of changing the existing transform() callback in
SubscriptRoutines, it now introduces an additional callback,
transform_partial(). This alternative transform method, used by jsonb,
is more flexible: it accepts a wider range of indirection node types
and can transform only a prefix of the indirection list. This avoids
breaking compatibility for arrays, hstore, and external data types
that supports subscripting.
3. 0003 and 0004 stay unchanged. They are both small and can be squashed
into 0005. I leave them as-is for now for easier review.
4. Added two additional tests in 0005 for assignments using jsonb
dot-notation, showing explicitly that assignment is not yet supported.
5. Removed 0006 (array slicing) and 0007 (wildcard) from the previous
versions, as they need additional work. My immediate goal is to first
reach consensus on the dot-notation implementation.
Best,
Alex
Вложения
Looks like patch files need rebase.
On Sep 20, 2025, at 04:40, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:Hi there,<v20-0005-Implement-read-only-dot-notation-for-jsonb.patch><v20-0004-Extract-coerce_jsonpath_subscript.patch><v20-0001-Add-test-coverage-for-indirection-transformation.patch><v20-0002-Add-an-alternative-transform-function-in-Subscri.patch><v20-0003-Export-jsonPathFromParseResult.patch>I've attached v20. It has the following changes:1. New 0001: It adds test coverage for single-argument functions and
casting for jsonb expressions. This ensures that the relevant behavior
changes become visible in 0005 when field access via dot-notation is
introduced.Specifically, once member access through dot-notation is enabled for
jsonb, we can no longer write single-argument functions (including
casts) in dot form for jsonb. For example:Before 0005:select ('{"a":1}'::jsonb).jsonb_typeof;jsonb_typeof
--------------
object(1 row)select ('[{"name": "alice"}, {"name": "bob"}]'::jsonb).name;
name
--------------------------------------[{"name": "alice"}, {"name": "bob"}](1 row)After 0005:select ('{"a":1}'::jsonb).jsonb_typeof;jsonb_typeof
--------------
(1 row)select ('[{"name": "alice"}, {"name": "bob"}]'::jsonb).name;
name
------------------
["alice", "bob"]
(1 row)In the meanwhile, these functions still return correct results through
standard syntax:test=# select jsonb_typeof(('{"a":1}'::jsonb));jsonb_typeof
--------------
object
(1 row)test=# select ('[{"name": "alice"}, {"name": "bob"}]'::jsonb)::name;
name
--------------------------------------
[{"name": "alice"}, {"name": "bob"}]
(1 row)I don't consider this behavior change a major issue, because the
dot-form for single-argument functions is not standard SQL and seems
to be PostgreSQL-specific. Still, it's worth highlighting here so
users aren't surprised.
2. Refactored 0002: It combines and refactors v19-0001 and v19-0002.
Instead of changing the existing transform() callback in
SubscriptRoutines, it now introduces an additional callback,
transform_partial(). This alternative transform method, used by jsonb,
is more flexible: it accepts a wider range of indirection node types
and can transform only a prefix of the indirection list. This avoids
breaking compatibility for arrays, hstore, and external data types
that supports subscripting.
3. 0003 and 0004 stay unchanged. They are both small and can be squashed
into 0005. I leave them as-is for now for easier review.
4. Added two additional tests in 0005 for assignments using jsonb
dot-notation, showing explicitly that assignment is not yet supported.
5. Removed 0006 (array slicing) and 0007 (wildcard) from the previous
versions, as they need additional work. My immediate goal is to first
reach consensus on the dot-notation implementation.Best,Alex
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
On Sun, Sep 21, 2025 at 8:32 PM Chao Li <li.evan.chao@gmail.com> wrote:
Looks like patch files need rebase.
Вложения
The new approach of introducing “transform_partial” looks like a better solution, which leads to less code change to hstore_subs and arraysubs. However, when I tested the v21, I encountered errors when combine composite type, array and jsonb together.
Prepare test data:
```
drop table if exists people;
drop type if exists person;
CREATE TYPE person AS (
name text,
size int[],
meta jsonb[]
);
CREATE TABLE people (
p person
);
INSERT INTO people VALUES (ROW('Alice', array[10, 20], array['{"a": 30}'::jsonb, '{"a": 40}'::jsonb]));
```
and
Then run the test:
```
# old jsonb accessor works to extract a jsonb field from an array item of a composite field
evantest=# select (p).meta[1]->'a' from people;
?column?
----------
30
(1 row)
# dot notation also works
evantest=# select (p).meta[1].a from people;
a
----
30
(1 row)
# but index accessor doesn’t work
evantest=# select (p).meta[1]['a'] from people;
ERROR: invalid input syntax for type integer: "a"
LINE 1: select (p).meta[1]['a'] from people;
^
```
Other than that, I got a few new comments:
On Sep 23, 2025, at 01:31, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:<v21-0004-Extract-coerce_jsonpath_subscript.patch><v21-0005-Implement-read-only-dot-notation-for-jsonb.patch><v21-0001-Add-test-coverage-for-indirection-transformation.patch><v21-0002-Add-an-alternative-transform-function-in-Subscri.patch><v21-0003-Export-jsonPathFromParseResult.patch>There were trailing whitespaces in the documentation I added, I’ve fixed them now.
1 - 0001 - overall looks good
2 - 0002
```
+ /* Collect leading A_Indices subscripts */
+ foreach(lc, indirection)
+ {
+ Node *n = lfirst(lc);
+
+ if (IsA(n, A_Indices))
+ {
+ A_Indices *ai = (A_Indices *) n;
+
+ subscriptlist = lappend(subscriptlist, n);
+ if (ai->is_slice)
+ isSlice = true;
+ }
+ else
+ break;
```
We can break after “isSlice=true”.
3 - 0002
```
+ * list, and handle the
+ * remaining indirections differently or to raise an error as needed.
```
Not well formatted, “remaining” can go to the previous line.
4 - 0002
```
+ if (sbsroutines->transform_partial != NULL)
+ {
```
Do we want to assert that one of transform and transform_partial should not be NULL before “if"?
5 - 0002
```
+ /*
+ * If there is no partial transform function, use the full transform
+ * function, which only accepts bracket subscripts (A_Indices nodes).
+ * We pre-collect the leading A_Indices nodes from the indirection
```
“If there is no partial transform function” sounds redundant, I think we can just go with “Full transform function only accepts …”.
6 - 0002
```
+ /* This should not happen with well-behaved transform functions */
+ elog(ERROR, "subscripting transform function failed to consume any indirection elements”);
```
I don’t see an existing error message uses “indirection” and “transform”. This error message looks more like a log message rather than a message to show to end users.
7 - 0002
```
--- a/src/backend/utils/adt/jsonbsubs.c
+++ b/src/backend/utils/adt/jsonbsubs.c
@@ -39,11 +39,10 @@ typedef struct JsonbSubWorkspace
* Transform the subscript expressions, coerce them to text,
* and determine the result type of the SubscriptingRef node.
*/
-static void
+static int
jsonb_subscript_transform(SubscriptingRef *sbsref,
List *indirection,
ParseState *pstate,
- bool isSlice,
bool isAssignment)
```
As return type is changed, function comment should be updated accordingly.
8 - 0005 - jsonb.sql
As we discussed earlier, now
select ('{"a": 1}'::jsonb)[0]['a'];
select ('{"a": 1}'::jsonb)[0].a;
Will return different results. Maybe that part needs more discussion, but we at least don’t want random behavior. So I would suggest add the two cases into the test script, so that other reviewers may easily notice that, thus gets more inputs from more people.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
Hi Chao,
Thanks for reviewing. I'm glad you like the new approach of
introducing "transform_partial". I've attached v22, which addresses
some of your feedback, and I ran pgindent again.
introducing "transform_partial". I've attached v22, which addresses
some of your feedback, and I ran pgindent again.
See detailed replies below.
On Mon, Sep 22, 2025 at 10:48 PM Chao Li <li.evan.chao@gmail.com> wrote:
The new approach of introducing “transform_partial” looks like a better solution, which leads to less code change to hstore_subs and arraysubs. However, when I tested the v21, I encountered errors when combine composite type, array and jsonb together.Prepare test data:```drop table if exists people;drop type if exists person;CREATE TYPE person AS (name text,size int[],meta jsonb[]);CREATE TABLE people (p person);INSERT INTO people VALUES (ROW('Alice', array[10, 20], array['{"a": 30}'::jsonb, '{"a": 40}'::jsonb]));```Then run the test:```# old jsonb accessor works to extract a jsonb field from an array item of a composite fieldevantest=# select (p).meta[1]->'a' from people;?column?----------30(1 row)# dot notation also worksevantest=# select (p).meta[1].a from people;a----30(1 row)# but index accessor doesn’t workevantest=# select (p).meta[1]['a'] from people;ERROR: invalid input syntax for type integer: "a"LINE 1: select (p).meta[1]['a'] from people;^
This is the expected behavior for array subscripting, and my patch
doesn't change that. I don't think this is a problem. With or without
my patch, you can avoid the ERROR by adding parentheses:
doesn't change that. I don't think this is a problem. With or without
my patch, you can avoid the ERROR by adding parentheses:
test=# select ((p).meta[1])['a'] from people; meta ------ 30 (1 row)
On Mon, Sep 22, 2025 at 10:48 PM Chao Li <li.evan.chao@gmail.com> wrote:
2 - 0002```+ /* Collect leading A_Indices subscripts */+ foreach(lc, indirection)+ {+ Node *n = lfirst(lc);++ if (IsA(n, A_Indices))+ {+ A_Indices *ai = (A_Indices *) n;++ subscriptlist = lappend(subscriptlist, n);+ if (ai->is_slice)+ isSlice = true;+ }+ else+ break;```We can break after “isSlice=true”.
Why? We still want to get the whole prefix list of A_Indices.
On Mon, Sep 22, 2025 at 10:48 PM Chao Li <li.evan.chao@gmail.com> wrote:
6 - 0002```+ /* This should not happen with well-behaved transform functions */+ elog(ERROR, "subscripting transform function failed to consume any indirection elements”);```I don’t see an existing error message uses “indirection” and “transform”. This error message looks more like a log message rather than a message to show to end users.
This is a defensive elog message that should not happen. So it is a
log message for developers. That said, I'm open to suggestions for
better wording.
log message for developers. That said, I'm open to suggestions for
better wording.
The rest of your feedback I've made changes accordingly as you suggested.
Best,
Alex
Вложения
On Sep 24, 2025, at 09:05, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:This is the expected behavior for array subscripting, and my patch
doesn't change that. I don't think this is a problem. With or without
my patch, you can avoid the ERROR by adding parentheses:test=# select ((p).meta[1])['a'] from people; meta ------ 30 (1 row)
Okay, make sense.
2 - 0002```+ /* Collect leading A_Indices subscripts */+ foreach(lc, indirection)+ {+ Node *n = lfirst(lc);++ if (IsA(n, A_Indices))+ {+ A_Indices *ai = (A_Indices *) n;++ subscriptlist = lappend(subscriptlist, n);+ if (ai->is_slice)+ isSlice = true;+ }+ else+ break;```We can break after “isSlice=true”.Why? We still want to get the whole prefix list of A_Indices.
Ah, I just realized the main goal of the foreach loop is to build a new “subscrptlist”. In that case, it should not break.
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/