Обсуждение: sql/json query function JsonBehavior default expression's collation may differ from returning type's collation
hi.
create domain d1 as text collate case_insensitive;
select json_value('{"a": "A"}', '$.a' returning d1 default 'C' on
empty) = 'a'; --return true
select json_value('{"a": "A"}', '$.c' returning d1 default 'A' on
empty) = 'a'; --return true
select json_value('{"a": "A"}', '$.c' returning d1 default 'A' collate
"C" on empty) = 'a';
currently the above query will produce an error:
ERROR: unrecognized node type: 31
this error message needs to be fixed.
however, if no error is raised, should the query return true or false?
if not error out, It also means that the collation of the json_value expression
is determined at runtime, which makes it unsuitable for use in index creation or
check constraints.
overall, raising an error if the collation of the
JsonBehavior DEFAULT clause differs from that of the RETURNING clause
is the best option.
what do you think?
On Mon, Jul 14, 2025 at 7:39 PM jian he <jian.universality@gmail.com> wrote:
>
> overall, raising an error if the collation of the
> JsonBehavior DEFAULT clause differs from that of the RETURNING clause
> is the best option.
>
> what do you think?
in exprSetCollation, the node can be T_CollateExpr.
In that case, CollateExpr->collOid should be the same as the collation
of the caller.
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -1131,6 +1131,10 @@ exprSetCollation(Node *expr, Oid collation)
case T_Const:
((Const *) expr)->constcollid = collation;
break;
+ case T_CollateExpr:
+ if (((CollateExpr *) expr)->collOid != collation)
+ elog(ERROR, "COLLATE clause collation should be %u",
collation);
+ break;
case T_Param:
((Param *) expr)->paramcollid = collation;
break;
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index d66276801c6..9cbffff52c3 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -4825,6 +4825,15 @@ transformJsonBehavior(ParseState *pstate,
JsonBehavior *behavior,
parser_errposition(pstate, exprLocation(expr)));
}
+ if (typcategory == TYPCATEGORY_STRING &&
+ exprCollation(coerced_expr) !=
get_typcollation(returning->typid))
+ {
+ ereport(ERROR,
+ errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("DEFAULT expression collation does not
match with RETURNING type's collation"),
+ parser_errposition(pstate,
exprLocation(coerced_expr)));
+ }
+
create table t(a jsonb);
select json_value(a, '$.c' returning text default 'A' collate "C" on
empty) from t;
ERROR: DEFAULT expression collation does not match with RETURNING
type's collation
as you can see, this query returns a set of rows. If the collation of the
DEFAULT node differs from the default text collation, the resulting set may have
inconscient collations.
a set of rows all the collation should be the same.
overall I think it should error out.
Вложения
Hi Jian,
Thanks for the patch and also for the offlist heads-up.
I agree with rejecting cases where the DEFAULT clause’s collation does not match the RETURNING collation. The result collation for json_value should come from the RETURNING clause if it has an explicit COLLATE, otherwise from the RETURNING type’s collation, and both the extracted value source (the value obtained from the JSON path when it matches) and the DEFAULT source should match it.
I would not add a T_CollateExpr arm to exprSetCollation(). CollateExpr is just a transient wrapper that the planner rewrites to RelabelType, so it does not make sense to stamp it with a result collation there. Instead, transformJsonBehavior() should check if a source expression is a CollateExpr, verify that its collOid matches the target collation, and either keep it as is (no stamping needed if the collOid already matches the target) or strip the wrapper and stamp the inner node with exprSetCollation() if you need the inner node itself to carry the target collation.
That is my current understanding of how exprSetCollation works, but I will confirm when I can check the code on my computer next week. Someone like Tom can correct me if I have missed something in the meantime.
Thanks, Amit Langote
On Tue, Aug 12, 2025 at 7:09 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Hi Jian, > > Thanks for the patch and also for the offlist heads-up. > > I agree with rejecting cases where the DEFAULT clause’s collation does not match the RETURNING collation. The result collationfor json_value should come from the RETURNING clause if it has an explicit COLLATE, otherwise from the RETURNINGtype’s collation, and both the extracted value source (the value obtained from the JSON path when it matches) andthe DEFAULT source should match it. > hi. based on my understand of https://www.postgresql.org/docs/current/collation.html#COLLATION-CONCEPTS <<<<<<< 1. If any input expression has an explicit collation derivation, then all explicitly derived collations among the input expressions must be the same, otherwise an error is raised. If any explicitly derived collation is present, that is the result of the collation combination. 2. Otherwise, all input expressions must have the same implicit collation derivation or the default collation. If any non-default collation is present, that is the result of the collation combination. Otherwise, the result is the default collation. <<<<<<< CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false); create domain d1 as text collate case_insensitive; create domain d2 as text collate "C"; the below two queries should error out: select json_value('{"a": "A"}', '$.a' returning d1 default 'C'::d2 on empty) = 'a'; --error select json_value('{"a": "A"}', '$.a' returning d1 default 'C' collate "C" on empty) = 'a'; --error please check attached patch.
Вложения
Hi Jian, (Sorry for the long delay -- I meant to get back to this a while ago.) On Fri, Oct 3, 2025 at 9:57 PM jian he <jian.universality@gmail.com> wrote: > hi. > > based on my understand of > https://www.postgresql.org/docs/current/collation.html#COLLATION-CONCEPTS > <<<<<<< > 1. If any input expression has an explicit collation derivation, then all > explicitly derived collations among the input expressions must be the same, > otherwise an error is raised. If any explicitly derived collation is present, > that is the result of the collation combination. > > 2. Otherwise, all input expressions must have the same implicit collation > derivation or the default collation. If any non-default collation is present, > that is the result of the collation combination. Otherwise, the result is the > default collation. > <<<<<<< > > CREATE COLLATION case_insensitive (provider = icu, locale = > 'und-u-ks-level2', deterministic = false); > create domain d1 as text collate case_insensitive; > create domain d2 as text collate "C"; > > the below two queries should error out: > select json_value('{"a": "A"}', '$.a' returning d1 default 'C'::d2 on > empty) = 'a'; --error > select json_value('{"a": "A"}', '$.a' returning d1 default 'C' collate > "C" on empty) = 'a'; --error > > please check attached patch. Thanks for posting v2 of the patch. I’ve made a few follow-up changes (v3 attached): * Moved the regression tests from sqljson_queryfuncs.sql to collation.icu.utf8.sql to avoid failures on buildfarm machines without ICU support. * Adjusted the collation-mismatch check in transformJsonBehavior() so that it runs last within the DEFAULT-handling block. That keeps the control flow cleaner and avoids affecting existing tests that already fail earlier checks, preventing unnecessary regression output churn. * Did a few cosmetic edits and fixed the error code and message text. Otherwise, behavior and coverage remain the same. -- Thanks, Amit Langote
Вложения
On Mon, Oct 6, 2025 at 10:58 PM Amit Langote <amitlangote09@gmail.com> wrote: > (Sorry for the long delay -- I meant to get back to this a while ago.) > > Thanks for posting v2 of the patch. I’ve made a few follow-up changes > (v3 attached): > > * Moved the regression tests from sqljson_queryfuncs.sql to > collation.icu.utf8.sql to avoid failures on buildfarm machines without > ICU support. > > * Adjusted the collation-mismatch check in transformJsonBehavior() so > that it runs last within the DEFAULT-handling block. That keeps the > control flow cleaner and avoids affecting existing tests that already > fail earlier checks, preventing unnecessary regression output churn. > > * Did a few cosmetic edits and fixed the error code and message text. > > Otherwise, behavior and coverage remain the same. After sleeping on this, I realized the right fix is not to strip CollateExpr in transformJsonBehavior(), but to stop recursing on JsonBehavior.expr in exprSetCollation(). With this patch, the parser now ensures that the JsonBehavior.expr has the correct collation, so the recursion isn’t needed. There is now an Assert in its place that checks that JsonBehavior.expr already has the target collation. Attached v4. -- Thanks, Amit Langote
Вложения
On Wed, Oct 8, 2025 at 12:10 PM Amit Langote <amitlangote09@gmail.com> wrote: > > After sleeping on this, I realized the right fix is not to strip > CollateExpr in transformJsonBehavior(), but to stop recursing on > JsonBehavior.expr in exprSetCollation(). With this patch, the parser > now ensures that the JsonBehavior.expr has the correct collation, so > the recursion isn’t needed. There is now an Assert in its place that > checks that JsonBehavior.expr already has the target collation. > hi, Amit. thanks for pushing it. https://git.postgresql.org/cgit/postgresql.git/commit/?id=ef5e60a9d352a97791af632e0d26a572bc88e921 it took me a little time to understand why `` exprSetCollation case T_JsonBehavior: Assert(((JsonBehavior *) expr)->expr == NULL || exprCollation(((JsonBehavior *) expr)->expr) == collation); `` works, after looking at assign_collations_walker. it will recursively set JsonBehavior->expr collation first then call exprSetCollation for JsonBehavior itself. I will double-check other JSON constructs with RETURNING clauses for potential collation issues in the future.
Hi Jian, On Thu, Oct 9, 2025 at 12:48 PM jian he <jian.universality@gmail.com> wrote: > On Wed, Oct 8, 2025 at 12:10 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > > After sleeping on this, I realized the right fix is not to strip > > CollateExpr in transformJsonBehavior(), but to stop recursing on > > JsonBehavior.expr in exprSetCollation(). With this patch, the parser > > now ensures that the JsonBehavior.expr has the correct collation, so > > the recursion isn’t needed. There is now an Assert in its place that > > checks that JsonBehavior.expr already has the target collation. > > > > hi, Amit. > thanks for pushing it. > > https://git.postgresql.org/cgit/postgresql.git/commit/?id=ef5e60a9d352a97791af632e0d26a572bc88e921 > > it took me a little time to understand why > `` > exprSetCollation > case T_JsonBehavior: > Assert(((JsonBehavior *) expr)->expr == NULL || > exprCollation(((JsonBehavior *) expr)->expr) == collation); > `` > works, after looking at assign_collations_walker. > > it will recursively set JsonBehavior->expr collation first then call > exprSetCollation for JsonBehavior itself. > > I will double-check other JSON constructs with RETURNING clauses for potential > collation issues in the future. Yes, that would be great. Thanks for your close attention to this area of code. -- Thanks, Amit Langote