Обсуждение: sql/json query function JsonBehavior default expression's collation may differ from returning type's collation
hi. create domain d1 as text collate case_insensitive; select json_value('{"a": "A"}', '$.a' returning d1 default 'C' on empty) = 'a'; --return true select json_value('{"a": "A"}', '$.c' returning d1 default 'A' on empty) = 'a'; --return true select json_value('{"a": "A"}', '$.c' returning d1 default 'A' collate "C" on empty) = 'a'; currently the above query will produce an error: ERROR: unrecognized node type: 31 this error message needs to be fixed. however, if no error is raised, should the query return true or false? if not error out, It also means that the collation of the json_value expression is determined at runtime, which makes it unsuitable for use in index creation or check constraints. overall, raising an error if the collation of the JsonBehavior DEFAULT clause differs from that of the RETURNING clause is the best option. what do you think?
On Mon, Jul 14, 2025 at 7:39 PM jian he <jian.universality@gmail.com> wrote: > > overall, raising an error if the collation of the > JsonBehavior DEFAULT clause differs from that of the RETURNING clause > is the best option. > > what do you think? in exprSetCollation, the node can be T_CollateExpr. In that case, CollateExpr->collOid should be the same as the collation of the caller. --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -1131,6 +1131,10 @@ exprSetCollation(Node *expr, Oid collation) case T_Const: ((Const *) expr)->constcollid = collation; break; + case T_CollateExpr: + if (((CollateExpr *) expr)->collOid != collation) + elog(ERROR, "COLLATE clause collation should be %u", collation); + break; case T_Param: ((Param *) expr)->paramcollid = collation; break; diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index d66276801c6..9cbffff52c3 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -4825,6 +4825,15 @@ transformJsonBehavior(ParseState *pstate, JsonBehavior *behavior, parser_errposition(pstate, exprLocation(expr))); } + if (typcategory == TYPCATEGORY_STRING && + exprCollation(coerced_expr) != get_typcollation(returning->typid)) + { + ereport(ERROR, + errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("DEFAULT expression collation does not match with RETURNING type's collation"), + parser_errposition(pstate, exprLocation(coerced_expr))); + } + create table t(a jsonb); select json_value(a, '$.c' returning text default 'A' collate "C" on empty) from t; ERROR: DEFAULT expression collation does not match with RETURNING type's collation as you can see, this query returns a set of rows. If the collation of the DEFAULT node differs from the default text collation, the resulting set may have inconscient collations. a set of rows all the collation should be the same. overall I think it should error out.
Вложения
Hi Jian,
Thanks for the patch and also for the offlist heads-up.
I agree with rejecting cases where the DEFAULT clause’s collation does not match the RETURNING collation. The result collation for json_value should come from the RETURNING clause if it has an explicit COLLATE, otherwise from the RETURNING type’s collation, and both the extracted value source (the value obtained from the JSON path when it matches) and the DEFAULT source should match it.
I would not add a T_CollateExpr arm to exprSetCollation(). CollateExpr is just a transient wrapper that the planner rewrites to RelabelType, so it does not make sense to stamp it with a result collation there. Instead, transformJsonBehavior() should check if a source expression is a CollateExpr, verify that its collOid matches the target collation, and either keep it as is (no stamping needed if the collOid already matches the target) or strip the wrapper and stamp the inner node with exprSetCollation() if you need the inner node itself to carry the target collation.
That is my current understanding of how exprSetCollation works, but I will confirm when I can check the code on my computer next week. Someone like Tom can correct me if I have missed something in the meantime.
Thanks, Amit Langote
On Tue, Aug 12, 2025 at 7:09 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Hi Jian, > > Thanks for the patch and also for the offlist heads-up. > > I agree with rejecting cases where the DEFAULT clause’s collation does not match the RETURNING collation. The result collationfor json_value should come from the RETURNING clause if it has an explicit COLLATE, otherwise from the RETURNINGtype’s collation, and both the extracted value source (the value obtained from the JSON path when it matches) andthe DEFAULT source should match it. > hi. based on my understand of https://www.postgresql.org/docs/current/collation.html#COLLATION-CONCEPTS <<<<<<< 1. If any input expression has an explicit collation derivation, then all explicitly derived collations among the input expressions must be the same, otherwise an error is raised. If any explicitly derived collation is present, that is the result of the collation combination. 2. Otherwise, all input expressions must have the same implicit collation derivation or the default collation. If any non-default collation is present, that is the result of the collation combination. Otherwise, the result is the default collation. <<<<<<< CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false); create domain d1 as text collate case_insensitive; create domain d2 as text collate "C"; the below two queries should error out: select json_value('{"a": "A"}', '$.a' returning d1 default 'C'::d2 on empty) = 'a'; --error select json_value('{"a": "A"}', '$.a' returning d1 default 'C' collate "C" on empty) = 'a'; --error please check attached patch.