Обсуждение: implement CAST(expr AS type FORMAT 'template')

Поиск
Список
Период
Сортировка

implement CAST(expr AS type FORMAT 'template')

От
jian he
Дата:
hi.

while working on CAST(... DEFAULT ON ERROR), I came across link[1].  I don't
have access to the SQL standard, but based on the information in link[1], for
CAST(val AS type FORMAT 'template'), I make the <cast template> as an A_Const
node in gram.y.

so the attached patch is to implement
     CAST <left paren>
         <cast operand> AS <cast target>
         [ FORMAT <cast template> ]
         <right paren>

The implementation is pretty straightforward.
CAST(val AS type FORMAT 'template')
internally, it will be transformed into a FuncExpr node whose funcid
corresponds to
function name as one of (to_number, to_date, to_timestamp, to_char).
template as a Const node will make life easier.

select proname, prosrc, proallargtypes, proargtypes,
prorettype::regtype, proargnames
from pg_proc
where proname in ('to_number', 'to_date',  'to_timestamp', 'to_char');

based on the query results, only a limited set of type casts are supported with
formatted casts.  so error out early if the source or target type doesn't meet
these conditions.  for example, if the source or target is a composite, array,
or polymorphic type.

demo:
select cast('2018-13-12' as date format 'YYYY-MM-DD'); --error
select cast('2018-13-12' as date format 'YYYY-DD-MM'); --no error
select to_char(cast('2018-13-12' as date format 'YYYY-DD-MM'), 'YYYY-Mon-DD');
returns
2018-Dec-13

[1]: https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard#Major_features_simply_not_implemented_yet

Вложения

Re: implement CAST(expr AS type FORMAT 'template')

От
Vik Fearing
Дата:
On 27/07/2025 17:43, jian he wrote:
> hi.
>
> while working on CAST(... DEFAULT ON ERROR), I came across link[1].  I don't
> have access to the SQL standard, but based on the information in link[1], for
> CAST(val AS type FORMAT 'template'), I make the <cast template> as an A_Const
> node in gram.y.


Why does it have to be an A_const?  Shouldn't any a_expr work there?


> so the attached patch is to implement
>       CAST <left paren>
>           <cast operand> AS <cast target>
>           [ FORMAT <cast template> ]
>           <right paren>


This is correct syntax.  Thanks for working on it!


> The implementation is pretty straightforward.
> CAST(val AS type FORMAT 'template')
> internally, it will be transformed into a FuncExpr node whose funcid
> corresponds to
> function name as one of (to_number, to_date, to_timestamp, to_char).
> template as a Const node will make life easier.


This doesn't seem very postgres-y to me.  Wouldn't it be better to add 
something like castformatfuncid to pg_cast?  That way any types that 
have that would just call that.  It would allow extensions to add 
formatted casting to their types, for example.


> select proname, prosrc, proallargtypes, proargtypes,
> prorettype::regtype, proargnames
> from pg_proc
> where proname in ('to_number', 'to_date',  'to_timestamp', 'to_char');
>
> based on the query results, only a limited set of type casts are supported with
> formatted casts.  so error out early if the source or target type doesn't meet
> these conditions.  for example, if the source or target is a composite, array,
> or polymorphic type.


The standard is strict on what types can be cast to another, but I see 
no reason not to be more generic.

-- 

Vik Fearing




Re: implement CAST(expr AS type FORMAT 'template')

От
jian he
Дата:
On Mon, Jul 28, 2025 at 2:31 AM Vik Fearing <vik@postgresfriends.org> wrote:
>
>
> On 27/07/2025 17:43, jian he wrote:
> > hi.
> >
> > while working on CAST(... DEFAULT ON ERROR), I came across link[1].  I don't
> > have access to the SQL standard, but based on the information in link[1], for
> > CAST(val AS type FORMAT 'template'), I make the <cast template> as an A_Const
> > node in gram.y.
>
>
> Why does it have to be an A_const?  Shouldn't any a_expr work there?
>

you are right. a_expr should work.
the attached patch changed accordingly.

so now
select cast(NULL as date format NULL::date); ---error
select cast(NULL as date format lower('a')); --no error, returns NULL

>
> > so the attached patch is to implement
> >       CAST <left paren>
> >           <cast operand> AS <cast target>
> >           [ FORMAT <cast template> ]
> >           <right paren>
>

> This is correct syntax.  Thanks for working on it!
>

>
> This doesn't seem very postgres-y to me.  Wouldn't it be better to add
> something like castformatfuncid to pg_cast?  That way any types that
> have that would just call that.  It would allow extensions to add
> formatted casting to their types, for example.
>

select oid, castsource::regtype, casttarget::regtype,
castfunc::regproc, castcontext, castmethod
from pg_cast
where casttarget::regtype::text in ('text') or
castsource::regtype::text in ('text');

As you can see from the query output, cast from other type to text or
cast from text to other type is not in the pg_cast catalog entry.
there are in type input/output functions. it will be represented as a
CoerceViaIO node.
see function find_coercion_pathway (src/backend/parser/parse_coerce.c
line:3577).

adding these pg_cast entries seems tricky.
for example:
(assume castsource as numeric, casttarget as text)
will
(castsource as numeric, casttarget as text, castfunc as numeric_out,
castformatfunc as numeric_to_char)
ever work?
but numeric_out' result type is cstring.
so I tend to think adding castformatfunc to pg_cast will not work.

Вложения

Re: implement CAST(expr AS type FORMAT 'template')

От
Vik Fearing
Дата:
On 28/07/2025 10:41, jian he wrote:
> select oid, castsource::regtype, casttarget::regtype,
> castfunc::regproc, castcontext, castmethod
> from pg_cast
> where casttarget::regtype::text in ('text') or
> castsource::regtype::text in ('text');
>
> As you can see from the query output, cast from other type to text or
> cast from text to other type is not in the pg_cast catalog entry.
> there are in type input/output functions. it will be represented as a
> CoerceViaIO node.
> see function find_coercion_pathway (src/backend/parser/parse_coerce.c
> line:3577).


This is the same issue I came across when I tried to implement it 
several years ago.


> adding these pg_cast entries seems tricky.
> for example:
> (assume castsource as numeric, casttarget as text)
> will
> (castsource as numeric, casttarget as text, castfunc as numeric_out,
> castformatfunc as numeric_to_char)
> ever work?
> but numeric_out' result type is cstring.


I had been imagining another castcontext that would only specify the 
castfunc when the FORMAT claused is used, otherwise the current method 
of passing through IO would be used.


> so I tend to think adding castformatfunc to pg_cast will not work.


Perhaps not, but we need to find a way to make this generic so that 
custom types can define formatting rules for themselves.

-- 

Vik Fearing




Re: implement CAST(expr AS type FORMAT 'template')

От
jian he
Дата:
On Mon, Jul 28, 2025 at 6:47 PM Vik Fearing <vik@postgresfriends.org> wrote:
>
> > adding these pg_cast entries seems tricky.
> > for example:
> > (assume castsource as numeric, casttarget as text)
> > will
> > (castsource as numeric, casttarget as text, castfunc as numeric_out,
> > castformatfunc as numeric_to_char)
> > ever work?
> > but numeric_out' result type is cstring.
>
>
> I had been imagining another castcontext that would only specify the
> castfunc when the FORMAT claused is used, otherwise the current method
> of passing through IO would be used.
>
>
> > so I tend to think adding castformatfunc to pg_cast will not work.
>
>
> Perhaps not, but we need to find a way to make this generic so that
> custom types can define formatting rules for themselves.

We can introduce another column in pg_proc, proformat
hope it's not crazy as it is.

select proname, prosrc, proformat from pg_proc where proformat;
   proname    |       prosrc        | proformat
--------------+---------------------+-----------
 to_char      | timestamptz_to_char | t
 to_char      | numeric_to_char     | t
 to_char      | int4_to_char        | t
 to_char      | int8_to_char        | t
 to_char      | float4_to_char      | t
 to_char      | float8_to_char      | t
 to_number    | numeric_to_number   | t
 to_timestamp | to_timestamp        | t
 to_date      | to_date             | t
 to_char      | interval_to_char    | t
 to_char      | timestamp_to_char   | t

proformat is true means this function is a formatter function.
formatter function requirement:
* first argument or the return type must be TEXT.
* the second argument must be a type of TEXT.
* function should not return a set.
* keyword FORMAT must be specified while CREATE FUNCTION.
* prokind should be PROKIND_FUNCTION, normal function.
* input argument should be two. because I am not sure how to handle
multiple format templates.
like, CAST('A' AS TEXT FORMAT format1 format2).

for example:
CREATE FUNCTION test(TEXT, TEXT) RETURNS JSON AS $$ BEGIN RETURN '1';
END; $$ LANGUAGE plpgsql VOLATILE FORMAT;
this function "test" format text based on second argument(template)
and return json type.

POC attached.
what do you think?

Вложения

Re: implement CAST(expr AS type FORMAT 'template')

От
"David G. Johnston"
Дата:
On Monday, July 28, 2025, jian he <jian.universality@gmail.com> wrote:
On Mon, Jul 28, 2025 at 6:47 PM Vik Fearing <vik@postgresfriends.org> wrote:
>
> > adding these pg_cast entries seems tricky.

select proname, prosrc, proformat from pg_proc where proformat;

what do you think?

My first impression of this choice was not good.

How about changing the specification for create type.  Right now input functions must declare either 1 or 3 arguments.  Let’s also allow for 2 and 4-argument functions where the 2nd or 4th is where the format is passed.  If a data type input function lacks one of those signatures it is a runtime error if a format clause is attached to its cast expression.  For output, we go from having zero input arguments to zero or one, with the same resolution behavior.

Pass null for the format if the clause is missing or the cast is done via the :: operator, or any other context format is not able to be specified.

The slight variation to this would be to specify these 2/4 and 1-arg functions as optional “format_in” and “format_out” optional properties (like typmod_in).  The format-aware code can look for these which will end up having the full implementation while the current IO functions would simply stub out calls, passing null as the format. (Or maybe some variation that looks similar to typmod handling…which I haven’t looked at.)

David J.

Re: implement CAST(expr AS type FORMAT 'template')

От
jian he
Дата:
On Tue, Jul 29, 2025 at 11:54 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> The slight variation to this would be to specify these 2/4 and 1-arg functions as optional “format_in” and
“format_out”optional properties (like typmod_in).  The format-aware code can look for these which will end up having
thefull implementation while the current IO functions would simply stub out calls, passing null as the format. (Or
maybesome variation that looks similar to typmod handling…which I haven’t looked at.) 
>

This may also work.
typmod_in, typmod_out, which is associated with typmod, which is used
in many places.
The only use case for (typformatin, typformatout) is CAST expression.
so we also need to consider the overhead of adding
two oid columns (typformatin, typformatout) to pg_type.

another question is:
should we first implement CAST(expr AS type FORMAT 'template') for limited types
(to_date, to_char, to_number, to_timestamptz)
or first try to make it more generic?



Re: implement CAST(expr AS type FORMAT 'template')

От
Vik Fearing
Дата:
On 01/08/2025 10:22, jian he wrote:
> should we first implement CAST(expr AS type FORMAT 'template') for limited types
> (to_date, to_char, to_number, to_timestamptz)
> or first try to make it more generic?


My fear is that if we don't, it will never get done.

-- 

Vik Fearing




Re: implement CAST(expr AS type FORMAT 'template')

От
jian he
Дата:
hi.
one more question:

For binary coercible type casts, no formatted related function for it,
should we error out?
For example, should the following error out or return text '1'.

select cast('1'::text as text format 'YYYY'::text);



Re: implement CAST(expr AS type FORMAT 'template')

От
"David G. Johnston"
Дата:
On Sun, Aug 3, 2025 at 8:10 PM jian he <jian.universality@gmail.com> wrote:
hi.
one more question:

For binary coercible type casts, no formatted related function for it,
should we error out?
For example, should the following error out or return text '1'.

select cast('1'::text as text format 'YYYY'::text);

I'm hoping the standard says (or allows us to) error out here.

text as a type has no semantics on which to associate a format so it should be an error to attempt to do so.  Not a silent no-op.

I was under the impression that for format to be allowed in the expression one of the two data types involved has to be text and the other must not be text.

IME we are actually implementing a formatting option for text serialization and deserialization here, not a cast (we are just borrowing existing syntax that is serviceable).  Hence the absence of these entries in pg_cast and why the fit into pg_type seems so reasonable.

The existence of the various "to_char" and "to_date" functions reflects the historical lack of a dedicated syntax for this kind of (de-)serialization.  But it seems unwise to bias ourselves to how the new syntax/feature should be implemented just because these functions exist.  At least one design should be done pretending they don't and see what comes out of it.  Their code can always be moved or reused in whatever we come up with; forcing them to be used directly, as-is, within the new solution adds an unnecessary constraint.

David J.

Re: implement CAST(expr AS type FORMAT 'template')

От
Corey Huinker
Дата:
On Sun, Aug 3, 2025 at 11:36 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sun, Aug 3, 2025 at 8:10 PM jian he <jian.universality@gmail.com> wrote:
hi.
one more question:

For binary coercible type casts, no formatted related function for it,
should we error out?
For example, should the following error out or return text '1'.

select cast('1'::text as text format 'YYYY'::text);

I'm hoping the standard says (or allows us to) error out here.

We have some influence in that, I believe.
 

text as a type has no semantics on which to associate a format so it should be an error to attempt to do so.  Not a silent no-op.

+1

 
I was under the impression that for format to be allowed in the expression one of the two data types involved has to be text and the other must not be text.

I hadn't understood that, but also hadn't thought of a case where it might be wanted until just now. What if someone wanted a cast from JSONB to their custom type, and the format was a specific keypath to extract from the JSONB? It's true that could be accomplished by first extracting the keypath and then CASTing that expression, but the same is true for text->date, regexing a YYYY-MM-DD into the locale default.

 

IME we are actually implementing a formatting option for text serialization and deserialization here, not a cast (we are just borrowing existing syntax that is serviceable).  Hence the absence of these entries in pg_cast and why the fit into pg_type seems so reasonable.

The existence of the various "to_char" and "to_date" functions reflects the historical lack of a dedicated syntax for this kind of (de-)serialization.  But it seems unwise to bias ourselves to how the new syntax/feature should be implemented just because these functions exist.  At least one design should be done pretending they don't and see what comes out of it.  Their code can always be moved or reused in whatever we come up with; forcing them to be used directly, as-is, within the new solution adds an unnecessary constraint.

I agree. I'd like the more generic solution, but I don't want to get in the way of getting it done, especially if we can change the internals later with no user impact.

But, once this is implemented, does it then make sense to then parse to_char() and to_date() into casts?

Re: implement CAST(expr AS type FORMAT 'template')

От
Corey Huinker
Дата:
 
another question is:
should we first implement CAST(expr AS type FORMAT 'template') for limited types
(to_date, to_char, to_number, to_timestamptz)
or first try to make it more generic?


That was my plan, essentially rewriting these into safe versions of the existing to_date/to_timestamp/etc functions, but much has changed since then, so while it still seems like a good intermediate step, it may be a distraction as others have stated elsewhere in the thread.


Re: implement CAST(expr AS type FORMAT 'template')

От
"David G. Johnston"
Дата:
On Sunday, August 3, 2025, Corey Huinker <corey.huinker@gmail.com> wrote:
On Sun, Aug 3, 2025 at 11:36 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sun, Aug 3, 2025 at 8:10 PM jian he <jian.universality@gmail.com> wrote:
hi.
one more question:

For binary coercible type casts, no formatted related function for it,
should we error out?
For example, should the following error out or return text '1'.

select cast('1'::text as text format 'YYYY'::text);

I'm hoping the standard says (or allows us to) error out here.

We have some influence in that, I believe.
 

text as a type has no semantics on which to associate a format so it should be an error to attempt to do so.  Not a silent no-op.

+1

 
I was under the impression that for format to be allowed in the expression one of the two data types involved has to be text and the other must not be text.

I hadn't understood that, but also hadn't thought of a case where it might be wanted until just now. What if someone wanted a cast from JSONB to their custom type, and the format was a specific keypath to extract from the JSONB? It's true that could be accomplished by first extracting the keypath and then CASTing that expression, but the same is true for text->date, regexing a YYYY-MM-DD into the locale default.

Feels like the same basic answer.  Create cast has a single (because it’s one-way) function accepting between 1 and 3 arguments.  Change it to accept between 1 and 4 arguments and the 4th is where the format expression gets passed.  If a format expression is present and the function doesn’t have a 4th argument, error.

But, once this is implemented, does it then make sense to then parse to_char() and to_date() into casts?

I have no principled reason but I wouldn’t bother to turn these calls into casts nor do I think turning casts into these specific function calls by name is a good idea.  Leave the legacy stuff in place for compatibility, unchanged from its present form, and do the new stuff anew.

David J.

Re: implement CAST(expr AS type FORMAT 'template')

От
Vik Fearing
Дата:


On 04/08/2025 07:55, David G. Johnston wrote:
On Sunday, August 3, 2025, Corey Huinker <corey.huinker@gmail.com> wrote:
On Sun, Aug 3, 2025 at 11:36 PM David G. Johnston <david.g.johnston@gmail.com> wrote:

I was under the impression that for format to be allowed in the expression one of the two data types involved has to be text and the other must not be text.

I hadn't understood that, but also hadn't thought of a case where it might be wanted until just now. What if someone wanted a cast from JSONB to their custom type, and the format was a specific keypath to extract from the JSONB? It's true that could be accomplished by first extracting the keypath and then CASTing that expression, but the same is true for text->date, regexing a YYYY-MM-DD into the locale default.

Feels like the same basic answer.  Create cast has a single (because it’s one-way) function accepting between 1 and 3 arguments.  Change it to accept between 1 and 4 arguments and the 4th is where the format expression gets passed.  If a format expression is present and the function doesn’t have a 4th argument, error.


This is my position as well.

+1

-- 

Vik Fearing

Re: implement CAST(expr AS type FORMAT 'template')

От
jian he
Дата:
hi.
please check the attached v4 patch.

1. For binary-coercible casts, if the format template is specified,
raise an error.
Example:
SELECT CAST('1'::text AS text FORMAT 'YYYY'::text); -- error
2. limited implementation — currently only supports to_char, to_date,
to_number, and to_timestamp.
3. coerce_to_target_type function is used in many places, refactoring
add another
argument seems not practical. So, I introduced a new function
coerce_to_target_type_fmt. Similarly, since coerce_type is difficult to
refactor too, I created a new function coerce_type_fmt.

At this stage, we have not modified any pg_cast entries.  Adding to_char,
to_date, etc., into pg_cast has implications that require more consideration
(see [1]).
Also for this patch, including these functions in pg_cast is not really
necessary to achieve the intended behavior.

[1] https://postgr.es/m/CACJufxF4OW=x2rCwa+ZmcgopDwGKDXha09qTfTpCj3QSTG6Y9Q@mail.gmail.com

Вложения