Thread: Proposal to adjust typmod argument on base UDT input functions

Proposal to adjust typmod argument on base UDT input functions

From
Octavio Alvarez
Date:
Hi, everyone,

First time posting in pgsql-hackers.

I crafted the following rough patch which passes the target column
typmod on input functions instead of -1 for OIDs >= 16384. The intention
is to affect UDTs (user-defined types) only, not core data types. It
needs adjustments but the core idea is this:

diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index 0b5b81c7f27..b884745f7f6 100644
--- a/src/backend/parser/parse_coerce.c
+++ b/src/backend/parser/parse_coerce.c
@@ -276,7 +276,7 @@ coerce_type(ParseState *pstate, Node *node,
                  * or it won't be able to obey the bizarre SQL-spec input rules. (Ugly
                  * as sin, but so is this part of the spec...)
                  */
-               if (baseTypeId == INTERVALOID)
+               if (baseTypeId == INTERVALOID || baseTypeId >= 16384)
                         inputTypeMod = baseTypeMod;
                 else
                         inputTypeMod = -1;
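
Read in isolation, the changed condition amounts to the selection rule sketched below. This is a standalone illustration, not PostgreSQL source: `choose_input_typmod` is a hypothetical helper name, and the two constants are restated as assumptions (1186 for `INTERVALOID`, 16384 for the first OID assigned after the cluster is initialized) so the snippet compiles on its own.

```c
#include <stdint.h>

/* Assumed values, restated here so the sketch is self-contained. */
#define SKETCH_INTERVALOID       1186u   /* OID of the core interval type */
#define SKETCH_FIRST_NORMAL_OID  16384u  /* first OID assigned after initdb */

/*
 * Hypothetical helper mirroring the patched condition: returns the typmod
 * that would be handed to the type's input function.
 */
int32_t
choose_input_typmod(uint32_t base_type_id, int32_t base_typmod)
{
    if (base_type_id == SKETCH_INTERVALOID ||
        base_type_id >= SKETCH_FIRST_NORMAL_OID)
        return base_typmod;     /* interval and user-defined types */
    return -1;                  /* other core types still get "unknown" */
}
```

Under this rule a core type such as varchar (OID 1043) would still receive -1, while any type created after initialization would receive the target typmod, including -1 when the column has none.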


Rationale:

While developing a base UDT with a type modifier, we noticed we always
got -1 for the target typmod (input function's third argument). We
expected to get the target column typmod instead, as per the CREATE TYPE
documentation [1].

We later learned that input values go through a two-step process [2]. If
the target column is newtype(10), PostgreSQL will first pass the value
through the newtype_in with typmod = -1, and once it's newtype(-1), it
gets cast from newtype(-1) to newtype(10) through a cast function.
This is called a "sizing cast" and ties the typmod to size / length
semantics.
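
The two-step flow described above can be sketched with a toy type outside PostgreSQL. Everything here is an assumption made for illustration: `ToyValue`, `toy_in`, `toy_sizing_cast`, `toy_coerce_two_step` and `toy_text_is` are hypothetical names, and the typmod is given a simple "maximum length, truncate if longer" meaning that is not claimed to match any real type's rules.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX 64

typedef struct ToyValue
{
    char text[TOY_MAX];
    int  typmod;            /* -1 means "no modifier applied" */
} ToyValue;

/* Step 1: input function. With typmod == -1 it cannot enforce a length. */
ToyValue
toy_in(const char *str, int typmod)
{
    ToyValue v = {{0}, -1};
    size_t   limit = TOY_MAX - 1;

    if (typmod >= 0 && (size_t) typmod < limit)
        limit = (size_t) typmod;
    strncpy(v.text, str, limit);
    v.text[limit] = '\0';
    v.typmod = typmod;
    return v;
}

/* Step 2: sizing cast from toy(-1) to toy(typmod). */
ToyValue
toy_sizing_cast(ToyValue v, int typmod)
{
    if (typmod >= 0 && strlen(v.text) > (size_t) typmod)
        v.text[typmod] = '\0';
    v.typmod = typmod;
    return v;
}

/* The flow described above: input with -1, then a separate cast. */
ToyValue
toy_coerce_two_step(const char *str, int target_typmod)
{
    return toy_sizing_cast(toy_in(str, -1), target_typmod);
}

/* Small comparison helper for trying the sketch out. */
int
toy_text_is(ToyValue v, const char *expected)
{
    return strcmp(v.text, expected) == 0;
}
```

For this toy type, `toy_coerce_two_step("abcdefgh", 5)` and a direct `toy_in("abcdefgh", 5)` end with the same stored text; the difference is only in how many functions run and what the input function is allowed to know.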

For UDTs the typmod might not have size or length semantics. For
example, if the typmod specifies how to encrypt a value, the two-step
process is inconvenient. The input function should make it go directly
from cstring to newtype(final_typmod).
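
As a rough illustration of a typmod without size semantics, the sketch below uses a toy input function whose typmod selects how the text is encoded. It is not our actual type: `newtype_in` here is a standalone stand-in with a hypothetical return convention, the typmod values are invented, and ROT13 is only a placeholder for "some transformation chosen by the typmod", not encryption.

```c
#include <stddef.h>
#include <string.h>

#define NEWTYPE_MAX 64

/* Hypothetical typmod meanings for the toy type; none of them is a size. */
enum { NEWTYPE_RAW = -1, NEWTYPE_ROT13 = 1 };

/*
 * Toy input function taking the three documented arguments: the input
 * text, the type's own OID (unused here) and the destination typmod,
 * or -1 if unknown. Returns a pointer to a static buffer (so it is not
 * reentrant), or NULL for over-long input or an unsupported typmod.
 */
const char *
newtype_in(const char *str, unsigned int type_oid, int typmod)
{
    static char out[NEWTYPE_MAX];
    size_t      len = strlen(str);
    size_t      i;

    (void) type_oid;
    if (len >= NEWTYPE_MAX)
        return NULL;
    if (typmod != NEWTYPE_RAW && typmod != NEWTYPE_ROT13)
        return NULL;

    for (i = 0; i < len; i++)
    {
        char c = str[i];

        if (typmod == NEWTYPE_ROT13)
        {
            if (c >= 'a' && c <= 'z')
                c = (char) ('a' + (c - 'a' + 13) % 26);
            else if (c >= 'A' && c <= 'Z')
                c = (char) ('A' + (c - 'A' + 13) % 26);
        }
        out[i] = c;
    }
    out[len] = '\0';
    return out;
}
```

When this function only ever sees -1, it has to store the raw form and leave the encoding to a later cast; when it sees the target typmod, it can produce the final stored form in one call.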

Basic testing on our side works as expected so far:

- The input function is called only once and the sizing cast is not
   called anymore.

- If the UDT does not have a typmod, it still gets -1.

- Sized domains still seem to work for INSERTs, like in CREATE DOMAIN
   v5char AS varchar(5); and then trying to use it.

- Inserting into a newtype(a) column from a 'something'::newtype(b) value
   works correctly: first, newtype_in to typmod = b, then, self-cast to
   typmod = a.

- The PostgreSQL test suite still passes.

That said, we don't want to break others' use cases, so I ask for your
comments, especially if you know of problematic scenarios or develop
data types or extensions, so I can work on them.


Notes:

Given how COPY works, I'd say that most or all UDTs should support
pg_dump's default settings (COPY-based). If this is the case, most or
all UDTs should be already compatible with this patch.

Setting the target typmod for all OIDs breaks the test suite.

Why 16384? It's the minimum OID assigned after a cluster is initialized.
We relied on the documentation at [3]. I considered checking the
typcategory but some core types, like bytea, have typcategory = 'U'.

Comments will be appreciated. If you deem this is a good way to go, I
will submit it as a proper patch for review.

Thanks,
Octavio.


[1] https://www.postgresql.org/docs/17/sql-createtype.html#id-1.9.3.94.5.8

[2] https://www.postgresql.org/docs/17/typeconv-query.html#TYPECONV-QUERY

[3] https://www.postgresql.org/docs/17/system-catalog-initial-data.html#SYSTEM-CATALOG-OID-ASSIGNMENT

Re: Proposal to adjust typmod argument on base UDT input functions

From
Tom Lane
Date:
Octavio Alvarez <octalpg@alvarezp.org> writes:
> I crafted the following rough patch which passes the target column
> typmod on input functions instead of -1 for OIDs >= 16384. The intention
> is to affect UDTs (user-defined types) only, not core data types.

I don't really see how we could accept this?  Wouldn't it break
every existing extension datatype that uses typmod?

            regards, tom lane



Re: Proposal to adjust typmod argument on base UDT input functions

From
Octavio Alvarez
Date:
On 8/7/25 22:46, Tom Lane wrote:
> Octavio Alvarez <octalpg@alvarezp.org> writes:
>> I crafted the following rough patch which passes the target column
>> typmod on input functions instead of -1 for OIDs >= 16384. The intention
>> is to affect UDTs (user-defined types) only, not core data types.
> 
> I don't really see how we could accept this?  Wouldn't it break
> every existing extension datatype that uses typmod?

That was my first thought as well, but COPY sends the typmod directly 
already, so if they support COPY, they should already be compatible.

If an extension doesn't support COPY, then it doesn't support the
default pg_dump settings either.

Octavio.



Re: Proposal to adjust typmod argument on base UDT input functions

From
Tom Lane
Date:
Octavio Alvarez <octalpg@alvarezp.org> writes:
> On 8/7/25 22:46, Tom Lane wrote:
>> I don't really see how we could accept this?  Wouldn't it break
>> every existing extension datatype that uses typmod?

> That was my first thought as well, but COPY sends the typmod directly 
> already, so if they support COPY, they should already be compatible.

COPY is not the same context.

I'm not averse to doing something here, because it's certainly a mess
as mentioned by the comment right above your proposed patch.  But this
patch looks like "let's break half the universe for the benefit of the
other half".  (And, given the shortage of prior complaints, that's
being very generous about the proportion of data types that would
benefit.)

I think the way to move forward here would be to invent an explicit
datatype property that controls what to do.  I'm too tired to think
through exactly what the definition of the property would be, but
I suspect it'd have something to do with whether implicit and explicit
coercion behaviors are supposed to differ.
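
One possible reading of such a property, sketched outside PostgreSQL: a per-type opt-in flag replaces the OID threshold. The struct, the flag name `input_wants_target_typmod`, and the function `typmod_for_input` are all hypothetical; nothing like this exists in the catalogs today, and the actual definition of the property is left open above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical catalog-like entry; the flag does not exist in PostgreSQL. */
typedef struct TypeInfoSketch
{
    uint32_t oid;
    bool     input_wants_target_typmod;  /* hypothetical opt-in property */
} TypeInfoSketch;

/*
 * Types that opt in receive the target typmod in their input function;
 * every other type keeps receiving -1, so existing types see no change.
 */
int32_t
typmod_for_input(const TypeInfoSketch *type, int32_t target_typmod)
{
    return type->input_wants_target_typmod ? target_typmod : -1;
}

/* Two made-up entries for trying the sketch out. */
static const TypeInfoSketch sketch_legacy_type = {16384u, false};
static const TypeInfoSketch sketch_opt_in_type = {16385u, true};
```

With these entries, `typmod_for_input(&sketch_legacy_type, 10)` yields -1 and `typmod_for_input(&sketch_opt_in_type, 10)` yields 10, which is the sense in which an explicit property avoids changing behavior for types that never asked for it.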

            regards, tom lane



Re: Proposal to adjust typmod argument on base UDT input functions

From
Sandino Araico Sánchez
Date:
On 07/08/25 23:18, Tom Lane wrote:
>> That was my first thought as well, but COPY sends the typmod directly
>> already, so if they support COPY, they should already be compatible.
>
> COPY is not the same context.

The INPUT function is not context aware.

From the PostgreSQL documentation:
>  The input function can be declared as taking one argument of type cstring, or as taking three
> arguments of types cstring, oid, integer. The first argument is the input text as a C string, the
> second argument is the type's own OID (except for array types, which instead receive their element
> type's OID), and the third is the typmod of the destination column, if known (-1 will be passed if not).
https://www.postgresql.org/docs/current/sql-createtype.html

Inside the INPUT function it's not possible to identify which context it's been called from. The only available arguments are the input text as a C string, the type's OID and the typmod.

> I'm not averse to doing something here, because it's certainly a mess
> as mentioned by the comment right above your proposed patch.  But this
> patch looks like "let's break half the universe for the benefit of the
> other half".

If the rest of the universe already knows what to do in case the third argument is -1 or the known typmod, nothing should break.

> (And, given the shortage of prior complaints, that's
> being very generous about the proportion of data types that would
> benefit.)

The absence of complaints doesn't mean the problem doesn't exist. It might just mean everyone is working around it.

I have not reviewed many other extensions, but in PostGIS I can confirm that the INPUT function handles the typmod correctly when it is available (lwgeom_inout.c lines 172 to 180). This patch would not break PostGIS.
I can also confirm that PostGIS uses the CAST workaround for the case when the typmod passed to the INPUT function is -1. PostGIS would not need that workaround if the typmod were passed correctly to the INPUT function. PostGIS is losing efficiency because of this workaround.

I can look into other data type extensions if required to.

> I think the way to move forward here would be to invent an explicit
> datatype property that controls what to do.

Why not just allow the INPUT function to work as documented?

> I'm too tired to think
> through exactly what the definition of the property would be, but
> I suspect it'd have something to do with whether implicit and explicit
> coercion behaviors are supposed to differ.

The INPUT function is not aware of coercion behaviors. It's just not working as documented in https://www.postgresql.org/docs/current/sql-createtype.html

The proposed patch fixes the INPUT function to work as documented.

>             regards, tom lane




-- 
Sandino Araico Sánchez 
http://sandino.net