Discussion: Why don't we support external input/output functions for the composite types


Why don't we support external input/output functions for the composite types

From: Dilip Kumar
Date:
Hi,

I'm curious about composite types in PostgreSQL. By default, when we
create a composite type, it utilizes the "record_in" and "record_out"
functions for input/output. Do you think it would be beneficial to
expand the syntax to allow users to specify custom input/output
functions when creating composite types? Has anyone attempted this
before, and are there any design challenges associated with it? Or is
it simply not implemented because it's not seen as a valuable
addition?

I believe it would be beneficial because users creating a new type
might prefer to define specific input/output syntax rather than
conforming to what is accepted by the RECORD type.
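
For example, with a hypothetical type today:

CREATE TYPE mycomplex AS (r float8, i float8);
SELECT '(1.5,2.5)'::mycomplex;     -- only the generic record syntax is accepted on input
SELECT ROW(1.5, 2.5)::mycomplex;   -- and output always comes back as (1.5,2.5)

A type author might prefer to accept and emit something like '1.5+2.5i'
instead.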

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Dilip Kumar <dilipbalaut@gmail.com> writes:
> I'm curious about composite types in PostgreSQL. By default, when we
> create a composite type, it utilizes the "record_in" and "record_out"
> functions for input/output. Do you think it would be beneficial to
> expand the syntax to allow users to specify custom input/output
> functions when creating composite types?

No.

> I believe it would be beneficial because users creating a new type
> might prefer to define specific input/output syntax rather than
> conforming to what is accepted by the RECORD type.

The primary outcome would be to require a huge amount of new work
to be done by a lot of software, much of it not under our control.
And the impact wouldn't only be to software that would prefer not
to know about this.  For example, how likely do you think it is
that these hypothetical user-defined I/O functions would cope
well with ALTER TABLE/ALTER TYPE commands that change those
rowtypes?

            regards, tom lane



Re: Why don't we support external input/output functions for the composite types

From: Dilip Kumar
Date:
On Thu, Apr 25, 2024 at 10:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dilip Kumar <dilipbalaut@gmail.com> writes:
> > I'm curious about composite types in PostgreSQL. By default, when we
> > create a composite type, it utilizes the "record_in" and "record_out"
> > functions for input/output. Do you think it would be beneficial to
> > expand the syntax to allow users to specify custom input/output
> > functions when creating composite types?
>
> No.
>
> > I believe it would be beneficial because users creating a new type
> > might prefer to define specific input/output syntax rather than
> > conforming to what is accepted by the RECORD type.
>

Thanks for the quick response, Tom.

> The primary outcome would be to require a huge amount of new work
> to be done by a lot of software, much of it not under our control.

Yeah, I agree with that.

> And the impact wouldn't only be to software that would prefer not
> to know about this.  For example, how likely do you think it is
> that these hypothetical user-defined I/O functions would cope
> well with ALTER TABLE/ALTER TYPE commands that change those
> rowtypes?

That's a good point. I was primarily focused on altering the
representation of input and output values, rather than considering
changes to internal storage. However, offering this feature could
indeed allow users to influence how values are stored.  And that can
potentially affect ALTER TYPE because then we do not have control over
how those values are stored internally.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Why don't we support external input/output functions for the composite types

From: Robert Haas
Date:
This thread caught my eye this morning, and I'm confused.

On Thu, Apr 25, 2024 at 12:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The primary outcome would be to require a huge amount of new work
> to be done by a lot of software, much of it not under our control.

Wait, what is the "lot of software" that would have to be changed? It
can't be existing extensions, because they wouldn't be forced into
using this feature. Are you thinking that drivers or admin tools would
need to support it? To me it seems like the only required changes
would be to things that know how to parse the output of record_out(),
and there is probably some of that, but the language you're using here
is so emphatic as to make me suspect that you anticipate some larger
impact.

> And the impact wouldn't only be to software that would prefer not
> to know about this.  For example, how likely do you think it is
> that these hypothetical user-defined I/O functions would cope
> well with ALTER TABLE/ALTER TYPE commands that change those
> rowtypes?

Hmm. Dilip mentioned changing the storage format, but if you do that,
you have bigger problems, like my_record_type.x no longer working. At
that point I don't see why what you have is properly called a record
type at all. So I guess what you're imagining here is that ALTER TABLE
.. ALTER TYPE would try COERCION_PATH_COERCEVIAIO, but, uh so what? We
could probably fix it so that such coercions were handled in some
other way, but even if we didn't, it just means the user has to
provide a USING clause, which is no different than what happens in any
other case where coerce-via-I/O doesn't work out.

--
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> This thread caught my eye this morning, and I'm confused.
> On Thu, Apr 25, 2024 at 12:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The primary outcome would be to require a huge amount of new work
>> to be done by a lot of software, much of it not under our control.

> Wait, what is the "lot of software" that would have to be changed?

I think this potentially affects stuff as low-level as drivers,
particularly those that deal in binary not text I/O.  It's not
unreasonable for client code to assume that any type with
typtype 'c' (composite) will adhere to the specifications at
https://www.postgresql.org/docs/current/rowtypes.html#ROWTYPES-IO-SYNTAX
especially because that section pretty much says in so many words
that that's the case.
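
(A driver often has nothing more than the catalog to go on, something
along the lines of

SELECT typtype, typrelid, typinput, typoutput, typreceive, typsend
FROM pg_type WHERE typname = 'mycomposite';   -- hypothetical type name

and today typtype = 'c' is sufficient to know that the generic record
text and binary formats apply.)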

> It
> can't be existing extensions, because they wouldn't be forced into
> using this feature. Are you thinking that drivers or admin tools would
> need to support it?

Yes.  We've heard that argument about "this only affects extensions
that choose to use it" before, and it's nonsense.  As soon as you
extend system-wide APIs, the consequences are system-wide: everybody
has to cope with the new definition.

>> For example, how likely do you think it is
>> that these hypothetical user-defined I/O functions would cope
>> well with ALTER TABLE/ALTER TYPE commands that change those
>> rowtypes?

> Hmm. Dilip mentioned changing the storage format, but if you do that,
> you have bigger problems, like my_record_type.x no longer working. At
> that point I don't see why what you have is properly called a record
> type at all.

Yup, I agree.

> So I guess what you're imagining here is that ALTER TABLE
> .. ALTER TYPE would try COERCION_PATH_COERCEVIAIO, but, uh so what?

Uh, no.  My point is that if you make a custom output function
for "type complex (real float8, imaginary float8)", that function
will probably crash pretty hard if what it's handed is something
other than two float8s.  But there is nothing to stop somebody
from trying to ALTER the type to be two numerics or whatever.
Conversely, the type's custom input function would likely keep on
producing two float8s, yielding corrupt data so far as the rest
of the system is concerned.
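
That is, nothing would prevent a sequence like this (a sketch with
hypothetical custom I/O attached to the type):

CREATE TYPE complex AS (real float8, imaginary float8);
-- ... attach hypothetical custom input/output functions here ...
ALTER TYPE complex ALTER ATTRIBUTE imaginary SET DATA TYPE numeric;
-- the rowtype now contains a numeric, but the custom output function
-- still expects two float8s and the custom input function still
-- produces them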

You could imagine preventing such trouble by forbidding ALTER TYPE
on types with custom I/O functions.  But that just makes it even
more obvious that what this is is a poorly-thought-through hack,
rather than a feature that interoperates well with the rest of
Postgres.

I think that to the extent that there's a need for custom I/O
of something like this, it should be handled by bespoke types,
similar to (say) type point.

            regards, tom lane



Re: Why don't we support external input/output functions for the composite types

From: Robert Haas
Date:
On Thu, Apr 25, 2024 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yes.  We've heard that argument about "this only affects extensions
> that choose to use it" before, and it's nonsense.  As soon as you
> extend system-wide APIs, the consequences are system-wide: everybody
> has to cope with the new definition.

Sure. Any new feature has this problem to some extent.

> Uh, no.  My point is that if you make a custom output function
> for "type complex (real float8, imaginary float8)", that function
> will probably crash pretty hard if what it's handed is something
> other than two float8s.  But there is nothing to stop somebody
> from trying to ALTER the type to be two numerics or whatever.
> Conversely, the type's custom input function would likely keep on
> producing two float8s, yielding corrupt data so far as the rest
> of the system is concerned.

I'm not sure I really buy this. Changing the column definitions
amounts to changing the on-disk format, and no data type can survive a
change to the on-disk format without updating the I/O functions to
match.

> I think that to the extent that there's a need for custom I/O
> of something like this, it should be handled by bespoke types,
> similar to (say) type point.

I somewhat agree with this. The main disadvantage of that approach is
that you lose the ability to directly refer to the members, which in
some cases would be quite nice. I bet a lot of people would enjoy
being able to write my_point.x and my_point.y instead of my_point[0]
and my_point[1], for example. But maybe the solution to that is not
$SUBJECT.

A related problem is that, even if my_point behaved like a composite
type, you'd have to write (my_point).x and (my_point).y to avoid
something like:

ERROR:  missing FROM-clause entry for table "my_point"
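
For instance (hypothetical names):

CREATE TYPE pointish AS (x float8, y float8);
CREATE TABLE t (my_point pointish);
SELECT my_point.x FROM t;    -- ERROR:  missing FROM-clause entry for table "my_point"
SELECT (my_point).x FROM t;  -- works: the parens mark my_point as a value, not a table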

I think it's confusing and counterintuitive that putting parentheses
around a subexpression completely changes the meaning. I don't know of
any other programming language that behaves that way, and I find the
way the "indirection" productions are coded in gram.y to be highly
questionable. I suspect everything we currently treat as an
indirection_el should instead be a way of constructing a new a_expr or
c_expr or something like that, but I strongly suspect if I try to make
that work I'll discover horrible problems I can't fix. Still, it's
awful.

--
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 25, 2024 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Uh, no.  My point is that if you make a custom output function
>> for "type complex (real float8, imaginary float8)", that function
>> will probably crash pretty hard if what it's handed is something
>> other than two float8s.

> I'm not sure I really buy this. Changing the column definitions
> amounts to changing the on-disk format, and no data type can survive a
> change to the on-disk format without updating the I/O functions to
> match.

What I'm trying to say is: given that the command "alter type T alter
attribute A type foo" exists, users would reasonably expect the system
to honor that on its own for any composite type, because that's what
it does today.  But it can't if T has custom I/O functions, at least
not without understanding how to rewrite those functions.

>> I think that to the extent that there's a need for custom I/O
>> of something like this, it should be handled by bespoke types,
>> similar to (say) type point.

> I somewhat agree with this. The main disadvantage of that approach is
> that you lose the ability to directly refer to the members, which in
> some cases would be quite nice. I bet a lot of people would enjoy
> being able to write my_point.x and my_point.y instead of my_point[0]
> and my_point[1], for example. But maybe the solution to that is not
> $SUBJECT.

Nope, it isn't IMO.  We already added infrastructure to allow
arbitrary custom types to define subscripting operations.  I think a
case could be made to allow them to define field selection, too.
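
(The jsonb subscripting added in v14 is an existing in-core example of
that infrastructure, e.g.

SELECT ('{"a": {"b": 1}}'::jsonb)['a']['b'];   -- returns the jsonb value 1

and one could imagine a comparable hook for field selection.)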

> I think it's confusing and counterintuitive that putting parentheses
> around a subexpression completely changes the meaning. I don't know of
> any other programming language that behaves that way,

I take it that you also don't believe that "2 + 3 * 4" should yield
different results from "(2 + 3) * 4"?

I could get behind offering an alternative notation, eg "a.b->c does
the same thing as (a.b).c", if we could find a reasonable notation
that doesn't infringe on user operator namespace.  I think that might
be hard to do though, and I don't think the existing notation is so
awful that we should break existing operators to have an alternative.
The business with deprecating => operators a few years ago had the
excuse that "the standard says so", but we don't have that
justification here.


            regards, tom lane



Re: Why don't we support external input/output functions for the composite types

From: Isaac Morland
Date:
On Thu, 25 Apr 2024 at 17:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
 
> > I think it's confusing and counterintuitive that putting parentheses
> > around a subexpression completely changes the meaning. I don't know of
> > any other programming language that behaves that way,

> I take it that you also don't believe that "2 + 3 * 4" should yield
> different results from "(2 + 3) * 4"?

In that expression "2 + 3" is not a subexpression, although "3 * 4" is, thanks to the operator precedence rules.

> I could get behind offering an alternative notation, eg "a.b->c does
> the same thing as (a.b).c", if we could find a reasonable notation
> that doesn't infringe on user operator namespace.  I think that might
> be hard to do though, and I don't think the existing notation is so
> awful that we should break existing operators to have an alternative.
> The business with deprecating => operators a few years ago had the
> excuse that "the standard says so", but we don't have that
> justification here.

This is one of those areas where it will be difficult at best to do something which makes things work the way people expect without breaking other cases. I certainly would like to be able to use . to extract a field from a composite value without parenthesizing everything, but given the potential for having a schema name that matches a table or field name I would want to be very careful about changing anything.

Re: Why don't we support external input/output functions for the composite types

From: Tom Lane
Date:
I wrote:
> I could get behind offering an alternative notation, eg "a.b->c does
> the same thing as (a.b).c", if we could find a reasonable notation
> that doesn't infringe on user operator namespace.  I think that might
> be hard to do though, and I don't think the existing notation is so
> awful that we should break existing operators to have an alternative.
> The business with deprecating => operators a few years ago had the
> excuse that "the standard says so", but we don't have that
> justification here.

A different approach we could take is to implement the SQL99 rules
for <identifier chain>, or at least move closer to that.  Our
existing rules for resolving qualified column references are more
or less SQL92.  I think the reasons we didn't do that when we first
implemented SQL99 are

(1) The SQL99 rules are fundamentally ambiguous, which they wave
away by saying that it's user error if there's more than one way
to interpret the reference.  This approach is decidedly not nice,
notably because it means that unrelated-looking changes in your
schema can break your query.  Having to check multiple
interpretations slows parsing, too.

(2) Switching from SQL92 to SQL99 rules would break some queries
anyway.  (At least, that's my recollection, though looking at
the specs right now I don't see any case where SQL99 fails to
admit the SQL92 interpretation, so long as you don't run into (1).)

Still, maybe it's time to think about changing?  We could use
the "the standard says so" excuse with anybody who complains.

In the long run I wish we could ditch the SQL92 rules altogether
and say that the head identifier of a qualified column reference
must be a table's correlation name, not a schema or catalog name.
There's zero good reason for the latter two cases, other than
compatibility with thirty-year-old design mistakes.  I kind of
doubt we could make that fly though.

            regards, tom lane



Re: Why don't we support external input/output functions for the composite types

От
Robert Haas
Дата:
On Thu, Apr 25, 2024 at 5:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I'm not sure I really buy this. Changing the column definitions
> > amounts to changing the on-disk format, and no data type can survive a
> > change to the on-disk format without updating the I/O functions to
> > match.
>
> What I'm trying to say is: given that the command "alter type T alter
> attribute A type foo" exists, users would reasonably expect the system
> to honor that on its own for any composite type, because that's what
> it does today.  But it can't if T has custom I/O functions, at least
> not without understanding how to rewrite those functions.

I understand your point, but I don't agree with it. Ordinary users
wouldn't be able to create types like this anyway, because there's no
way we can allow an unprivileged user to set an input or output
function. It would have to be restricted to superusers, just as we do
for base types. And IMHO those have basically the same issue: you have
to ensure that all the functions and operators that operate on the
type, and any subscripting operations, are on the same page about what
the underlying storage is. This doesn't seem different. It may well
still be a bad idea for other reasons, or just kind of useless, but I
disagree that it's a bad idea for that particular reason.

> Nope, it isn't IMO.  We already added infrastructure to allow
> arbitrary custom types to define subscripting operations.  I think a
> case could be made to allow them to define field selection, too.

That would be cool!

> I take it that you also don't believe that "2 + 3 * 4" should yield
> different results from "(2 + 3) * 4"?

Isaac's rebuttal to this particular point was perfect; I have nothing to add.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Why don't we support external input/output functions for the composite types

From: Robert Haas
Date:
On Thu, Apr 25, 2024 at 5:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A different approach we could take is to implement the SQL99 rules
> for <identifier chain>, or at least move closer to that.  Our
> existing rules for resolving qualified column references are more
> or less SQL92.  I think the reasons we didn't do that when we first
> implemented SQL99 are

I'm not familiar with these rules. Do they allow stuff like a.b.c.d.e,
or better yet, a.b(args).c(args).d(args).e(args)?

> Still, maybe it's time to think about changing?  We could use
> the "the standard says so" excuse with anybody who complains.

I certainly agree that if we're going to break stuff, breaking stuff
to get closer to the standard is superior to other ways of breaking
stuff. Without knowing what we'd get out of it, I don't have an
opinion about whether it's worth it here or not, but making our syntax
more like other programming languages and especially other popular
database products does seem to me to have positive value.

> In the long run I wish we could ditch the SQL92 rules altogether
> and say that the head identifier of a qualified column reference
> must be a table's correlation name, not a schema or catalog name.
> There's zero good reason for the latter two cases, other than
> compatibility with thirty-year-old design mistakes.  I kind of
> doubt we could make that fly though.

Yeah, I think that would break too much stuff.

--
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 25, 2024 at 5:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What I'm trying to say is: given that the command "alter type T alter
>> attribute A type foo" exists, users would reasonably expect the system
>> to honor that on its own for any composite type, because that's what
>> it does today.  But it can't if T has custom I/O functions, at least
>> not without understanding how to rewrite those functions.

> I understand your point, but I don't agree with it. Ordinary users
> wouldn't be able to create types like this anyway, because there's no
> way we can allow an unprivileged user to set an input or output
> function. It would have to be restricted to superusers, just as we do
> for base types.

Well, that would be one way of making the consistency problem be not
our problem, but it would be a sad restriction.  It'd void a lot of
the arguable use-case for this feature, if you ask me.  I realize
that non-superusers couldn't create the C-language I/O functions that
would be most at risk here, but you could imagine people building
I/O functions in some other PL.

(We'd have to remove the restriction that cstring isn't an allowed
input or return type for user-defined functions; but AFAIK that's
just a holdover from days when cstring was a lot more magic than
it is now.)
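
For instance, today even a trivial sketch like this is rejected,
reusing the hypothetical complex type from upthread:

CREATE TYPE complex AS (real float8, imaginary float8);
CREATE FUNCTION complex_in_pl(cstring) RETURNS complex
    LANGUAGE plpgsql AS $$ BEGIN RETURN NULL; END $$;
-- fails today: cstring isn't accepted as an argument type for a
-- non-C-language function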

Maybe there's an argument that PL functions already have to be
proof enough against datatype inconsistencies that nothing really
awful could happen.  Not sure.

In any case, if we have to put strange restrictions on a composite
type when it has custom I/O functions, then that still is an
indication that the feature is a hack that doesn't play nice
with the rest of the system.  So I remain of the opinion that
we shouldn't go there.  If field selection support for custom
types will solve the use-case, I find that a lot more attractive.

>> I take it that you also don't believe that "2 + 3 * 4" should yield
>> different results from "(2 + 3) * 4"?

> Isaac's rebuttal to this particular point was perfect; I have nothing to add.

As far as I could tell, Isaac's rebuttal was completely off-point.

            regards, tom lane



Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 25, 2024 at 5:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A different approach we could take is to implement the SQL99 rules
>> for <identifier chain>, or at least move closer to that.

> I'm not familiar with these rules. Do they allow stuff like a.b.c.d.e,
> or better yet, a.b(args).c(args).d(args).e(args)?

The former.

         <identifier chain> ::=
              <identifier> [ { <period> <identifier> }... ]

The hard part is to figure out what the first identifier is:
column name? table correlation name (AS name)? schema name or
catalog name of a qualified table name? function parameter name?
After that, as long as what you have is of composite type,
you can drill down into it.

If I'm reading SQL99 correctly, they deny allowing the first
identifier to be a column name when there's more than one identifier,
so that you must table-qualify a composite column before you can
select a field from it.  But they allow all the other possibilities
and claim it's user error if more than one could apply, which seems
like an awful design to me.  At minimum I'd want to say that the
correlation name should be the first choice and wins if there's
a match, regardless of anything else, because otherwise there is
no safe way for ruleutils to deparse such a construct.  And
probably function parameter name should be second choice and
similarly win over other choices, for the same reason.  The other
options are SQL92 compatibility holdovers and should only be
considered if we can't find a matching correlation or parameter name.

The net result of doing it like this, I think, is that we'd accept
some cases where SQL99 would prefer to raise an ambiguity error.
But we'd still be much closer to the modern standard than we are
today.

            regards, tom lane



Re: Why don't we support external input/output functions for the composite types

From: Robert Haas
Date:
On Fri, Apr 26, 2024 at 11:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, that would be one way of making the consistency problem be not
> our problem, but it would be a sad restriction.  It'd void a lot of
> the arguable use-case for this feature, if you ask me.  I realize
> that non-superusers couldn't create the C-language I/O functions that
> would be most at risk here, but you could imagine people building
> I/O functions in some other PL.

Huh, I hadn't considered that. I figured the performance would be too
bad to even think about it. I also wasn't even sure such a thing would
be supportable: I thought cstrings were generally limited to
C/internal functions.

> >> I take it that you also don't believe that "2 + 3 * 4" should yield
> >> different results from "(2 + 3) * 4"?
>
> > Isaac's rebuttal to this particular point was perfect; I have nothing to add.
>
> As far as I could tell, Isaac's rebuttal was completely off-point.

OK, I'm not sure why, but let me explain my position. In an expression
like (2 + 3) * 4, the parentheses change the order of evaluation,
which makes sense. That's what parentheses are for, or at least one
thing that parentheses are for. But in an expression like
(my_point).x, that's not the case. There's only one operator here, the
period, and so there's only one possible order of evaluation, so why
do parentheses make any difference? Having (my_point).x be different
from my_point.x is like having 2 + 3 give a different answer from (2 +
3), which would be bonkers.

But it's not at all like the difference between 2 + 3 * 4 and (2 + 3)
* 4. The comparable case there would be foo.bar.baz as against
(foo.bar).baz or alternatively foo.(bar.baz). Now there are two dot
operators, and one of them has to be applied first, and there's some
default based on associativity, and if you want it the other way you
stick parentheses in there to tell the parser what you meant.

And the reason I thought Isaac put it well is that he said, "In that
expression 2 + 3 is not a subexpression, although 3 * 4 is, thanks to
the operator precedence rules." Exactly so. 2 + 3 * 4 is going to be
parsed as something like OpExpr(+, 2, OpExpr(*, 3, 4)) -- and that
does not have OpExpr(+, 2, 3) anywhere inside of it, so my comment
that parenthesizing a subexpression shouldn't change its meaning is
not relevant here. I'm perfectly fine with parentheses changing which
things we parse as subexpressions. Users have no license to just stick
parentheses into your SQL expressions in random places and expect that
they don't do anything; if that were so, we'd have to make ((2 +)
3)()()() evaluate to 5, which is obviously nonsense. Rather, what I
don't like about the status quo is that putting parentheses around
something that we were already going to consider as a unit changes the
meaning of it. And that's exactly what happens when you write x.y vs.
(x).y. The parentheses around the x make us think that it's a
different kind of thing, somehow. That seems odd, and the practical
result is that you have to insert a bunch of parentheses into your
PostgreSQL code that look like they shouldn't be needed, but are.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Why don't we support external input/output functions for the composite types

From: Robert Haas
Date:
On Fri, Apr 26, 2024 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I'm not familiar with these rules. Do they allow stuff like a.b.c.d.e,
> > or better yet, a.b(args).c(args).d(args).e(args)?
>
> The former.
>
>          <identifier chain> ::=
>               <identifier> [ { <period> <identifier> }... ]

OK, nice.

> The hard part is to figure out what the first identifier is:
> column name? table correlation name (AS name)? schema name or
> catalog name of a qualified table name? function parameter name?
> After that, as long as what you have is of composite type,
> you can drill down into it.

Right, makes sense.

> If I'm reading SQL99 correctly, they deny allowing the first
> identifier to be a column name when there's more than one identifier,
> so that you must table-qualify a composite column before you can
> select a field from it.  But they allow all the other possibilities
> and claim it's user error if more than one could apply, which seems
> like an awful design to me.  At minimum I'd want to say that the
> correlation name should be the first choice and wins if there's
> a match, regardless of anything else, because otherwise there is
> no safe way for ruleutils to deparse such a construct.  And
> probably function parameter name should be second choice and
> similarly win over other choices, for the same reason.  The other
> options are SQL92 compatibility holdovers and should only be
> considered if we can't find a matching correlation or parameter name.

I definitely agree that there must always be some way to make it
unambiguous, not just because of deparsing but also because users are
going to want a way to force their preferred interpretation. I've been
a PostgreSQL developer now for considerably longer than I was an end
user, but I definitely would not have liked "ERROR: you can't get
there from here".

I'm less certain how that should be spelled. The rules you propose
make sense to me up to a point, but what happens if the same
unqualified name is both a table alias and a function parameter name?
I think I need a way of forcing the function-parameter interpretation.
You could make function_name.parameter_name resolve to that, but then
what happens if function_name is also a table alias in the containing
query? It's really hard to think of a set of rules here that don't
leave any room for unfixable problems. Maybe the answer is that we
should support some completely different notation for unambiguously
referencing parameters, like ${parameter_name}. I don't know. I think
that what you're proposing here could be a nice improvement but it
definitely seems tricky to get it completely right.

--
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 26, 2024 at 11:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, that would be one way of making the consistency problem be not
>> our problem, but it would be a sad restriction.  It'd void a lot of
>> the arguable use-case for this feature, if you ask me.  I realize
>> that non-superusers couldn't create the C-language I/O functions that
>> would be most at risk here, but you could imagine people building
>> I/O functions in some other PL.

> Huh, I hadn't considered that. I figured the performance would be too
> bad to even think about it. I also wasn't even sure such a thing would
> be supportable: I thought cstrings were generally limited to
> C/internal functions.

Performance could indeed be an issue, but I think people taking this
path would be doing so because they value programmer time more than
machine time.  And while there once were good reasons to not let
user functions deal in cstrings, I'm not sure there are anymore.
(The point would deserve closer investigation if we actually tried
to move forward on it, of course.)

> Rather, what I
> don't like about the status quo is that putting parentheses around
> something that we were already going to consider as a unit changes the
> meaning of it. And that's exactly what happens when you write x.y vs.
> (x).y.

But that's exactly the point: we DON'T consider the initial identifier
of a qualified name "as a unit", and neither does the standard.
We have to figure out how many of the identifiers name an object
(column or table) referenced in the query, and that is not clear
up-front.  SQL99's rules don't make that any better.  The parens in
our current notation serve to separate the object name from any field
selection(s) done on the object.  There's still some ambiguity,
because we allow you to write either "(table.column).field" or
"(table).column.field", but we've dealt with that for ages.

            regards, tom lane



Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 26, 2024 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If I'm reading SQL99 correctly, they deny allowing the first
>> identifier to be a column name when there's more than one identifier,
>> so that you must table-qualify a composite column before you can
>> select a field from it.  But they allow all the other possibilities
>> and claim it's user error if more than one could apply, which seems
>> like an awful design to me.

> I'm less certain how that should be spelled. The rules you propose
> make sense to me up to a point, but what happens if the same
> unqualified name is both a table alias and a function parameter name?
> I think I need a way of forcing the function-parameter interpretation.
> You could make function_name.parameter_name resolve to that, but then
> what happens if function_name is also a table alias in the containing
> query? It's really hard to think of a set of rules here that don't
> leave any room for unfixable problems. Maybe the answer is that we
> should support some completely different notion for unambiguously
> referencing parameters, like ${parameter_name}. I don't know. I think
> that what you're proposing here could be a nice improvement but it
> definitely seems tricky to get it completely right.

I think you're moving the goal posts too far.  It's on the user to
spell the initially-written query unambiguously: if you chose a
function parameter name that matches a table correlation name in the
query, that's your fault and you'd better rename one of those things.
What concerns me is the hazard that the query is okay, and we store
it, and then subsequent object creations or renamings create a
situation where an identifier chain is ambiguous per the SQL99 rules.
ruleutils has to be able to deparse the stored query in a way that is
valid regardless of that.  Giving first priority to correlation and
parameter names makes this possible because external operations, even
including renaming tables or columns used in the query, won't affect
either.

            regards, tom lane

PS: ruleutils does sometimes choose new correlation names, and it
suddenly occurs to me that it's probably not being careful to avoid
duplicating function parameter names.  But that's independent of this
discussion.



Re: Why don't we support external input/output functions for the composite types

From: Robert Haas
Date:
On Fri, Apr 26, 2024 at 12:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> But that's exactly the point: we DON'T consider the initial identifier
> of a qualified name "as a unit", and neither does the standard.
> We have to figure out how many of the identifiers name an object
> (column or table) referenced in the query, and that is not clear
> up-front.  SQL99's rules don't make that any better.  The parens in
> our current notation serve to separate the object name from any field
> selection(s) done on the object.  There's still some ambiguity,
> because we allow you to write either "(table.column).field" or
> "(table).column.field", but we've dealt with that for ages.

I agree that this is exactly the point.

No other programming language that I know of, and no other database
that I know of, looks at x.y.z and says "ok, well first we have to
figure out whether the object is named x or x.y or x.y.z, and then
after that, we'll use whatever is left over as a field selector."
Instead, they have a top-level namespace where x refers to one and
only one thing, and then they look for something called y inside of
that, and if that's a valid object then they look inside of that for
z.

JavaScript is probably the purest example of this. Everything is an
object, and x.y just looks up 'x' in the object that is the current
namespace. Assuming that returns an object rather than nothing, we
then try to find 'y' inside of that object.

I'm not an Oracle expert, but I am under the impression that the way
that Oracle works is closer to that than it is to our
read-the-tea-leaves approach.

I'm almost positive you're about to tell me that there's no way in the
infernal regions that we could make a semantics change of this
magnitude, and maybe you're right. But I think our current approach is
deeply unsatisfying and seriously counterintuitive. People do not get
an error about x.y and think "oh, right, I need to write (x).y so that
the parser understands that the name is x rather than x.y and the .y
part is field-selection rather than a part of the name itself." They
get an error about x.y and say "crap, I guess this syntax isn't
supported" and then when you show them that "(x).y" fixes it, they say
"why in the world does that fix it?" or "wow, that's dumb."

Imagine if we made _ perform string concatenation but also continued
to allow it as an identifier character. When we saw a_b without
spaces, we'd test for whether there's an a_b variable, and/or whether
there are a and b variables, to guess which interpretation was meant.
I hope we would all agree that this would be insane language design.
Yet that's essentially what we've done with period, and I don't think
we can blame that on the SQL standard, because I don't think other
systems have this problem. I wonder if anyone knows of another system
that works like PostgreSQL in this regard (without sharing code).

--
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> No other programming language that I know of, and no other database
> that I know of, looks at x.y.z and says "ok, well first we have to
> figure out whether the object is named x or x.y or x.y.z, and then
> after that, we'll use whatever is left over as a field selector."

It may indeed be true that nobody but SQL does that, but nonetheless
this is exactly what SQL99 requires AFAICT.  The reason we use this
parenthesis notation is precisely that we didn't want to get into
that sort of tea-leaf-reading about how many identifiers mean what.
The parens put it on the user to tell us what part of the chain
is field selection.

Now do you see why I'd prefer to ditch the SQL92-compatibility
measures?  If we said that the first identifier in a chain must
be a correlation name or parameter name, never anything else,
it'd be substantially saner.

> Yet that's essentially what we've done with period, and I don't think
> we can blame that on the SQL standard

Yes, we can.  Please do not rant further about this until you've
read the <identifier chain> section of a recent SQL spec.

            regards, tom lane



Re: Why don't we support external input/output functions for the composite types

From: Isaac Morland
Date:
On Fri, 26 Apr 2024 at 14:04, Robert Haas <robertmhaas@gmail.com> wrote:

> systems have this problem. I wonder if anyone knows of another system
> that works like PostgreSQL in this regard (without sharing code).

In Haskell period (".") is used both to form a qualified name (module.name), very similar to our schema.object, and it is also a perfectly normal operator which is defined by the standard prelude as function composition (but you can re-bind it to any object whatsoever). This is disambiguated in a very simple way however: Module names must begin with an uppercase letter while variable names must begin with a lowercase letter.

A related point is that parentheses in Haskell act to group expressions, but they, and commas, are not involved in calling functions: to call a function, just write it to the left of its parameter (and it only has one parameter, officially).

All this might sound weird but it actually works very well in the Haskell context.

Re: Why don't we support external input/output functions for the composite types

From: Robert Haas
Date:
On Fri, Apr 26, 2024 at 2:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > No other programming language that I know of, and no other database
> > that I know of, looks at x.y.z and says "ok, well first we have to
> > figure out whether the object is named x or x.y or x.y.z, and then
> > after that, we'll use whatever is left over as a field selector."
>
> It may indeed be true that nobody but SQL does that, but nonetheless
> this is exactly what SQL99 requires AFAICT.  The reason we use this
> parenthesis notation is precisely that we didn't want to get into
> that sort of tea-leaf-reading about how many identifiers mean what.
> The parens put it on the user to tell us what part of the chain
> is field selection.

I really thought this was just PostgreSQL, not SQL generally, but I
just experimented a bit with Oracle on dbfiddle.uk using this example:

CREATE TYPE foo AS OBJECT (a number(10), b varchar2(2000));
CREATE TABLE bar (quux foo);
INSERT INTO bar VALUES (foo(1, 'one'));
SELECT bar.quux, quux, (quux).a, (bar.quux).a FROM bar;

This works, but if I delete the parentheses from the last line, then
it fails. So evidently my understanding of how this works in other
systems is incorrect, or incomplete. I feel like I've encountered
cases where we required extra parenthesization that Oracle didn't
need, but it's hard to discuss that without examples, and I don't have
them right now.
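
For what it's worth, the PostgreSQL equivalent (a rough sketch from
memory, not re-run just now) seems to behave the same way:

CREATE TYPE foo AS (a numeric(10), b text);
CREATE TABLE bar (quux foo);
INSERT INTO bar VALUES (ROW(1, 'one'));
SELECT bar.quux, quux, (quux).a, (bar.quux).a FROM bar;  -- works
SELECT quux.a FROM bar;  -- fails: missing FROM-clause entry for table "quux"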

> Yes, we can.  Please do not rant further about this until you've
> read the <identifier chain> section of a recent SQL spec.

I'm hurt to see emails that I spent time on characterized as a rant,
even if I was wrong on the facts. And I think appealing to the SQL
standard is a poor way of trying to end debate on a topic.

--
Robert Haas
EDB: http://www.enterprisedb.com