Обсуждение: Reserved words and delimited identifiers

Поиск
Список
Период
Сортировка

Reserved words and delimited identifiers

От
Joe Abbate
Дата:
Hi,

A few months ago, I got an email related to Pyrseas
(http://lists.pgfoundry.org/pipermail/pyrseas-general/2011-August/000003.html)
where the user reported he had a table named "user" and reported a
failure in the dbtoyaml utility.  I eventually implemented a simple
quote_id function (only checks if an identifier needs quoting based on
the characters in it, not on whether it's a keyword or not).  I left
adding a list of keywords as a "to do."  Filip, the user, recommended
that Pyrseas "quote all identifiers sent to database."

I was reluctant to go with the latter solution, but after two other
issues (Tryton database had a column named "limit" and another user a
table named "order"), I started making the changes.  However, I've found
some PostgreSQL behavior that is inconsistent, at the very least with
respect to the documentation.  It's probably best shown by example
(tested under 8.4 and 9.1):

pyrseas_testdb=# CREATE FUNCTION "f1"("integer", "integer") RETURNS
"integer" LANGUAGE "sql" AS $_$SELECT GREATEST($1, $2)$_$;
ERROR:  type integer does not exist
pyrseas_testdb=# CREATE TABLE "t1" ("c1" "integer", "c2" "text");
ERROR:  type "integer" does not exist
LINE 1: CREATE TABLE "t1" ("c1" "integer", "c2" "text");                               ^
pyrseas_testdb=# CREATE TABLE "t1" ("c1" integer, "c2" "text");
CREATE TABLE
pyrseas_testdb=# CREATE DOMAIN "d1" AS integer;
CREATE DOMAIN
pyrseas_testdb=# CREATE TABLE "t2" ("c1" "d1", "c2" "text");
CREATE TABLE
pyrseas_testdb=# CREATE TABLE "t3" ("c1" "int4", "c2" "text");
CREATE TABLE
pyrseas_testdb=# CREATE TABLE "t4" ("c1" "int", "c2" "text");
ERROR:  type "int" does not exist
LINE 1: CREATE TABLE "t4" ("c1" "int", "c2" "text");                               ^
pyrseas_testdb=# CREATE TABLE "t5" ("c1" "INTEGER", "c2" "text");
ERROR:  type "INTEGER" does not exist
LINE 1: CREATE TABLE "t5" ("c1" "INTEGER", "c2" "text");                               ^
pyrseas_testdb=# CREATE DOMAIN "float" AS real;
CREATE DOMAIN
pyrseas_testdb=# CREATE TABLE "t6" ("c1" "float", "c2" "text");
CREATE TABLE
pyrseas_testdb=# DROP TABLE "t6";
pyrseas_testdb=# DROP DOMAIN "float";
DROP DOMAIN
pyrseas_testdb=# CREATE TABLE "t6" ("c1" "float", "c2" "text");
ERROR:  type "float" does not exist
LINE 1: CREATE TABLE "t6" ("c1" "float", "c2" "text");                               ^

The last part is a killer.  If "float" is a domain, then it can be
quoted, otherwise it can't.  The documentation appears to contradict
this.
http://www.postgresql.org/docs/current/static/sql-keywords-appendix.html
says FLOAT is not reserved and cannot be a function or type.  I tried
defining a "float" function and it was accepted:

pyrseas_testdb=# CREATE FUNCTION "float"(integer) returns real language
sql as $_$select $1::real$_$;
CREATE FUNCTION

It seems to me that since a TYPE in a column definition or function
argument can be a non-native TYPE, it could be a reserved word and
therefore it should always be allowable to quote the TYPE.  Can someone
please explain why that is not the case?

Joe


Re: Reserved words and delimited identifiers

От
Robert Haas
Дата:
On Tue, Nov 29, 2011 at 7:29 PM, Joe Abbate <jma@freedomcircle.com> wrote:
> It seems to me that since a TYPE in a column definition or function
> argument can be a non-native TYPE, it could be a reserved word and
> therefore it should always be allowable to quote the TYPE.  Can someone
> please explain why that is not the case?

Type names as they appear in pg_type.typname can always be quoted.
But some types, like int4, have alternate names - e.g. int4 can be
specified as integer or int, and foat8 can be specified using the
two-word phrase double precision.  These alternate names are keywords
when unquoted, but identifiers (with a different meaning) when quoted.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Reserved words and delimited identifiers

От
Joe Abbate
Дата:
On 11/29/2011 10:09 PM, Robert Haas wrote:
> On Tue, Nov 29, 2011 at 7:29 PM, Joe Abbate <jma@freedomcircle.com> wrote:
>> It seems to me that since a TYPE in a column definition or function
>> argument can be a non-native TYPE, it could be a reserved word and
>> therefore it should always be allowable to quote the TYPE.  Can someone
>> please explain why that is not the case?
> 
> Type names as they appear in pg_type.typname can always be quoted.
> But some types, like int4, have alternate names - e.g. int4 can be
> specified as integer or int, and foat8 can be specified using the
> two-word phrase double precision.  These alternate names are keywords
> when unquoted, but identifiers (with a different meaning) when quoted.

What I don't understand is the following:

pyrseas_testdb=# create table "integer" (c1 integer); drop table "integer";
CREATE TABLE
DROP TABLE
pyrseas_testdb=# create table "integer" ("integer" integer); drop table
"integer";
CREATE TABLE
DROP TABLE
pyrseas_testdb=# create table "integer" ("integer" "integer");
ERROR:  type "integer" does not exist
LINE 1: create table "integer" ("integer" "integer");                                         ^
pyrseas_testdb=# create table "integer" ("integer" "INTEGER");
ERROR:  type "INTEGER" does not exist
LINE 1: create table "integer" ("integer" "INTEGER");                                         ^

Why does it allow quoting of "integer" as the table name and the column
name, but not as the type name?  Furthermore,

pyrseas_testdb=# create domain "integer" as "int4";
CREATE DOMAIN
pyrseas_testdb=# create table t1 ("integer" "integer");
CREATE TABLE
pyrseas_testdb=# create domain "INTEGER" as int4;
CREATE DOMAIN
pyrseas_testdb=# create table t2 ("integer" "INTEGER");
CREATE TABLE

Now that I created a DOMAIN/TYPE named "integer" or "INTEGER"
(contradicting the SQL Key Words table), it does allow quoting.  This
behavior appears inconsistent with the general practice that allows
quoting of type names.  In other words, why am I allowed to quote
"int4", but not "integer" or "INTEGER" (as a type name?


Joe


Re: Reserved words and delimited identifiers

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Nov 29, 2011 at 7:29 PM, Joe Abbate <jma@freedomcircle.com> wrote:
>> It seems to me that since a TYPE in a column definition or function
>> argument can be a non-native TYPE, it could be a reserved word and
>> therefore it should always be allowable to quote the TYPE.  Can someone
>> please explain why that is not the case?

> Type names as they appear in pg_type.typname can always be quoted.
> But some types, like int4, have alternate names - e.g. int4 can be
> specified as integer or int, and foat8 can be specified using the
> two-word phrase double precision.  These alternate names are keywords
> when unquoted, but identifiers (with a different meaning) when quoted.

Another way to say that is that the type int4 can be specified in two
ways:
int4        (an identifier)INTEGER        (a keyword)

Quoting "int4" is no problem, because it's still an identifier, but
quoting "integer" takes away its keyword nature, so it doesn't get
recognized.

The fact that INTEGER is a keyword, and not an identifier, is per SQL
specification.  We could perhaps hack things so that "integer" as an
identifier would also work, but I doubt that we'd go so far as to make
"INTEGER" (a different identifier) also work, and we'd certainly not
be able to do much about the spec's more exciting deviations from
identifier-looking type names, such as CHARACTER VARYING or DOUBLE
PRECISION.

Or to put it more pithily: the SQL committee's ideas of good syntax
seem to have been frozen around the time COBOL was invented.
        regards, tom lane


Re: Reserved words and delimited identifiers

От
Robert Haas
Дата:
On Tue, Nov 29, 2011 at 11:21 PM, Joe Abbate <jma@freedomcircle.com> wrote:
> Why does it allow quoting of "integer" as the table name and the column
> name, but not as the type name?  Furthermore,

Because there's nothing called "integer" in the pg_type catalog.  It's
not really a type name; as Tom says, it's some random key word
invented by the SQL committee that we map to a type name inside the
parser somewhere.  On the other hand, the table and column names have
the opposite problem: you CAN'T use random keywords there; you can
ONLY use identifiers.  So in one case you must quote because you need
to have an identifier rather than a keyword, whereas in the other case
you must not quote because you need to have a keyword rather than an
identifier.

> pyrseas_testdb=# create domain "integer" as "int4";
> CREATE DOMAIN
> pyrseas_testdb=# create table t1 ("integer" "integer");
> CREATE TABLE
> pyrseas_testdb=# create domain "INTEGER" as int4;
> CREATE DOMAIN
> pyrseas_testdb=# create table t2 ("integer" "INTEGER");
> CREATE TABLE
>
> Now that I created a DOMAIN/TYPE named "integer" or "INTEGER"
> (contradicting the SQL Key Words table), it does allow quoting.

What else would you expect?  It would be extremely strange if you
created a type called "integer" and could not then define a column of
type "integer".  The key point is that what "integer" means has
nothing to do with what the unquoted identifier INTEGER means; you
could make "integer" be a domain over text if you wanted.  You're not
making it allow quoting; you're defining a completely new data type
that happens to have a name very similar to an existing keyword (but
when quoted, it's not a keyword, of course).  On a similar note, it's
not contradicting the SQL key words table, because once you quote it,
*it's no longer a keyword*.  The whole point of quoting identifiers is
that it allows you to use names that would ordinarily be keywords as
non-keywords.

> This
> behavior appears inconsistent with the general practice that allows
> quoting of type names.  In other words, why am I allowed to quote
> "int4", but not "integer" or "INTEGER" (as a type name?

There is no such general practice.  In general, if something is an
identifier, you can quote it.  If it's a keyword and you want to make
it an identifier, you can quote it to force that interpretation.
Table and column names must be identifiers, so they can always be
quoted; if the identifiers happen to be confusable with reserved
keywords then you *must* quote them to avoid having them interpreted
as keywords rather than identifiers.  However, types can be referred
to using either a keyword, or an identifier.  Those that are
identifiers can be quoted; those that are keywords cannot.

I will cheerfully admit that this is confusing and inconvenient, and I
didn't understand it either until I implemented pg_dump
--quote-all-identifiers.  However, I'm not sure there's any easy way
to improve the situation.  For example, what are we to do with
character varying?  That can't very well be treated as an identifier,
because it's got a space in the middle.  And even if we could hack
around that, it's no good to treat it as "character varying" anyway,
because then you'd have "character varying" != "varchar", and we
certainly don't want those to be different types.  What the current
code does is map character varying to varchar under the hood, and then
sometimes map it back on output.  This does break things for client
code that just wants to quote everything in the world (and you're not
the first person to run up against this problem; I seem to recall
noticing some sketchy-looking code in pgAgent or pgAdmin the last time
I looked...) but don't see any realistic alternative that's less evil
so I think we're stuck with it...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Reserved words and delimited identifiers

От
Joe Abbate
Дата:
On 11/29/2011 11:41 PM, Tom Lane wrote:
> Another way to say that is that the type int4 can be specified in two
> ways:
> 
>     int4        (an identifier)
>     INTEGER        (a keyword)
> 
> Quoting "int4" is no problem, because it's still an identifier, but
> quoting "integer" takes away its keyword nature, so it doesn't get
> recognized.
> 
> The fact that INTEGER is a keyword, and not an identifier, is per SQL
> specification.  We could perhaps hack things so that "integer" as an
> identifier would also work, but I doubt that we'd go so far as to make
> "INTEGER" (a different identifier) also work, and we'd certainly not
> be able to do much about the spec's more exciting deviations from
> identifier-looking type names, such as CHARACTER VARYING or DOUBLE
> PRECISION.
> 
> Or to put it more pithily: the SQL committee's ideas of good syntax
> seem to have been frozen around the time COBOL was invented.

Thanks Tom and Robert.  I think I understand the problem now.  I guess
I'll have to work around this "quirk" by dealing specially with type
names and not quote them when they're in the shorter list of SQL
Standard reserved words.

Joe


Re: Reserved words and delimited identifiers

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> I will cheerfully admit that this is confusing and inconvenient, and I
> didn't understand it either until I implemented pg_dump
> --quote-all-identifiers.  However, I'm not sure there's any easy way
> to improve the situation.

... especially without breaking compatibility with the SQL standard,
which explicitly requires that INTEGER be a keyword.  We've been around
on this before, many years ago.  By my reading of the spec, accepting
"integer" (with the quotes) as a name for the standard integer type
would actually be contrary to spec, or at least an extension in exactly
the same way that "int4" is an extension.  We could doubtless do it,
but the project gets less rewarding the more you think about the other
keyword type names required by spec.
        regards, tom lane


Re: Reserved words and delimited identifiers

От
Alvaro Herrera
Дата:
Excerpts from Joe Abbate's message of mié nov 30 02:15:09 -0300 2011:

> Thanks Tom and Robert.  I think I understand the problem now.  I guess
> I'll have to work around this "quirk" by dealing specially with type
> names and not quote them when they're in the shorter list of SQL
> Standard reserved words.

I wonder if it would simpler to just not quote type names except when
absolutely necessary.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Reserved words and delimited identifiers

От
Andrew Dunstan
Дата:

On 11/30/2011 09:02 AM, Alvaro Herrera wrote:
> Excerpts from Joe Abbate's message of mié nov 30 02:15:09 -0300 2011:
>
>> Thanks Tom and Robert.  I think I understand the problem now.  I guess
>> I'll have to work around this "quirk" by dealing specially with type
>> names and not quote them when they're in the shorter list of SQL
>> Standard reserved words.
> I wonder if it would simpler to just not quote type names except when
> absolutely necessary.
>

Yeah, and very much less ugly. Ploughing through masses of unnecessary
quotes is they way to a headache. quote_ident() gets this right.

cheers

andrew


Re: Reserved words and delimited identifiers

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 11/30/2011 09:02 AM, Alvaro Herrera wrote:
>> I wonder if it would simpler to just not quote type names except when
>> absolutely necessary.

> Yeah, and very much less ugly. Ploughing through masses of unnecessary 
> quotes is they way to a headache. quote_ident() gets this right.

But, per this discussion, you can't just blindly apply quote_ident to a
type name, because it may not *be* an identifier.

One possible solution, if you're getting type information about columns
from the server, is to cast the type OID to regtype and let the regtype
output converter make all the decisions.  It's less notation than a join
to pg_type anyway.
        regards, tom lane


Re: Reserved words and delimited identifiers

От
Joe Abbate
Дата:
On 11/30/2011 09:55 AM, Tom Lane wrote:
> One possible solution, if you're getting type information about columns
> from the server, is to cast the type OID to regtype and let the regtype
> output converter make all the decisions.  It's less notation than a join
> to pg_type anyway.

Unfortunately, Pyrseas' yamltodb gets (some) type information from a
YAML input file, so we can't do that.  However, since user defined TYPEs
are processed before TABLE definitions, we could search for TYPEs in the
parallel catalogs (Python dictionaries) maintained in memory.  So, given
this:

schema public: description: standard public schema table myuser:   columns:   - info:       type: user type user:
attributes:  - name: text   - pass: text
 

we could generate the following SQL against an empty database:

CREATE TYPE "user" AS (name text,   pass text);
CREATE TABLE myuser (   info "user");

Joe


Re: Reserved words and delimited identifiers

От
"Kevin Grittner"
Дата:
Joe Abbate <jma@freedomcircle.com> wrote:
> On 11/30/2011 09:55 AM, Tom Lane wrote:
>> One possible solution, if you're getting type information about
>> columns from the server, is to cast the type OID to regtype and
>> let the regtype output converter make all the decisions.  It's
>> less notation than a join to pg_type anyway.
> 
> Unfortunately, Pyrseas' yamltodb gets (some) type information from
> a YAML input file, so we can't do that.  However, since user
> defined TYPEs are processed before TABLE definitions, we could
> search for TYPEs in the parallel catalogs (Python dictionaries)
> maintained in memory.  So, given this:
> 
> schema public:
>   description: standard public schema
>   table myuser:
>     columns:
>     - info:
>         type: user
>   type user:
>     attributes:
>     - name: text
>     - pass: text
> 
> we could generate the following SQL against an empty database:
> 
> CREATE TYPE "user" AS (name text,
>     pass text);
> CREATE TABLE myuser (
>     info "user");
You are prepared to handle the difference between char and "char", I
hope.
-Kevin


Re: Reserved words and delimited identifiers

От
Joe Abbate
Дата:
On 11/30/2011 11:26 AM, Kevin Grittner wrote:
> You are prepared to handle the difference between char and "char", I
> hope.

We have not implemented a type "verifier" in Pyrseas.  It currently
generates SQL based on the type given in the input.  In normal usage,
dbtoyaml is expected to be invoked first, and it will generate quoted
types if necessary, e.g.,

schema public: description: standard public schema table myuser:   columns:   - info:       type: '"user"'   - active:
    type: '"char"'   - logons:       type: integer type user:   attributes:   - name: text   - pass: text
 

The quotes above are because it selects format_type(atttypid, atttypmod)
from pg_attribute.  The YAML output can then be fed into yamltodb and
will generate (assuming the "user" type and the first column of myuser
already exist):

ALTER TABLE myuser   ADD COLUMN active "char";
ALTER TABLE myuser   ADD COLUMN logons integer;

In other words, Pyrseas depends on the ultimate type verifier: the
PostgreSQL parser (and related routines).

Joe


Re: Reserved words and delimited identifiers

От
"Kevin Grittner"
Дата:
Joe Abbate <jma@freedomcircle.com> wrote:
> On 11/30/2011 11:26 AM, Kevin Grittner wrote:
>> You are prepared to handle the difference between char and
>> "char", I hope.
> Pyrseas depends on the ultimate type verifier: the
> PostgreSQL parser (and related routines).
OK.  I just wanted to be sure that you were aware of that one; it
surprises people sometimes that PostgreSQL includes both a char
reserved word for a type and a "char" type which is completely
different:
test=# create table x (noq char, withq "char");
CREATE TABLE
test=# \x on
Expanded display is on.
test=# select attnum, attname, atttypid, atttypid::regtype,      (select typname from pg_type where oid = atttypid),
 attlen, atttypmod, attbyval, attstorage, attalign from pg_attribute where attrelid = 'x'::regclass and attnum > 0;
 
-[ RECORD 1 ]---------
attnum     | 1
attname    | noq
atttypid   | 1042
atttypid   | character
typname    | bpchar
attlen     | -1
atttypmod  | 5
attbyval   | f
attstorage | x
attalign   | i
-[ RECORD 2 ]---------
attnum     | 2
attname    | withq
atttypid   | 18
atttypid   | "char"
typname    | char
attlen     | 1
atttypmod  | -1
attbyval   | t
attstorage | p
attalign   | c
-Kevin