Обсуждение: tighten generic_option_name, or store more carefully in catalog?

Поиск
Список
Период
Сортировка

tighten generic_option_name, or store more carefully in catalog?

От
Chapman Flack
Дата:
Hi,

generic_option_name is a ColLabel, therefore a fully general SQL identifier.

But a command like CREATE FOREIGN DATA WRAPPER w ... OPTIONS ("a=b" 'c=d')
stores {a=b=c=d} in fdwoptions, from which the original intent can't be
recovered.

Should generic_option_name be restricted to be a regular identifier,
or allowed to be a delimited identifier but with = forbidden within it,
or should it be represented as delimited in the catalog when necessary
so it can be recovered faithfully?

SQL rules would also make its case-sensitivity dependent on faithfully
recovering whether it was delimited or not.

Regards,
-Chap



Re: tighten generic_option_name, or store more carefully in catalog?

От
Tom Lane
Дата:
Chapman Flack <jcflack@acm.org> writes:
> generic_option_name is a ColLabel, therefore a fully general SQL identifier.

> But a command like CREATE FOREIGN DATA WRAPPER w ... OPTIONS ("a=b" 'c=d')
> stores {a=b=c=d} in fdwoptions, from which the original intent can't be
> recovered.

Ugh.

> Should generic_option_name be restricted to be a regular identifier,
> or allowed to be a delimited identifier but with = forbidden within it,
> or should it be represented as delimited in the catalog when necessary
> so it can be recovered faithfully?

I think I'd vote for leaving the grammar alone and rejecting '='
in the option-storing code.  If memory serves, there's precedent
for that approach somewhere else in our code.

> SQL rules would also make its case-sensitivity dependent on faithfully
> recovering whether it was delimited or not.

I'm not following that part?

            regards, tom lane



Re: tighten generic_option_name, or store more carefully in catalog?

От
Chapman Flack
Дата:
On 05/31/25 19:55, Tom Lane wrote:
> Chapman Flack <jcflack@acm.org> writes:
>> SQL rules would also make its case-sensitivity dependent on faithfully
>> recovering whether it was delimited or not.
> 
> I'm not following that part?

It's that ISO rule that identifiers match case-insensitively
if they weren't quoted, case-sensitively if they were.

We don't explicitly store in catalogs whether quoting was used.
Clearly it must have been, if the stored form has any non-lowercased letter
in it, or any non-regular-identifier character. But if we just have foo
in the catalog, it could have been either of foo or "foo".

Which makes me a little sad because if I provide a way for someone to, say,
check FDW options in Java, and they spell an option FOO in their code,
I would ideally like to report OPTIONS (foo 'bar') as matching and
OPTIONS("foo" 'bar') as not matching. But the information I could use
to do that just isn't there.[0]

Since writing that email, I've stumbled onto more things that seem a bit
bizarre about the FDW extension API, and written about them here -> [1].
(I can't make an internal link anchor there that works, but it's a short
scroll down to the "Notes on the validator function" heading.)

I was surprised that FDW validators weren't designed the same way as
PL validators (just provisionally put the object in the catalog,
then say "validate this oid", and if the validator throws an error
the object goes away). That would have avoided all the funky shortcomings
I just wrote about.

Obviously it's too late to change the existing FDW validator API and break
everything that uses it. But I wonder if it would be completely crazy to
just offer a new one: say if your CREATE FOREIGN DATA WRAPPER names a
validator function with signature (oid classid, oid objid) then that's
also an acceptable validator function and will be called with classid,
objid of the provisionally-entered {wrapper,server,table,attribute,
user mapping} to be validated.

Regards,
-Chap

[0] It's possible I'm hyperattentive to that sort of thing because
    IIRC there was an old PL/Java CVE where we wouldn't let a non-superuser
    change the Java class path on the public schema, but if she spelled
    it puBlIc it was ok. (The function got the schema name as text.)

[1] https://github.com/tada/pljava/issues/430#issuecomment-2921101005



Re: tighten generic_option_name, or store more carefully in catalog?

От
Tom Lane
Дата:
Chapman Flack <jcflack@acm.org> writes:
> On 05/31/25 19:55, Tom Lane wrote:
>> I'm not following that part?

> It's that ISO rule that identifiers match case-insensitively
> if they weren't quoted, case-sensitively if they were.

> We don't explicitly store in catalogs whether quoting was used.
> Clearly it must have been, if the stored form has any non-lowercased letter
> in it, or any non-regular-identifier character. But if we just have foo
> in the catalog, it could have been either of foo or "foo".

I think you're overthinking it.  There is one round of dequoting
and downcasing in the grammar, and after that a name is just a
literal string.  It's perfectly okay to just store that string and
do exact comparisons to it against other names (which themselves
must have gotten through the grammar).  IOW, your claim that there
is a semantic difference between foo and "foo" seems wrong to me.
They are equivalent identifiers.

BTW, I dug around and found that transformRelOptions probably
needs a similar defense.  For example:

regression=# create tablespace foo location '/home/postgres/tmp/foo' with ("a=b" = "c=d");
ERROR:  unrecognized parameter "a"

It looks like all callers of transformRelOptions immediately
validate the result, so that this case is only a question of
which error message you prefer to get.  But I think I'd prefer
something saying that "=" is not allowed in an option name.

> Obviously it's too late to change the existing FDW validator API and break
> everything that uses it. But I wonder if it would be completely crazy to
> just offer a new one: say if your CREATE FOREIGN DATA WRAPPER names a
> validator function with signature (oid classid, oid objid) then that's
> also an acceptable validator function and will be called with classid,
> objid of the provisionally-entered {wrapper,server,table,attribute,
> user mapping} to be validated.

The problem for FDWs is that it may not be okay to find out whether
options are valid at the instant they're entered --- you might not
be able to contact the remote server, for instance.

            regards, tom lane



Re: tighten generic_option_name, or store more carefully in catalog?

От
Chapman Flack
Дата:
On 05/31/25 21:12, Tom Lane wrote:
> I think you're overthinking it.  There is one round of dequoting
> and downcasing in the grammar, and after that a name is just a
> literal string.  It's perfectly okay to just store that string and
> do exact comparisons to it against other names (which themselves
> must have gotten through the grammar).          ^^^^^ ^^^^^^^^^^
  ^^^^ ^^^^ ^^^^^^ ^^^^^^^ ^^^ ^^^^^^^

Right ... until you're in a situation of comparing it to an identifier
that came to you some other way. The old bug where PL/Java would let
non-superusers change the public-schema class path by spelling public
in different case was possible because the set_classpath function takes
the schema as a string argument. Developers of FDW handlers are likely
to have option names embedded in some kind of PL source code. Those don't
all arrive preprocessed in the same way as those PostgreSQL parsed.

> IOW, your claim that there is a semantic difference between foo
> and "foo" seems wrong to me. They are equivalent identifiers.

I think it is true that I can treat them as equivalent for matching:
it falls in the rule in the spec that says a <regular identifier> and
a <delimited identifier> are equivalent if the case-normal form of the
regular one matches the exact form of the delimited one.

So yes, I did overthink it a tad: if I see foo in the catalog and Foo
in some function argument or PL source code, I do need to know whether
the argument or source literal was intended to represent a regular
or a delimited identifier, but I don't need to know that about the foo
in the catalog. I knew that once and wrote code that knew it, but forgot
for a moment here.

> The problem for FDWs is that it may not be okay to find out whether
> options are valid at the instant they're entered --- you might not
> be able to contact the remote server, for instance.

Hmm, ok, so is the argument that the provisionally-enter-in-catalog
validation approach wasn't used because a validity verdict might not
be forthcoming right away in the transaction where that is done, and
that this alternate approach of passing some partial information to
a validator in an array solved a problem that would otherwise arise?

I don't think I'm seeing how the one approach ends up better than the
other on that score. If I write test code, it sure seems the validator
as currently implemented gets called at the instant I do the CREATE,
and I'm not sure what else I could imagine happening instead.

Seems like "some options might need server connection to fully check"
is just an inherent, understood limitation no matter how the validator
API is designed. A validator author surely thinks about what checks are
basic sanity checks on syntax and compatibility of different options and
what can be checked locally and what requires a server connection, and
for what options it may be necessary to check the basic syntax and then
say "ok" and hope for the best. Or take the more pessimistic approach and
just reject the declaration and make you try again when the server's up.

But to have to tell a validator author things like, you can't check your
locally-declared table options against what your locally-declared server
options are ... you can't check your column options against your declared
table options, or even what table the column is declared in, or what the
column's locally-declared type is ... your server option validation can't
know anything about the FDW options and the server 'type' and 'version'
aren't even passed to the validator ... just because our API's that way ..,

... things like that seem more like own-goals. Fixing them wouldn't make
the truly inherent limitations go away, but then at least there'd just be
the truly inherent ones.

Regards,
-Chap



Re: tighten generic_option_name, or store more carefully in catalog?

От
Tom Lane
Дата:
Here's a proposed patch for the "=" issue.  Whether or not we should
rethink FDW validation behavior, doing so surely couldn't be
back-patched.  But I think this much should be.

            regards, tom lane

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 46c1dce222d..50747c16396 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1243,8 +1243,9 @@ transformRelOptions(Datum oldOptions, List *defList, const char *namspace,
         }
         else
         {
-            text       *t;
+            const char *name;
             const char *value;
+            text       *t;
             Size        len;

             /*
@@ -1291,11 +1292,19 @@ transformRelOptions(Datum oldOptions, List *defList, const char *namspace,
              * have just "name", assume "name=true" is meant.  Note: the
              * namespace is not output.
              */
+            name = def->defname;
             if (def->arg != NULL)
                 value = defGetString(def);
             else
                 value = "true";

+            /* Insist that name not contain "=", else "a=b=c" is ambiguous */
+            if (strchr(name, '=') != NULL)
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                         errmsg("invalid option name \"%s\": must not contain \"=\"",
+                                name)));
+
             /*
              * This is not a great place for this test, but there's no other
              * convenient place to filter the option out. As WITH (oids =
@@ -1303,7 +1312,7 @@ transformRelOptions(Datum oldOptions, List *defList, const char *namspace,
              * amount of ugly.
              */
             if (acceptOidsOff && def->defnamespace == NULL &&
-                strcmp(def->defname, "oids") == 0)
+                strcmp(name, "oids") == 0)
             {
                 if (defGetBoolean(def))
                     ereport(ERROR,
@@ -1313,11 +1322,11 @@ transformRelOptions(Datum oldOptions, List *defList, const char *namspace,
                 continue;
             }

-            len = VARHDRSZ + strlen(def->defname) + 1 + strlen(value);
+            len = VARHDRSZ + strlen(name) + 1 + strlen(value);
             /* +1 leaves room for sprintf's trailing null */
             t = (text *) palloc(len + 1);
             SET_VARSIZE(t, len);
-            sprintf(VARDATA(t), "%s=%s", def->defname, value);
+            sprintf(VARDATA(t), "%s=%s", name, value);

             astate = accumArrayResult(astate, PointerGetDatum(t),
                                       false, TEXTOID,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index c14e038d54f..8d2d7431544 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -71,15 +71,26 @@ optionListToArray(List *options)
     foreach(cell, options)
     {
         DefElem    *def = lfirst(cell);
+        const char *name;
         const char *value;
         Size        len;
         text       *t;

+        name = def->defname;
         value = defGetString(def);
-        len = VARHDRSZ + strlen(def->defname) + 1 + strlen(value);
+
+        /* Insist that name not contain "=", else "a=b=c" is ambiguous */
+        if (strchr(name, '=') != NULL)
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("invalid option name \"%s\": must not contain \"=\"",
+                            name)));
+
+        len = VARHDRSZ + strlen(name) + 1 + strlen(value);
+        /* +1 leaves room for sprintf's trailing null */
         t = palloc(len + 1);
         SET_VARSIZE(t, len);
-        sprintf(VARDATA(t), "%s=%s", def->defname, value);
+        sprintf(VARDATA(t), "%s=%s", name, value);

         astate = accumArrayResult(astate, PointerGetDatum(t),
                                   false, TEXTOID,

Re: tighten generic_option_name, or store more carefully in catalog?

От
Chapman Flack
Дата:
On 06/02/25 14:13, Tom Lane wrote:
> Here's a proposed patch for the "=" issue.  Whether or not we should
> rethink FDW validation behavior, doing so surely couldn't be
> back-patched.  But I think this much should be.

LGTM
Regards,
-Chap




Re: tighten generic_option_name, or store more carefully in catalog?

От
Tom Lane
Дата:
Chapman Flack <jcflack@acm.org> writes:
> On 06/02/25 14:13, Tom Lane wrote:
>> Here's a proposed patch for the "=" issue.  Whether or not we should
>> rethink FDW validation behavior, doing so surely couldn't be
>> back-patched.  But I think this much should be.

> LGTM

Pushed, thanks for looking at it.

            regards, tom lane