Обсуждение: tighten generic_option_name, or store more carefully in catalog?
Hi, generic_option_name is a ColLabel, therefore a fully general SQL identifier. But a command like CREATE FOREIGN DATA WRAPPER w ... OPTIONS ("a=b" 'c=d') stores {a=b=c=d} in fdwoptions, from which the original intent can't be recovered. Should generic_option_name be restricted to be a regular identifier, or allowed to be a delimited identifier but with = forbidden within it, or should it be represented as delimited in the catalog when necessary so it can be recovered faithfully? SQL rules would also make its case-sensitivity dependent on faithfully recovering whether it was delimited or not. Regards, -Chap
Chapman Flack <jcflack@acm.org> writes: > generic_option_name is a ColLabel, therefore a fully general SQL identifier. > But a command like CREATE FOREIGN DATA WRAPPER w ... OPTIONS ("a=b" 'c=d') > stores {a=b=c=d} in fdwoptions, from which the original intent can't be > recovered. Ugh. > Should generic_option_name be restricted to be a regular identifier, > or allowed to be a delimited identifier but with = forbidden within it, > or should it be represented as delimited in the catalog when necessary > so it can be recovered faithfully? I think I'd vote for leaving the grammar alone and rejecting '=' in the option-storing code. If memory serves, there's precedent for that approach somewhere else in our code. > SQL rules would also make its case-sensitivity dependent on faithfully > recovering whether it was delimited or not. I'm not following that part? regards, tom lane
On 05/31/25 19:55, Tom Lane wrote: > Chapman Flack <jcflack@acm.org> writes: >> SQL rules would also make its case-sensitivity dependent on faithfully >> recovering whether it was delimited or not. > > I'm not following that part? It's that ISO rule that identifiers match case-insensitively if they weren't quoted, case-sensitively if they were. We don't explicitly store in catalogs whether quoting was used. Clearly it must have been, if the stored form has any non-lowercased letter in it, or any non-regular-identifier character. But if we just have foo in the catalog, it could have been either of foo or "foo". Which makes me a little sad because if I provide a way for someone to, say, check FDW options in Java, and they spell an option FOO in their code, I would ideally like to report OPTIONS (foo 'bar') as matching and OPTIONS("foo" 'bar') as not matching. But the information I could use to do that just isn't there.[0] Since writing that email, I've stumbled onto more things that seem a bit bizarre about the FDW extension API, and written about them here -> [1]. (I can't make an internal link anchor there that works, but it's a short scroll down to the "Notes on the validator function" heading.) I was surprised that FDW validators weren't designed the same way as PL validators (just provisionally put the object in the catalog, then say "validate this oid", and if the validator throws an error the object goes away). That would have avoided all the funky shortcomings I just wrote about. Obviously it's too late to change the existing FDW validator API and break everything that uses it. But I wonder if it would be completely crazy to just offer a new one: say if your CREATE FOREIGN DATA WRAPPER names a validator function with signature (oid classid, oid objid) then that's also an acceptable validator function and will be called with classid, objid of the provisionally-entered {wrapper,server,table,attribute, user mapping} to be validated. Regards, -Chap [0] It's possible I'm hyperattentive to that sort of thing because IIRC there was an old PL/Java CVE where we wouldn't let a non-superuser change the Java class path on the public schema, but if she spelled it puBlIc it was ok. (The function got the schema name as text.) [1] https://github.com/tada/pljava/issues/430#issuecomment-2921101005
Chapman Flack <jcflack@acm.org> writes: > On 05/31/25 19:55, Tom Lane wrote: >> I'm not following that part? > It's that ISO rule that identifiers match case-insensitively > if they weren't quoted, case-sensitively if they were. > We don't explicitly store in catalogs whether quoting was used. > Clearly it must have been, if the stored form has any non-lowercased letter > in it, or any non-regular-identifier character. But if we just have foo > in the catalog, it could have been either of foo or "foo". I think you're overthinking it. There is one round of dequoting and downcasing in the grammar, and after that a name is just a literal string. It's perfectly okay to just store that string and do exact comparisons to it against other names (which themselves must have gotten through the grammar). IOW, your claim that there is a semantic difference between foo and "foo" seems wrong to me. They are equivalent identifiers. BTW, I dug around and found that transformRelOptions probably needs a similar defense. For example: regression=# create tablespace foo location '/home/postgres/tmp/foo' with ("a=b" = "c=d"); ERROR: unrecognized parameter "a" It looks like all callers of transformRelOptions immediately validate the result, so that this case is only a question of which error message you prefer to get. But I think I'd prefer something saying that "=" is not allowed in an option name. > Obviously it's too late to change the existing FDW validator API and break > everything that uses it. But I wonder if it would be completely crazy to > just offer a new one: say if your CREATE FOREIGN DATA WRAPPER names a > validator function with signature (oid classid, oid objid) then that's > also an acceptable validator function and will be called with classid, > objid of the provisionally-entered {wrapper,server,table,attribute, > user mapping} to be validated. The problem for FDWs is that it may not be okay to find out whether options are valid at the instant they're entered --- you might not be able to contact the remote server, for instance. regards, tom lane
On 05/31/25 21:12, Tom Lane wrote: > I think you're overthinking it. There is one round of dequoting > and downcasing in the grammar, and after that a name is just a > literal string. It's perfectly okay to just store that string and > do exact comparisons to it against other names (which themselves > must have gotten through the grammar). ^^^^^ ^^^^^^^^^^ ^^^^ ^^^^ ^^^^^^ ^^^^^^^ ^^^ ^^^^^^^ Right ... until you're in a situation of comparing it to an identifier that came to you some other way. The old bug where PL/Java would let non-superusers change the public-schema class path by spelling public in different case was possible because the set_classpath function takes the schema as a string argument. Developers of FDW handlers are likely to have option names embedded in some kind of PL source code. Those don't all arrive preprocessed in the same way as those PostgreSQL parsed. > IOW, your claim that there is a semantic difference between foo > and "foo" seems wrong to me. They are equivalent identifiers. I think it is true that I can treat them as equivalent for matching: it falls in the rule in the spec that says a <regular identifier> and a <delimited identifier> are equivalent if the case-normal form of the regular one matches the exact form of the delimited one. So yes, I did overthink it a tad: if I see foo in the catalog and Foo in some function argument or PL source code, I do need to know whether the argument or source literal was intended to represent a regular or a delimited identifier, but I don't need to know that about the foo in the catalog. I knew that once and wrote code that knew it, but forgot for a moment here. > The problem for FDWs is that it may not be okay to find out whether > options are valid at the instant they're entered --- you might not > be able to contact the remote server, for instance. Hmm, ok, so is the argument that the provisionally-enter-in-catalog validation approach wasn't used because a validity verdict might not be forthcoming right away in the transaction where that is done, and that this alternate approach of passing some partial information to a validator in an array solved a problem that would otherwise arise? I don't think I'm seeing how the one approach ends up better than the other on that score. If I write test code, it sure seems the validator as currently implemented gets called at the instant I do the CREATE, and I'm not sure what else I could imagine happening instead. Seems like "some options might need server connection to fully check" is just an inherent, understood limitation no matter how the validator API is designed. A validator author surely thinks about what checks are basic sanity checks on syntax and compatibility of different options and what can be checked locally and what requires a server connection, and for what options it may be necessary to check the basic syntax and then say "ok" and hope for the best. Or take the more pessimistic approach and just reject the declaration and make you try again when the server's up. But to have to tell a validator author things like, you can't check your locally-declared table options against what your locally-declared server options are ... you can't check your column options against your declared table options, or even what table the column is declared in, or what the column's locally-declared type is ... your server option validation can't know anything about the FDW options and the server 'type' and 'version' aren't even passed to the validator ... just because our API's that way .., ... things like that seem more like own-goals. Fixing them wouldn't make the truly inherent limitations go away, but then at least there'd just be the truly inherent ones. Regards, -Chap
Here's a proposed patch for the "=" issue. Whether or not we should rethink FDW validation behavior, doing so surely couldn't be back-patched. But I think this much should be. regards, tom lane diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index 46c1dce222d..50747c16396 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -1243,8 +1243,9 @@ transformRelOptions(Datum oldOptions, List *defList, const char *namspace, } else { - text *t; + const char *name; const char *value; + text *t; Size len; /* @@ -1291,11 +1292,19 @@ transformRelOptions(Datum oldOptions, List *defList, const char *namspace, * have just "name", assume "name=true" is meant. Note: the * namespace is not output. */ + name = def->defname; if (def->arg != NULL) value = defGetString(def); else value = "true"; + /* Insist that name not contain "=", else "a=b=c" is ambiguous */ + if (strchr(name, '=') != NULL) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid option name \"%s\": must not contain \"=\"", + name))); + /* * This is not a great place for this test, but there's no other * convenient place to filter the option out. As WITH (oids = @@ -1303,7 +1312,7 @@ transformRelOptions(Datum oldOptions, List *defList, const char *namspace, * amount of ugly. */ if (acceptOidsOff && def->defnamespace == NULL && - strcmp(def->defname, "oids") == 0) + strcmp(name, "oids") == 0) { if (defGetBoolean(def)) ereport(ERROR, @@ -1313,11 +1322,11 @@ transformRelOptions(Datum oldOptions, List *defList, const char *namspace, continue; } - len = VARHDRSZ + strlen(def->defname) + 1 + strlen(value); + len = VARHDRSZ + strlen(name) + 1 + strlen(value); /* +1 leaves room for sprintf's trailing null */ t = (text *) palloc(len + 1); SET_VARSIZE(t, len); - sprintf(VARDATA(t), "%s=%s", def->defname, value); + sprintf(VARDATA(t), "%s=%s", name, value); astate = accumArrayResult(astate, PointerGetDatum(t), false, TEXTOID, diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c index c14e038d54f..8d2d7431544 100644 --- a/src/backend/commands/foreigncmds.c +++ b/src/backend/commands/foreigncmds.c @@ -71,15 +71,26 @@ optionListToArray(List *options) foreach(cell, options) { DefElem *def = lfirst(cell); + const char *name; const char *value; Size len; text *t; + name = def->defname; value = defGetString(def); - len = VARHDRSZ + strlen(def->defname) + 1 + strlen(value); + + /* Insist that name not contain "=", else "a=b=c" is ambiguous */ + if (strchr(name, '=') != NULL) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid option name \"%s\": must not contain \"=\"", + name))); + + len = VARHDRSZ + strlen(name) + 1 + strlen(value); + /* +1 leaves room for sprintf's trailing null */ t = palloc(len + 1); SET_VARSIZE(t, len); - sprintf(VARDATA(t), "%s=%s", def->defname, value); + sprintf(VARDATA(t), "%s=%s", name, value); astate = accumArrayResult(astate, PointerGetDatum(t), false, TEXTOID,
On 06/02/25 14:13, Tom Lane wrote: > Here's a proposed patch for the "=" issue. Whether or not we should > rethink FDW validation behavior, doing so surely couldn't be > back-patched. But I think this much should be. LGTM Regards, -Chap
Chapman Flack <jcflack@acm.org> writes: > On 06/02/25 14:13, Tom Lane wrote: >> Here's a proposed patch for the "=" issue. Whether or not we should >> rethink FDW validation behavior, doing so surely couldn't be >> back-patched. But I think this much should be. > LGTM Pushed, thanks for looking at it. regards, tom lane