Re: BUG #15273: Lexer bug with UESCAPE
From | Andrew Gierth |
---|---|
Subject | Re: BUG #15273: Lexer bug with UESCAPE |
Date | |
Msg-id | 87bmbekq90.fsf@news-spur.riddles.org.uk |
In response to | Re: BUG #15273: Lexer bug with UESCAPE (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: BUG #15273: Lexer bug with UESCAPE |
List | pgsql-bugs |
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> Also, I'm going to push back on the claim that allowing comments
 Tom> there is required by the SQL spec. The relevant rules in SQL:2011
 Tom> are

 Tom> <Unicode character string literal> ::=
 Tom>   [ <introducer> <character set specification> ]
 Tom>   U <ampersand> <quote> [ <Unicode representation>... ] <quote>
 Tom>   [ { <separator> <quote> [ <Unicode representation>... ] <quote> }... ]
 Tom>   <Unicode escape specifier>

 Tom> <Unicode escape specifier> ::=
 Tom>   [ UESCAPE <quote> <Unicode escape character> <quote> ]

 Tom> I do not see any principled way of arguing that these rules
 Tom> require comments to be allowed adjacent to UESCAPE without also
 Tom> claiming that they must be allowed between, say, the initial 'U'
 Tom> and the ampersand.

These are the rules that (as far as I can see) apply to that case:

5.2 <token> and <separator>

  <separator> ::= { <comment> | <white space> }...

  7) Any <token> may be followed by a <separator>.

5.3 <literal>

  11) In a <Unicode character string literal>, there shall be no
      <separator> between the "U" and the <ampersand> nor between the
      <ampersand> and the <quote>.

 Tom> The only place these rules allow a <separator> is between segments
 Tom> of a multiline literal. It looks to me like an extension that we
 Tom> even allow whitespace around UESCAPE.

I think that that use of <separator> is only to indicate that a
<separator> there is _required_, rather than optional as it usually is
after tokens, and that the special rule about requiring newlines also
applies only to that specific use of <separator>.

If the whole <Unicode character string literal> is regarded as being a
single token, and therefore rule 5.2.7 above didn't apply around the
UESCAPE, then there would be no reason to write rule 5.3.11 forbidding
separators within the U&' part.

(In the case of X'...', there's rule 5.2.5, which as I see it would
prevent a space after the X, but that rule explicitly does not apply to
the U& cases.)
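To make the disputed cases concrete, here is a small illustrative sketch of the three forms the rules above distinguish. The first statement is the example from the PostgreSQL documentation; the other two are hypothetical variations written for this illustration, and the comments describe what each reading of the spec would say, not what any particular server version does.

```sql
-- Baseline: whitespace between the literal and UESCAPE.
-- This is the documented PostgreSQL form and yields 'data'
-- (with standard_conforming_strings on, the default).
SELECT U&'d!0061t!+000061' UESCAPE '!';

-- The disputed form: a comment adjacent to UESCAPE.
-- If rule 5.2 7) applies (any <token> may be followed by a <separator>,
-- and a <separator> may be a <comment>), the spec would permit this.
-- If the whole literal is a single token, it would not.
SELECT U&'d!0061t!+000061' /* comment */ UESCAPE '!';

-- Explicitly excluded by 5.3 rule 11: a <separator> between "U" and
-- the ampersand, or between the ampersand and the quote. Under either
-- reading this is not a Unicode character string literal.
SELECT U & 'd!0061t!+000061' UESCAPE '!';
```

The argument in this message rests on the third case: rule 5.3.11 would be redundant if separators were already impossible inside the literal, which suggests the second case is meant to be allowed.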
As a related issue, we don't allow comments within the <separator> that
splits a multiline literal, even though the spec certainly allows those
(arguably, since the spec defines that comments are equivalent to
newlines, "select 'foo' /**/ 'bar';" should be legal too).

I've put up a summary of all these at
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard#Lexing_of_string_literals_and_comments
(under the assumption that the whole issue is filed under WONTFIX at
least for the time being).

-- 
Andrew (irc:RhodiumToad)