Re: Inaccurate documentation about identifiers
From | raf |
---|---|
Subject | Re: Inaccurate documentation about identifiers |
Date | |
Msg-id | Y3a6BMoEzbcZ0rEy@raf.org |
In response to | Re: Inaccurate documentation about identifiers (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Inaccurate documentation about identifiers |
List | pgsql-bugs |
On Thu, Nov 17, 2022 at 03:01:10PM -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jeff Davis <pgsql@j-davis.com> writes:
> > On Wed, 2022-11-16 at 08:36 -0500, Brennan Vincent wrote:
> >> However, it seems that all non-ASCII characters are considered
> >> "letters"
>
> > You're correct: it seems to allow any byte with the high bit set;
> > including, for example, a zero-width space.
>
> Yes, see scan.l:
>
> ident_start     [A-Za-z\200-\377_]
> ident_cont      [A-Za-z\200-\377_0-9\$]
>
> identifier      {ident_start}{ident_cont}*
>
> > I don't think we want to change the documentation here, because that
> > would amount to a promise that we support such identifiers forever.
> > I also don't think we want to change the code, because it opens up
> > several problems and I'm not sure it's worth trying to solve them.
>
> Right. IIRC, the SQL spec would have us allow only things that actually
> are letters per Unicode or other relevant spec, but (1) that's rather
> encoding-dependent and (2) the hit to parsing speed would likely be
> non-negligible. Still, we might do it someday if someone can find
> a way around those concerns. (Accepting whitespace, in particular,
> is Not Great.) I think benign neglect in the docs is the best path.
>
> regards, tom lane

I think a lot of programming languages probably only use ASCII
for operators and whitespace. I have a domain specific micro
language that explicitly treats all 8-bit bytes as "letters"
when parsing the names of things as a cheap way to "support"
ASCII-compatible encodings like UTF-8 and ISO-8859-* (but it's
useless for UTF-16, GB 18030, Big5, ...).

The only way to do it right would be to decode everything.
But then you'd probably lose the ability to include emojis in
identifiers. I wonder if anyone's doing that in postgresql. :-)

Does the SQL spec require accepting *only* real letters as
letters, or does it require accepting *at least* real letters
as letters. :-) Just a bit of wishful thinking.

cheers,
raf
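
For anyone who wants to poke at the behaviour discussed above without building the lexer, here is a rough byte-level transcription of the quoted scan.l rules into a Python regular expression. It is only a sketch: the names `IDENTIFIER` and `is_identifier` are made up for illustration, and it ignores everything else PostgreSQL does with identifiers (quoting, case folding, keywords, length limits).

```python
import re

# Byte-level transcription of the scan.l rules quoted above
# (\200-\377 octal is \x80-\xff hex: any byte with the high bit set).
IDENT_START = rb"[A-Za-z\x80-\xff_]"
IDENT_CONT = rb"[A-Za-z\x80-\xff_0-9$]"
IDENTIFIER = re.compile(IDENT_START + IDENT_CONT + rb"*")


def is_identifier(text: str, encoding: str = "utf-8") -> bool:
    """Hypothetical helper: True if the encoded bytes of `text`
    match the identifier rule in full."""
    return IDENTIFIER.fullmatch(text.encode(encoding)) is not None


if __name__ == "__main__":
    samples = ["caf\u00e9", "a\u200bb", "\U0001F600", "1abc", "a b"]
    for s in samples:
        print(repr(s), is_identifier(s))
    # The same rule applied to a non-ASCII-compatible encoding:
    print("a as UTF-16LE:", is_identifier("a", "utf-16-le"))
```

In UTF-8 every byte of a non-ASCII character has the high bit set, so the zero-width space (U+200B) and an emoji both pass, while an ASCII space or a leading digit does not. In UTF-16LE even a plain "a" carries a zero byte that the rule rejects, which is the limitation with non-ASCII-compatible encodings mentioned above.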