Re: Unicode string literals versus the world
From | Tom Lane |
---|---|
Subject | Re: Unicode string literals versus the world |
Date | |
Msg-id | 17658.1239893656@sss.pgh.pa.us |
In reply to | Re: Unicode string literals versus the world (Sam Mason <sam@samason.me.uk>) |
Responses | Re: Unicode string literals versus the world; Re: Unicode string literals versus the world; Re: Unicode string literals versus the world |
List | pgsql-hackers |
Sam Mason <sam@samason.me.uk> writes:
> I'd never heard of UTF-16 surrogate pairs before this discussion and
> hence didn't realise that it's valid to have a surrogate pair in place
> of a single code point. The docs say that <D800 DF02> corresponds to
> U+10302, Python would appear to follow my intuitions in that:

>   ord(u'\uD800\uDF02')

> results in an error instead of giving back 66306, as I'd expect. Is
> this a bug in Python, my understanding, or something else?

I might be wrong, but I think surrogate pairs are expressly forbidden
in all representations other than UTF16/UCS2. We definitely forbid
them when validating UTF-8 strings --- that's per an RFC recommendation.
It sounds like Python is doing the same.

regards, tom lane
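Editor's note (not part of the original message): a minimal Python 3 sketch of the two points discussed above. It shows how the surrogate pair <D800 DF02> maps to the single code point U+10302 (66306), and how a strict UTF-8 check rejects the surrogates when they are encoded one by one as 3-byte sequences. The helper names `combine_surrogates` and `is_valid_utf8` are hypothetical. The check simply relies on Python's built-in strict UTF-8 codec and is not meant to reflect how PostgreSQL validates input.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (encoded surrogates fail)."""
    try:
        data.decode("utf-8")  # default "strict" error handling
    except UnicodeDecodeError:
        return False
    return True


cp = combine_surrogates(0xD800, 0xDF02)
print(hex(cp), cp)  # 0x10302 66306

# The proper UTF-8 form of U+10302 is a single 4-byte sequence...
print(is_valid_utf8(chr(cp).encode("utf-8")))  # True

# ...whereas encoding D800 and DF02 separately as 3-byte sequences is rejected.
print(is_valid_utf8(b"\xed\xa0\x80\xed\xbc\x82"))  # False
```

Python 3 also refuses to encode a lone surrogate: `"\ud800".encode("utf-8")` raises `UnicodeEncodeError` unless you explicitly request the `surrogatepass` error handler.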