Re: Unicode string literals versus the world


From	Marko Kreen
Subject	Re: Unicode string literals versus the world
Date	16 April 2009 12:50:40
Msg-id	e51f66da0904160850p36636d7dja68e6280d77f00f1@mail.gmail.com
In reply to	Re: Unicode string literals versus the world (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers


On 4/16/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sam Mason <sam@samason.me.uk> writes:
>  > I'd never heard of UTF-16 surrogate pairs before this discussion and
>  > hence didn't realise that it's valid to have a surrogate pair in place
>  > of a single code point.  The docs say that <D800 DF02> corresponds to
>  > U+10302, Python would appear to follow my intuitions in that:
>
>  >   ord(u'\uD800\uDF02')
>
>  > results in an error instead of giving back 66306, as I'd expect.  Is
>  > this a bug in Python, my understanding, or something else?
>
>
> I might be wrong, but I think surrogate pairs are expressly forbidden in
>  all representations other than UTF16/UCS2.  We definitely forbid them
>  when validating UTF-8 strings --- that's per an RFC recommendation.
>  It sounds like Python is doing the same.
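Tom's point about UTF-8 validation can be seen with Python 3's strict UTF-8 codec, which follows RFC 3629 and refuses surrogate code points in both directions. The sketch below is illustrative only: the helper name is hypothetical and this is not PostgreSQL's validator. It contrasts the proper 4-byte UTF-8 encoding of U+10302 with the 6-byte sequence you get by encoding each surrogate of <D800 DF02> separately (the CESU-8 style form).

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 per Python 3's strict codec.

    The codec follows RFC 3629, so byte sequences that encode surrogate
    code points (U+D800..U+DFFF) are rejected like any other malformed input.
    """
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# U+10302 encoded directly as a single 4-byte UTF-8 sequence.
proper = "\U00010302".encode("utf-8")  # b'\xf0\x90\x8c\x82'

# The surrogates D800 and DF02, each encoded as its own 3-byte sequence.
surrogate_bytes = b"\xed\xa0\x80\xed\xbc\x82"

print(is_valid_utf8(proper))           # True
print(is_valid_utf8(surrogate_bytes))  # False

# Encoding a lone surrogate is refused in the other direction as well.
try:
    "\ud800".encode("utf-8")
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)  # rejected: surrogates not allowed
```

Only UTF-16 itself gives the pair meaning: the UTF-16 (big-endian) bytes of U+10302 are exactly D8 00 DF 02.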

The point here is that Python/Java/C# allow them for escaping non-BMP
Unicode values, irrespective of their internal encoding.
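As an illustration of the escape-level behaviour Marko describes, here is a minimal sketch (not taken from Python, Java or C# sources; both helper names are hypothetical) of a decoder that combines a high-surrogate escape followed by a low-surrogate escape into one code point using the standard UTF-16 arithmetic. Applied to Sam's <D800 DF02> it yields U+10302, i.e. 66306. Rejecting an unpaired surrogate escape is a choice made in this sketch, in the spirit of the strict validation Tom mentions, not a claim about what those languages do.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 payload bits; non-BMP values start at 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def decode_u_escapes(text: str) -> str:
    """Decode backslash-u escapes (4 hex digits), pairing surrogate escapes.

    Hypothetical helper: a high-surrogate escape immediately followed by a
    low-surrogate escape becomes one non-BMP character; any other surrogate
    escape is rejected by this sketch.
    """
    out = []
    i = 0
    n = len(text)
    while i < n:
        if text.startswith("\\u", i) and i + 6 <= n:
            unit = int(text[i + 2:i + 6], 16)
            i += 6
            # Look ahead for a low-surrogate escape to pair with a high one.
            if (0xD800 <= unit <= 0xDBFF
                    and text.startswith("\\u", i) and i + 6 <= n):
                low = int(text[i + 2:i + 6], 16)
                if 0xDC00 <= low <= 0xDFFF:
                    out.append(chr(combine_surrogates(unit, low)))
                    i += 6
                    continue
            if 0xD800 <= unit <= 0xDFFF:
                raise ValueError(f"unpaired surrogate escape: {unit:#06x}")
            out.append(chr(unit))
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Sam's example: the escape pair <D800 DF02> written as literal text.
result = decode_u_escapes(r"\uD800\uDF02")
print(hex(ord(result)), ord(result))  # 0x10302 66306
```

Because the pairing is done on the escape values themselves, the result is the same whether the host stores strings as UTF-16, UTF-32 or UTF-8 internally.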

-- 
marko
