Re: [rfc] unicode escapes for extended strings

From	Marko Kreen
Subject	Re: [rfc] unicode escapes for extended strings
Date	April 16, 2009, 16:32:20
Msg-id	e51f66da0904161232k7f287f9ey751ec1c09188af8d@mail.gmail.com
In response to	Re: [rfc] unicode escapes for extended strings (Sam Mason <sam@samason.me.uk>)
Responses	Re: [rfc] unicode escapes for extended strings
List	pgsql-hackers

On 4/16/09, Sam Mason <sam@samason.me.uk> wrote:
> On Thu, Apr 16, 2009 at 08:48:58PM +0300, Marko Kreen wrote:
>  > Seems I'm bad at communicating in English,
>
>
> I hope you're not saying this because of my misunderstandings!
>
>
>  > so here is C variant of
>  > my proposal to bring \u escaping into extended strings.  Reasons:
>  >
>  > - More people are familiar with \u escaping, as it's standard
>  >   in Java/C#/Python, probably more..
>  > - U& strings will not work when stdstr=off.
>  >
>  > Syntax:
>  >
>  >   \uXXXX      - 16-bit value
>  >   \UXXXXXXXX  - 32-bit value
>  >
>  > Additionally, both \u and \U can be used to specify UTF-16 surrogate
>  > pairs to encode characters with value > 0xFFFF.  This is the exact
>  > behaviour used by Java/C#/Python (except that Java does not have \U).
>
>
> Are you sure that this handling of surrogates is correct?  The best
>  answer I've managed to find on the Unicode consortium's site is:
>
>   http://unicode.org/faq/utf_bom.html#utf16-7
>
>  it says:
>
>   They are invalid in interchange, but may be freely used internal to an
>   implementation.
>
>  I think this means they consider the handling of them you noted above,
>  in other languages, to be an error.

It's up to the UTF-8 validator whether to treat non-characters as an error.

-- 
marko
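
The surrogate-pair handling discussed in the quoted proposal can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not code from the patch or from the PostgreSQL source: the helper names (is_high_surrogate, is_low_surrogate, combine_surrogates, encode_utf8) are hypothetical, and the choice to reject lone surrogates in the encoder is just one possible validator policy, which is the point under discussion in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from PostgreSQL. */

static int is_high_surrogate(uint32_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static int is_low_surrogate(uint32_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

/* Combine a UTF-16 surrogate pair into a single code point above 0xFFFF. */
static uint32_t combine_surrogates(uint32_t hi, uint32_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/*
 * Encode one code point as UTF-8 into out (at least 4 bytes).
 * Returns the number of bytes written, or 0 if the value is rejected.
 * Assumption: this sketch takes the strict policy and rejects lone
 * surrogates; a more permissive validator could choose otherwise.
 */
static size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (is_high_surrogate(cp) || is_low_surrogate(cp))
            return 0;           /* lone surrogate: rejected in this sketch */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                   /* beyond the Unicode range */
}
```

Under these assumptions, an escape pair such as \uD83D\uDE00 would be combined into U+1F600 and stored as the four UTF-8 bytes F0 9F 98 80, while a lone \uD800 would be rejected by the encoder.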
