Re: [rfc] unicode escapes for extended strings
From | Marko Kreen |
---|---|
Subject | Re: [rfc] unicode escapes for extended strings |
Date | |
Msg-id | e51f66da0904161232k7f287f9ey751ec1c09188af8d@mail.gmail.com |
In response to | Re: [rfc] unicode escapes for extended strings (Sam Mason <sam@samason.me.uk>) |
Responses | Re: [rfc] unicode escapes for extended strings |
List | pgsql-hackers |
On 4/16/09, Sam Mason <sam@samason.me.uk> wrote:
> On Thu, Apr 16, 2009 at 08:48:58PM +0300, Marko Kreen wrote:
> > Seems I'm bad at communicating in english,
>
> I hope you're not saying this because of my misunderstandings!
>
> > so here is C variant of
> > my proposal to bring \u escaping into extended strings. Reasons:
> >
> > - More people are familiar with \u escaping, as it's standard
> >   in Java/C#/Python, probably more..
> > - U& strings will not work when stdstr=off.
> >
> > Syntax:
> >
> >   \uXXXX - 16-bit value
> >   \UXXXXXXXX - 32-bit value
> >
> > Additionally, both \u and \U can be used to specify UTF-16 surrogate
> > pairs to encode characters with value > 0xFFFF. This is exact behaviour
> > used by Java/C#/Python. (except that Java does not have \U)
>
> Are you sure that this handling of surrogates is correct? The best
> answer I've managed to find on the Unicode consortium's site is:
>
> http://unicode.org/faq/utf_bom.html#utf16-7
>
> it says:
>
> They are invalid in interchange, but may be freely used internal to an
> implementation.
>
> I think this means they consider the handling of them you noted above,
> in other languages, to be an error.

It's up to the UTF8 validator whether to consider non-characters an error.

--
marko
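
For readers following the surrogate discussion, here is a rough sketch of how the quoted syntax could be decoded, with a \uXXXX or \UXXXXXXXX high surrogate followed by a low surrogate combined into one code point above 0xFFFF. It is written in Python purely for illustration and is not the proposed C patch: the function names are hypothetical, rejecting unpaired surrogates is an assumption (that is exactly the point under debate in this thread), and backslash-escaped backslashes are not handled.

```python
import re

# Hypothetical names for illustration; this is not PostgreSQL's scanner code.
# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
ESCAPE_RE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def is_high_surrogate(cp):
    return 0xD800 <= cp <= 0xDBFF


def is_low_surrogate(cp):
    return 0xDC00 <= cp <= 0xDFFF


def combine_surrogates(high, low):
    # Standard UTF-16 arithmetic: 10 bits from each half, offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def decode_unicode_escapes(text):
    # Expand both escape forms, joining surrogate pairs into one character.
    out = []
    pending_high = None  # high surrogate waiting for its low partner
    pos = 0
    for match in ESCAPE_RE.finditer(text):
        if match.start() > pos:
            # Literal text between escapes breaks a pending pair.
            if pending_high is not None:
                raise ValueError("high surrogate not followed by low surrogate")
            out.append(text[pos:match.start()])
        pos = match.end()
        cp = int(match.group(1) or match.group(2), 16)
        if pending_high is not None:
            if not is_low_surrogate(cp):
                raise ValueError("high surrogate not followed by low surrogate")
            out.append(chr(combine_surrogates(pending_high, cp)))
            pending_high = None
        elif is_high_surrogate(cp):
            pending_high = cp
        elif is_low_surrogate(cp):
            # Assumption: a lone low surrogate is treated as an error.
            raise ValueError("low surrogate without preceding high surrogate")
        elif cp > 0x10FFFF:
            raise ValueError("code point out of Unicode range")
        else:
            out.append(chr(cp))
    if pending_high is not None:
        raise ValueError("unpaired high surrogate at end of input")
    out.append(text[pos:])
    return "".join(out)
```

With this sketch, `decode_unicode_escapes(r"\uD83D\uDE00")` and `decode_unicode_escapes(r"\U0001F600")` produce the same single character, while a lone `\uD83D` raises `ValueError`. A more permissive design could pass unpaired surrogates through and leave the decision to a later encoding validator, which is closer to the position taken in the reply above.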