Re: UTF16 surrogate pairs in UTF8 encoding
From | Peter Eisentraut |
---|---|
Subject | Re: UTF16 surrogate pairs in UTF8 encoding |
Date | |
Msg-id | 1283942118.18999.1.camel@fsopti579.F-Secure.com |
In response to | Re: UTF16 surrogate pairs in UTF8 encoding (Marko Kreen <markokr@gmail.com>) |
Responses | Re: UTF16 surrogate pairs in UTF8 encoding |
List | pgsql-hackers |
On ons, 2010-09-08 at 10:18 +0300, Marko Kreen wrote:
> On 9/7/10, Peter Eisentraut <peter_e@gmx.net> wrote:
> > On sön, 2010-08-22 at 15:15 -0400, Tom Lane wrote:
> > > > We combine the surrogate pair components to a single code point and
> > > > encode that in UTF-8.  We don't encode the components separately; that
> > > > would be wrong.
> > >
> > > Oh, OK.  Should the docs make that a bit clearer?
> >
> > Done.
>
> This is confusing:
>
>   (When surrogate
>   pairs are used when the server encoding is <literal>UTF8</>, they
>   are first combined into a single code point that is then encoded
>   in UTF-8.)
>
> So something else happens if encoding is not UTF8?

Then you can't specify surrogate pairs because they are outside of the
ASCII range, per constraint mentioned earlier in the paragraph.

> I think this part can be simply removed, it does not add anything.
>
> Or say that surrogate pairs are only allowed in UTF8 encoding.
> Reason is that you cannot encode 0..7F codepoints with them,
> and only those are allowed to be given numerically.  But this is
> already mentioned before.

Well, Tom wanted an additional explanation.  I personally agree with
you; this is not the place to explain encoding and Unicode internals,
when really the code only does what it's supposed to.
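
For readers following the thread, the behavior described above (combine the two surrogate components into a single code point, then encode that code point in UTF-8) can be sketched roughly as follows. This is an illustrative Python sketch, not PostgreSQL's actual C code; the function names `combine_surrogates` and `encode_pair_utf8` are hypothetical and chosen only for this example.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point.

    Hypothetical helper for illustration; not a PostgreSQL function.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"invalid high surrogate: U+{high:04X}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"invalid low surrogate: U+{low:04X}")
    # Each surrogate carries 10 payload bits; the result lies above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_pair_utf8(high: int, low: int) -> bytes:
    """Encode the combined code point, not the two components, in UTF-8."""
    return chr(combine_surrogates(high, low)).encode("utf-8")


if __name__ == "__main__":
    code_point = combine_surrogates(0xD83D, 0xDE00)
    print(f"U+{code_point:04X}")                     # combined code point
    print(encode_pair_utf8(0xD83D, 0xDE00).hex())    # one 4-byte sequence
```

Encoding each component separately would instead yield two 3-byte sequences, which is not valid UTF-8; Python's strict UTF-8 encoder likewise refuses a lone surrogate.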