Re: UTF16 surrogate pairs in UTF8 encoding

From	Bruce Momjian
Subject	Re: UTF16 surrogate pairs in UTF8 encoding
Date	February 19, 2011, 20:00:57
Msg-id	201102200000.p1K00Ui04261@momjian.us
In reply to	Re: UTF16 surrogate pairs in UTF8 encoding (Marko Kreen <markokr@gmail.com>)
List	pgsql-hackers

Marko Kreen wrote:
> On 9/8/10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Marko Kreen <markokr@gmail.com> writes:
> >  > Although it does seem unnecessary.
> >
> >
> > The reason I asked for this to be spelled out is that ordinarily,
> >  a backslash escape \nnn is a very low-level thing that will insert
> >  exactly what you say.  To me it's quite unexpected that the system
> >  would editorialize on that to the extent of replacing two UTF16
> >  surrogate characters by a single code point.  That's necessary for
> >  correctness because our underlying storage is UTF8, but it's not
> >  obvious that it will happen.  (As a counterexample, if our underlying
> >  storage were UTF16, then very different things would need to happen
> >  for the exact same SQL input.)
> >
> >  I think a lot of people will have this same question when reading
> >  this para, which is why I asked for an explanation there.
> 
> Ok, but I still don't like the "when"s.  How about:
> 
> -    6-digit form technically makes this unnecessary.  (When surrogate
> -    pairs are used when the server encoding is <literal>UTF8</>, they
> -    are first combined into a single code point that is then encoded
> -    in UTF-8.)
> +    6-digit form technically makes this unnecessary.  (Surrogate
> +    pairs are not stored directly, but combined into a single
> +    code point that is then encoded in UTF-8.)

Applied, thanks.
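
For readers of the archive, the combination step described in the new wording can be sketched as follows. This is only an illustration of the standard UTF-16 surrogate arithmetic, not the server's actual code; the function name combine_surrogates and the sample pair D83D/DE00 are assumptions chosen for the example.

```python
# Illustrative sketch only: combine_surrogates is a hypothetical helper,
# not a PostgreSQL function.

def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high (leading) surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low (trailing) surrogate")
    # Each surrogate carries 10 bits; the pair addresses code points
    # starting at U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Sample pair (assumed for illustration): D83D DE00 -> U+1F600
code_point = combine_surrogates(0xD83D, 0xDE00)

# With UTF-8 storage, the single combined code point is what gets encoded.
utf8_bytes = chr(code_point).encode("utf-8")

# With UTF-16 storage (the counterexample in the quoted text), the same
# code point would be stored as the two surrogates themselves.
utf16_bytes = chr(code_point).encode("utf-16-be")

print(hex(code_point), utf8_bytes.hex(), utf16_bytes.hex())
```

The same input therefore yields four bytes of UTF-8 for one code point, whereas a UTF-16 store would keep the pair as written, which is why the documentation spells the behavior out.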

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +
