Re: Bug with UTF-8 character

From	Tom Lane
Subject	Re: Bug with UTF-8 character
Date	26 May 2006 11:35:08
Msg-id	25791.1148654039@sss.pgh.pa.us
In reply to	Bug with UTF-8 character (Hans-Jürgen Schönig <postgres@cybertec.at>)
List	pgsql-hackers

Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> But the code does a check where the second character should not be 
> greater than 0x9F, when first character is 0xED. This is not according 
> to UTF-8 standard in RFC 3629.

Better read the RFC again: it says
  UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
                ------------

The reason for the prohibition is explained as
  The definition of UTF-8 prohibits encoding character numbers between
  U+D800 and U+DFFF, which are reserved for use with the UTF-16
  encoding form (as surrogate pairs) and do not directly represent
  characters.

I don't know anything about "surrogate pairs", but I am not about to
decide that we know more about this than the RFC authors do.  If they
say it's invalid, it's invalid.
        regards, tom lane
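
For readers who want to see the rule in executable form, here is a minimal sketch of the UTF8-3 production quoted above. It is not PostgreSQL's actual validation code; the function name is_valid_utf8_3 is made up for illustration, and it only handles three-byte sequences.

```python
def is_valid_utf8_3(seq: bytes) -> bool:
    """Check a 3-byte sequence against the UTF8-3 rule of RFC 3629.

    Hypothetical helper for illustration only; it is not taken from
    the PostgreSQL source tree.
    """
    if len(seq) != 3:
        return False
    b0, b1, b2 = seq

    # The last byte is always a plain UTF8-tail (%x80-BF).
    if not 0x80 <= b2 <= 0xBF:
        return False

    if b0 == 0xE0:
        # %xE0 %xA0-BF: a lower second byte would be an overlong encoding.
        return 0xA0 <= b1 <= 0xBF
    if b0 == 0xED:
        # %xED %x80-9F: a higher second byte would encode U+D800..U+DFFF.
        return 0x80 <= b1 <= 0x9F
    if 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
        # All other three-byte lead bytes take any UTF8-tail.
        return 0x80 <= b1 <= 0xBF
    return False
```

With this sketch, ED 9F BF (U+D7FF) is accepted and ED A0 80 (which would be U+D800) is rejected, which matches the check the original report complained about.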
