Re: Bug with UTF-8 character
From | Tom Lane |
---|---|
Subject | Re: Bug with UTF-8 character |
Date | |
Msg-id | 25791.1148654039@sss.pgh.pa.us |
In reply to | Bug with UTF-8 character (Hans-Jürgen Schönig <postgres@cybertec.at>) |
List | pgsql-hackers |
Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> But the code does a check where the second character should not be
> greater than 0x9F, when first character is 0xED. This is not according
> to UTF-8 standard in RFC 3629.

Better read the RFC again: it says

    UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                  %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
                  ------------

The reason for the prohibition is explained as

    The definition of UTF-8 prohibits encoding character numbers between
    U+D800 and U+DFFF, which are reserved for use with the UTF-16
    encoding form (as surrogate pairs) and do not directly represent
    characters.

I don't know anything about "surrogate pairs", but I am not about to
decide that we know more about this than the RFC authors do. If they
say it's invalid, it's invalid.

			regards, tom lane
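As an illustration only, the UTF8-3 rule quoted above can be turned into a byte-range check. This is a minimal sketch, not PostgreSQL's actual validation code; the helper names `is_tail`, `utf8_3byte_is_valid` and `decode3` are hypothetical. It shows why a lead byte of 0xED must be followed by 0x80-0x9F. The largest allowed second byte (0x9F) gives U+D7FF, while 0xA0 would start encoding U+D800, the first surrogate code point.

```python
def is_tail(b: int) -> bool:
    """UTF8-tail = %x80-BF (hypothetical helper)."""
    return 0x80 <= b <= 0xBF


def utf8_3byte_is_valid(seq: bytes) -> bool:
    """Check a 3-byte sequence against the RFC 3629 UTF8-3 rule (hypothetical helper)."""
    if len(seq) != 3:
        return False
    b0, b1, b2 = seq
    if not is_tail(b2):
        return False
    if b0 == 0xE0:
        # %xE0 %xA0-BF: rejects overlong encodings of code points below U+0800
        return 0xA0 <= b1 <= 0xBF
    if 0xE1 <= b0 <= 0xEC or b0 in (0xEE, 0xEF):
        # %xE1-EC / %xEE-EF followed by any two tail bytes
        return is_tail(b1)
    if b0 == 0xED:
        # %xED %x80-9F: rejects U+D800..U+DFFF (UTF-16 surrogates)
        return 0x80 <= b1 <= 0x9F
    return False


def decode3(seq: bytes) -> int:
    """Code point a 3-byte pattern would encode, ignoring validity (hypothetical helper)."""
    b0, b1, b2 = seq
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


for seq in (b"\xed\x9f\xbf", b"\xed\xa0\x80"):
    print(f"U+{decode3(seq):04X} valid={utf8_3byte_is_valid(seq)}")
# U+D7FF valid=True
# U+D800 valid=False
```

The checks mirror the four ABNF alternatives one-to-one for readability. A production decoder would more likely use a lookup table or a state machine. Python's own strict UTF-8 codec rejects the same surrogate sequences, so `b"\xed\xa0\x80".decode("utf-8")` raises `UnicodeDecodeError`.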