Re: [v9.2] make_greater_string() does not return a string in some cases
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: [v9.2] make_greater_string() does not return a string in some cases |
Date | |
Msg-id | 20111021.103646.221883029.horiguchi.kyotaro@oss.ntt.co.jp |
In response to | Re: [v9.2] make_greater_string() does not return a string in some cases (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [v9.2] make_greater_string() does not return a string
in some cases
|
List | pgsql-hackers |
Hello,

> > Robert Haas <robertmhaas@gmail.com> writes:
> >> - Why does the second byte need special handling for 0xED and 0xF4?
> >
> > http://www.faqs.org/rfcs/rfc3629.html
> >
> > See section 4 in particular.  The underlying requirement is to disallow
> > multiple representations of the same Unicode code point.

The special handling skips the UTF-8 code regions corresponding to
U+D800 - U+DFFF and U+110000 - U+11FFFF in UCS-4. The former is
reserved for use with the UTF-16 encoding form as surrogate pairs
and does not directly represent characters, as described in
section 3 of RFC 3629. The latter is outside the UTF-8 range by
the definition given in the same section.

former> The definition of UTF-8 prohibits encoding character
former> numbers between U+D800 and U+DFFF, which are reserved for
former> use with the UTF-16 encoding form (as surrogate pairs)
former> and do not directly represent characters.

latter> In UTF-8, characters from the U+0000..U+10FFFF range (the
latter> UTF-16 accessible range) are encoded using sequences of 1
latter> to 4 octets.

# However, when I first wrote this exception, I simply mimicked
# pg_utf8_validator()'s behavior.

This must be the basis of the behavior of pg_utf8_verifier(), and
pg_utf8_increment() has inherited it. It may be good to describe
the origin of this special handling in a comment on these
functions, to avoid this sort of confusion.

> I'm still confused.  The input string is already known to be valid
> UTF-8, so the second byte (if there is one) must be between 0x80 and
> 0xBF.  Therefore it will be neither 0xED nor 0xF4.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
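
For readers following the 0xED / 0xF4 question, here is a minimal sketch of the rule from RFC 3629 section 4, reading the special cases as depending on the value of the first (lead) byte, which narrows the range allowed for the second byte. The function names `utf8_second_byte_is_legal()` and `utf8_second_byte_examples()` are hypothetical and written only for illustration. This is not the PostgreSQL source, and it does not reproduce the logic of pg_utf8_increment().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only (not PostgreSQL code).
 *
 * Given the first byte of a multibyte UTF-8 sequence, report whether
 * "second" is an acceptable second byte under RFC 3629, section 4.
 * The generic continuation range is 0x80..0xBF; four lead bytes
 * narrow it.
 */
bool
utf8_second_byte_is_legal(unsigned char first, unsigned char second)
{
    unsigned char lo = 0x80;
    unsigned char hi = 0xBF;

    switch (first)
    {
        case 0xE0:
            lo = 0xA0;      /* below this: overlong 3-byte form */
            break;
        case 0xED:
            hi = 0x9F;      /* above this: U+D800..U+DFFF (surrogates) */
            break;
        case 0xF0:
            lo = 0x90;      /* below this: overlong 4-byte form */
            break;
        case 0xF4:
            hi = 0x8F;      /* above this: beyond U+10FFFF */
            break;
        default:
            break;
    }

    return second >= lo && second <= hi;
}

/* Hypothetical usage: the boundary cases discussed in this thread. */
void
utf8_second_byte_examples(void)
{
    assert(utf8_second_byte_is_legal(0xED, 0x9F));   /* U+D7FF side */
    assert(!utf8_second_byte_is_legal(0xED, 0xA0));  /* would start U+D800 */
    assert(utf8_second_byte_is_legal(0xF4, 0x8F));   /* U+10FFFF side */
    assert(!utf8_second_byte_is_legal(0xF4, 0x90));  /* would start U+110000 */
}
```

Under this reading, a routine that increments the second byte of an already valid string has to stop at 0x9F after a 0xED lead byte and at 0x8F after a 0xF4 lead byte, rather than at the generic 0xBF limit.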