Re: [v9.2] make_greater_string() does not return a string in some cases
From | Robert Haas |
---|---|
Subject | Re: [v9.2] make_greater_string() does not return a string in some cases |
Date | |
Msg-id | CA+TgmoYETjFMP2hFzWwCxEi2OQKA+NP5CY-DMPnasxNCgX+2rg@mail.gmail.com |
In reply to | Re: [v9.2] make_greater_string() does not return a string in some cases (Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp>) |
Responses | Re: [v9.2] make_greater_string() does not return a string in some cases |
List | pgsql-hackers |
On Wed, Oct 12, 2011 at 11:45 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp> wrote:
> Hello, the work is finished.
>
> Version 4 of the patch is attached to this message.

I went through this in a bit more detail tonight and am cleaning it up. But I'm a bit confused, looking at pg_utf8_increment() in detail:

- Why does the second byte need special handling for 0xED and 0xF4? AFAICT, UTF-8 requires all legal strings to have a second byte between 0x80 and 0xBF, just as in byte positions 3 and 4, so these bytes would be invalid in this position anyway.

- In the first byte, we don't increment if the current value for that byte is 0x7F, 0xDF, 0xEF, or 0xF4. But why isn't it 0xF7 rather than 0xF4? I see there's a comparable restriction in pg_utf8_islegal(), but I don't understand why.

- Perhaps on the same point, the comments claim that we will fail for code points U+0007F, U+007FF, U+0FFFF, and U+10FFFF. But IIUC, a 4-byte Unicode character can encode values up to U+1FFFFF, so why is it U+10FFFF rather than U+1FFFFF?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company