Three-byte Unicode characters
From | Bruce Momjian |
---|---|
Subject | Three-byte Unicode characters |
Date | |
Msg-id | 200504101351.j3ADpxX05679@candle.pha.pa.us |
Responses | Re: Three-byte Unicode characters |
List | pgsql-hackers |
[ This email to hackers from last night got lost so I am remailing.]

Tom Lane wrote:
> "John Hansen" <john@geeknet.com.au> writes:
> >> That is backpatched to 8.0.X. Does that not fix the problem reported?
>
> > No, as andrew said, what this patch does, is allow values > 0xffff and
> > at the same time validates the input to make sure it's valid utf8.
>
> The impression I get is that most of the 'Unicode characters above
> 0x10000' reports we've seen did not come from people who actually needed
> more-than-16-bit Unicode codepoints, but from people who had screwed up
> their encoding settings and were trying to tell the backend that Latin1
> was Unicode or some such. So I'm a bit worried that extending the
> backend support to full 32-bit Unicode will do more to mask encoding
> mistakes than it will do to create needed functionality.
>
> Not that I'm against adding the functionality. I'm just doubtful that
> the reports we've seen really indicate that we need it, or that adding
> it will cut down on the incidence of complaints :-(

OK, I got on the IRC server and talked to folks who actually understand
this. They say there are Chinese who are reporting this problem, so I
Googled and found this:

    http://www.yale.edu/chinesemac/pages/charset_encoding.html#Unicode

See the paragraph with "Supplementary Ideographic Plane". You will see
that paragraph says:

    The Supplementary Ideographic Plane (SIP) currently contains 42,711
    additional characters in "CJK Unified Ideographs Extension B"
    (U+20000-2A6D6). The PDF chart for this is available at:
    http://www.unicode.org/charts/PDF/U20000.pdf

I assume it is that U+20000-2A6D6 range that people are complaining
about.

So, we do have a bug, and we are probably going to need to fix it in
8.0.X. I apologize to people who reported this problem and I wasn't
attentive to the seriousness of it.
--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
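
For readers wondering why the U+20000-2A6D6 range falls outside "three-byte" handling, here is a minimal Python sketch of the standard UTF-8 length rule. It is not the PostgreSQL backend code; the helper name `utf8_length` and the sample code points are chosen only for illustration. It suggests that every code point above U+FFFF, including the range quoted above, needs four bytes in UTF-8.

```python
def utf8_length(codepoint: int) -> int:
    """Return how many bytes UTF-8 uses for one Unicode code point.

    Hypothetical helper for illustration; not taken from PostgreSQL.
    """
    if codepoint < 0 or codepoint > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if codepoint < 0x80:
        return 1      # ASCII
    if codepoint < 0x800:
        return 2      # e.g. Latin-1 supplement, Greek, Cyrillic
    if codepoint < 0x10000:
        return 3      # rest of the Basic Multilingual Plane
    return 4          # supplementary planes, including U+20000-2A6D6


# Sample code points (illustrative choices, not from the email,
# except the two ends of the quoted Extension B range).
samples = {
    "U+0041  Latin capital A": 0x0041,
    "U+00E9  e with acute": 0x00E9,
    "U+4E2D  CJK ideograph in the BMP": 0x4E2D,
    "U+20000 start of quoted range": 0x20000,
    "U+2A6D6 end of quoted range": 0x2A6D6,
}

for label, cp in samples.items():
    encoded = chr(cp).encode("utf-8")
    # Cross-check the rule against Python's own encoder.
    assert len(encoded) == utf8_length(cp)
    print(f"{label}: {encoded.hex(' ')} ({len(encoded)} bytes)")
```

Under this rule, a validator that accepts only sequences of up to three bytes would reject these ideographs even when the client encoding is set correctly.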