Re: BUG #12845: The GB18030 encoding doesn't support Unicode characters over 0xFFFF
From | Arjen Nienhuis |
---|---|
Subject | Re: BUG #12845: The GB18030 encoding doesn't support Unicode characters over 0xFFFF |
Date | |
Msg-id | CAG6W84JZ-ZFhAM1GQzpVUOW8YM2gx6_-f4uCKU1j2sdmt+wO6g@mail.gmail.com |
In reply to | Re: BUG #12845: The GB18030 encoding doesn't support Unicode characters over 0xFFFF (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses | Re: BUG #12845: The GB18030 encoding doesn't support Unicode characters over 0xFFFF |
List | pgsql-bugs |
On 10 Mar 2015 22:33, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote:
>
> On 03/09/2015 10:51 PM, a.g.nienhuis@gmail.com wrote:
>>
>> The following bug has been logged on the website:
>>
>> Bug reference: 12845
>> Logged by: Arjen Nienhuis
>> Email address: a.g.nienhuis@gmail.com
>> PostgreSQL version: 9.3.5
>> Operating system: Ubuntu Linux
>> Description:
>>
>> Step to reproduce:
>>
>> In psql:
>>
>> arjen=> select convert_to(chr(128512), 'GB18030');
>>
>> Actual output:
>>
>> ERROR: character with byte sequence 0xf0 0x9f 0x98 0x80 in encoding "UTF8"
>> has no equivalent in encoding "GB18030"
>>
>> Expected output:
>>
>> convert_to
>> ------------
>> \x9439fc36
>> (1 row)
>
> Hmm, looks like our gb18030 <-> Unicode conversion table only contains the
> Unicode BMP plane. Unicode points above 0xffff are not included.
>
> If we added all the missing mappings as one to one mappings, like we've done
> for the BMP, that would bloat the table horribly. There are over 1 million
> code points that are currently not mapped. Fortunately, the missing mappings
> are in linear ranges that would be fairly simple to handle programmatically.
> See e.g.
> https://ssl.icu-project.org/repos/icu/data/trunk/charset/source/gb18030/gb18030.html.
> Someone needs to write the code (I'm not volunteering myself).
>
> - Heikki

I can write a "uint32 UTF8toGB18030(uint32)" function, but I don't know where to put it in the code. (Maybe at line 479 of conv.c: https://github.com/postgres/postgres/blob/4baaf863eca5412e07a8441b3b7e7482b7a8b21a/src/backend/utils/mb/conv.c#L479 )

Otherwise, I could extend the map file. It would double in size if it only needs to include valid code points.
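For illustration, the linear-range handling discussed above might look roughly like the sketch below. It is not the function proposed in this message and not PostgreSQL code: the name `unicode_to_gb18030_supp` is hypothetical, it takes a Unicode code point rather than a packed UTF-8 value, and it covers only the supplementary planes (U+10000 to U+10FFFF). It assumes the usual GB18030 four-byte layout (first and third bytes in 0x81..0xFE, second and fourth bytes in 0x30..0x39) with U+10000 mapped to 0x90 0x30 0x81 0x30.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of PostgreSQL): map a supplementary-plane
 * code point to its four-byte GB18030 sequence, packed big-endian into a
 * uint32_t. Returns 0 for code points outside U+10000..U+10FFFF.
 */
static uint32_t
unicode_to_gb18030_supp(uint32_t cp)
{
	/* Linear index of 0x90 0x30 0x81 0x30: (0x90 - 0x81) * 10 * 126 * 10 */
	const uint32_t base = 189000;
	uint32_t	linear;
	uint32_t	b1, b2, b3, b4;

	if (cp < 0x10000 || cp > 0x10FFFF)
		return 0;

	/* Position of this code point in the linear four-byte sequence space */
	linear = base + (cp - 0x10000);

	/* Peel off the bytes from last to first: radix 10, 126, 10, then lead */
	b4 = 0x30 + linear % 10;
	linear /= 10;
	b3 = 0x81 + linear % 126;
	linear /= 126;
	b2 = 0x30 + linear % 10;
	linear /= 10;
	b1 = 0x81 + linear;

	/* Supplementary planes occupy lead bytes 0x90 through 0xE3 */
	assert(b1 >= 0x90 && b1 <= 0xE3);

	return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
}
```

With this arithmetic, code point 128512 (U+1F600) from the bug report yields 0x9439FC36, matching the expected output `\x9439fc36` above. A real patch would also need the reverse direction and the linear ranges inside the BMP, which this sketch leaves out.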