Re: 回复: May "PostgreSQL server side GB18030 character set support" reconsidered?
From | Tatsuo Ishii |
---|---|
Subject | Re: 回复: May "PostgreSQL server side GB18030 character set support" reconsidered? |
Date | |
Msg-id | 20201006.121142.2002518154310370203.t-ishii@sraoss.co.jp |
In response to | Re: 回复: May "PostgreSQL server side GB18030 character set support" reconsidered? (Tatsuo Ishii <ishii@sraoss.co.jp>) |
List | pgsql-general |
> But as he already admitted, actually GB18030 is 4 byte encoding, rather
> than 2 bytes. So maybe we could find a way to map original GB18030 to
> ASCII-safe GB18030 using 4 bytes.

Here is an idea (in-byte represents GB18030, out-byte represents the
internal server encoding):

if (in-byte1 is 0x00-0x80) /* ASCII */
    out-byte1 = in-byte1
else if (in-byte1 is 0x81-0xfe && in-byte2 is 0x40-0x7f) /* 2 bytes GB18030 */
    out-byte1 = in-byte1
    out-byte2 = 0x80
    out-byte3 = in-byte2 + 0x80 (should be 0xc0-0xff)
    out-byte4 = 0x80
else if (in-byte1 is 0x81-0xfe && in-byte2 is 0x80-0xfe) /* 2 bytes GB18030 */
    out-byte1 = in-byte1
    out-byte2 = 0x80
    out-byte3 = 0x80
    out-byte4 = in-byte2 (should be 0x80-0xfe)
else if (in-byte1 is 0x81-0xfe && in-byte2 is 0x30-0x39) /* 4 bytes GB18030 */
    out-byte1 = in-byte1
    out-byte2 = in-byte2 + 0x80 (should be 0xb0-0xb9)
    out-byte3 = in-byte3
    out-byte4 = in-byte4 + 0x80 (should be 0xb0-0xb9)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
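For readers who want to experiment with the mapping, here is a minimal Python sketch of the pseudocode in the message. It is only an illustration, not PostgreSQL code: the function name `gb18030_to_ascii_safe` is hypothetical, the byte ranges are copied from the message as written rather than validated against the GB18030 standard, and the error handling for truncated or out-of-range input is an assumption the message does not cover.

```python
def gb18030_to_ascii_safe(data: bytes) -> bytes:
    """Map GB18030 bytes to the proposed ASCII-safe internal form.

    Hypothetical helper following the pseudocode in the message; every
    multibyte character becomes exactly 4 bytes, all of them >= 0x80.
    """
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        b1 = data[i]
        if b1 <= 0x80:
            # Single byte (the message's "ASCII" branch): pass through.
            out.append(b1)
            i += 1
            continue
        if b1 > 0xFE or i + 1 >= n:
            # Assumption: reject bad lead bytes and truncated input.
            raise ValueError(f"invalid or truncated sequence at offset {i}")
        b2 = data[i + 1]
        if 0x40 <= b2 <= 0x7F:
            # 2-byte form whose second byte falls in the ASCII range:
            # move it to the third slot with the high bit set.
            out += bytes((b1, 0x80, b2 + 0x80, 0x80))
            i += 2
        elif 0x80 <= b2 <= 0xFE:
            # 2-byte form whose second byte already has the high bit set.
            out += bytes((b1, 0x80, 0x80, b2))
            i += 2
        elif 0x30 <= b2 <= 0x39:
            # 4-byte form: set the high bit on the two digit-range bytes.
            if i + 3 >= n:
                raise ValueError(f"truncated 4-byte sequence at offset {i}")
            b3, b4 = data[i + 2], data[i + 3]
            if not 0x30 <= b4 <= 0x39:
                # Assumption: the message does not say what to do here.
                raise ValueError(f"invalid fourth byte at offset {i + 3}")
            out += bytes((b1, b2 + 0x80, b3, b4 + 0x80))
            i += 4
        else:
            raise ValueError(f"invalid second byte at offset {i + 1}")
    return bytes(out)


if __name__ == "__main__":
    sample = b"a" + b"\x81\x40" + b"\xd6\xd0" + b"\x81\x30\x81\x30"
    print(gb18030_to_ascii_safe(sample).hex(" "))
```

The three multibyte branches produce distinguishable outputs (second byte 0x80 with third byte 0xc0 or above, second byte 0x80 with third byte 0x80, or second byte 0xb0-0xb9), so the mapping looks reversible, although the message itself does not spell out the reverse direction.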