Discussion: BUG #4257: about unicode extend

BUG #4257: about unicode extend

From
"arli weng"
Date:
The following bug has been logged online:

Bug reference:      4257
Logged by:          arli weng
Email address:      program@163.com
PostgreSQL version: 8.3
Operating system:   gentoo linux
Description:        about unicode extend
Details:

The command (Chinese text, encoded as UTF-8):
INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');

With SQLite's text type this works without problems,
but PostgreSQL reports an error:

invalid byte sequence for encoding "UNICODE": 0xf0

The character 𪕨 is in CJK Unified Ideographs Extension B.
In UTF-8 its hex code is "f0 aa 95 a8"; every Extension B character
encodes to four bytes, and the first byte is always 0xf0.

Does PostgreSQL not support it?

The server, database, and client encodings are all already set to UNICODE.

Please help me, because I love PostgreSQL.
Sorry for my English.
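
The byte sequence quoted in the report can be checked independently of any database. What follows is a minimal Python sketch, not part of the original report; the helper name `utf8_breakdown` is made up for illustration. It lists each character of the inserted string with its code point and UTF-8 bytes, and only the last character needs four bytes.

```python
# The three characters from the INSERT, written as escapes so the
# example does not depend on the encoding of this file.
TITLE = "\u914b\u9f20\U0002a568"  # 酋鼠𪕨


def utf8_breakdown(text):
    """Return (code point, UTF-8 bytes as hex) for each character."""
    return [
        (f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "))
        for ch in text
    ]


for code_point, hex_bytes in utf8_breakdown(TITLE):
    print(code_point, hex_bytes)
# U+914B e9 85 8b
# U+9F20 e9 bc a0
# U+2A568 f0 aa 95 a8
```

The first two characters fit in 16 bits and take three bytes each. The third lies above U+FFFF, which is why its encoding starts with 0xf0.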

Re: BUG #4257: about unicode extend

From
Tom Lane
Date:
"arli weng" <program@163.com> writes:
> the command (chinese by utf-8):
> INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');
> in postgres report error:
> invalid byte sequence for encoding "UNICODE": 0xf0

I don't believe this is actually an 8.3 server.  In 8.1 or later that
encoding would be referred to as "UTF8"; also, 8.1 and later would show
all bytes of the complained-of character not just the first one.

8.0 and before only support 16-bit Unicode code points (ie, 3-byte
utf8 sequences).  We have support for 4-byte sequences in 8.1 and
later.  Also, there were some fixes in this area in Jan 2007, so
whichever branch you use, make sure you get a minor release that's
newer than that.

            regards, tom lane
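
The 16-bit limit described above suggests a simple client-side check before sending text to an 8.0 or older server. This is only a sketch under that assumption; the function name `chars_needing_four_bytes` is hypothetical and nothing like it ships with PostgreSQL. It flags every character above U+FFFF, which is exactly the set that needs a 4-byte UTF-8 sequence.

```python
def chars_needing_four_bytes(text):
    """Return the characters whose code point is above U+FFFF.

    These are the characters that encode to four bytes in UTF-8 and
    that, per the message above, 8.0 and earlier cannot store.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


title = "\u914b\u9f20\U0002a568"  # the string from the bug report
offending = chars_needing_four_bytes(title)

# Show the offending code points and confirm their UTF-8 length.
for ch in offending:
    print(f"U+{ord(ch):X}", len(ch.encode("utf-8")), "bytes")
```

An empty result means the string stays within the 16-bit range and should not trip this particular limit.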

Re: BUG #4257: about unicode extend

From
Michael Fuhr
Date:
On Sat, Jun 21, 2008 at 01:25:15PM +0000, arli weng wrote:
> PostgreSQL version: 8.3

What does "SELECT version()" return?  I'm wondering if the server
isn't 8.3 but rather an earlier version (see below).

> the command (chinese by utf-8):
> INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');
>
> in sqlite text type, no problem..
> in postgres report error:
>
> invalid byte sequence for encoding "UNICODE": 0xf0

Your INSERT statement works for me in 8.3.3, 8.2.9, and 8.1.13.
According to the release notes version 8.1 changed UNICODE to UTF8
and added support for 4-byte characters, so the fact that the error
says "UNICODE" and your database doesn't appear to support 4-byte
characters makes me wonder if you're running 8.0 or earlier.

--
Michael Fuhr
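
The version check asked for above can also be automated once the output of "SELECT version()" is in hand. The Python sketch below assumes that output begins with "PostgreSQL" followed by a dotted version number; the function names and the sample strings are illustrative, not taken from the thread. It applies the 8.1 threshold mentioned in the release notes.

```python
import re


def parse_pg_version(version_string):
    """Extract (major, minor) from text such as 'PostgreSQL 8.0.15 on ...'."""
    match = re.match(r"PostgreSQL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def supports_four_byte_utf8(version_string):
    """True for 8.1 and later, where 4-byte UTF-8 support was added."""
    return parse_pg_version(version_string) >= (8, 1)


# Sample strings are assumptions made up for this example.
for sample in (
    "PostgreSQL 8.0.15 on i686-pc-linux-gnu, compiled by GCC 3.4.6",
    "PostgreSQL 8.3.3 on x86_64-pc-linux-gnu, compiled by GCC 4.1.2",
):
    print(parse_pg_version(sample), supports_four_byte_utf8(sample))
```

Version strings without a dotted number after "PostgreSQL" (for example some beta builds) raise `ValueError` here rather than being guessed at.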

Re: BUG #4257: about unicode extend

From
ArLi
Date:
Very sorry, this was my mistake.

The version is 8.0.15.

I copied the version from the wrong server terminal window. -_-!

Thank you for your help.

arli

Michael Fuhr wrote:
> On Sat, Jun 21, 2008 at 01:25:15PM +0000, arli weng wrote:
>> PostgreSQL version: 8.3
>
> What does "SELECT version()" return?  I'm wondering if the server
> isn't 8.3 but rather an earlier version (see below).
>
>> the command (chinese by utf-8):
>> INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');
>>
>> in sqlite text type, no problem..
>> in postgres report error:
>>
>> invalid byte sequence for encoding "UNICODE": 0xf0
>
> Your INSERT statement works for me in 8.3.3, 8.2.9, and 8.1.13.
> According to the release notes version 8.1 changed UNICODE to UTF8
> and added support for 4-byte characters, so the fact that the error
> says "UNICODE" and your database doesn't appear to support 4-byte
> characters makes me wonder if you're running 8.0 or earlier.