pl/perl and utf-8 in sql_ascii databases
From | Christoph Berg |
---|---|
Subject | pl/perl and utf-8 in sql_ascii databases |
Date | |
Msg-id | 20120209102116.GA14429@msgid.df7cb.de |
Responses | Re: pl/perl and utf-8 in sql_ascii databases |
List | pgsql-hackers |
Hi,

we have a database that is storing strings in various encodings (and non-encodings, namely the arbitrary byte soup that you might see in email headers from the internet). For this reason, the database uses sql_ascii encoding. The columns are text, as most characters are ascii, so bytea didn't seem the right way to go.

Currently we are on 8.3 and are trying to upgrade to 9.1, but the plperlu functions we have are acting up.

Old behavior on 8.3 .. 9.0:

sql_ascii =# create or replace function whitespace(text) returns text
language plperlu as $$
    $a = shift;
    $a =~ s/[\t ]+/ /g;
    return $a;
$$;
CREATE FUNCTION

sql_ascii =# select whitespace (E'\200'); -- 0x80 is not valid utf-8
 whitespace
------------

sql_ascii =# select whitespace (E'\200')::bytea;
 whitespace
------------
 \x80

New behavior on 9.1.2:

sql_ascii =# select whitespace (E'\200');
ERROR: XX000: Malformed UTF-8 character (fatal) at line 1.
KONTEXT: PL/Perl function "whitespace"
ORT: plperl_call_perl_func, plperl.c:2037

A crude workaround is:

sql_ascii =# create or replace function whitespace_utf8_off(text) returns text
language plperlu as $$
    use Encode;
    $a = shift;
    Encode::_utf8_off($a);
    $a =~ s/[\t ]+/ /g;
    return $a;
$$;
CREATE FUNCTION

sql_ascii =# select whitespace_utf8_off (E'\200');
 whitespace_utf8_off
---------------------
 \u0080

sql_ascii =# select whitespace_utf8_off (E'\200')::bytea;
 whitespace_utf8_off
---------------------
 \xc280

(Note that the workaround is not perfect, as the resulting 0x80..0xff bytes are still tagged as utf8.)

I think the bug is in plperl_helpers.h:

/*
 * Create a new SV from a string assumed to be in the current database's
 * encoding.
 */
static inline SV *
cstr2sv(const char *str)
{
    SV *sv;
    char *utf8_str = utf_e2u(str);

    sv = newSVpv(utf8_str, 0);
    SvUTF8_on(sv);

    pfree(utf8_str);

    return sv;
}

In sql_ascii databases, utf_e2u does not do any recoding, but SvUTF8_on then still marks the string as utf-8 even though it isn't. (Returned values might also need fixing.)
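To illustrate the byte-level effect outside of Perl and PostgreSQL, here is a minimal, standalone C sketch. It shows why a lone 0x80 byte fails a UTF-8 structure check, and that treating 0x80 as the code point of the same value and encoding it as UTF-8 gives the two bytes 0xC2 0x80, which matches the \xc280 seen with the workaround. The function names looks_like_utf8 and latin1_to_utf8 are made up for this example and are not part of plperl or Perl; the validity check is deliberately simplified and is an assumption about what "malformed" means here, not a copy of Perl's own logic.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified structural check (hypothetical helper): verifies only the
 * lead-byte / continuation-byte pattern. It does not reject overlong
 * forms, surrogates, or code points above U+10FFFF.
 * Returns 1 if the buffer has a plausible UTF-8 structure, else 0.
 */
int
looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = s[i];
        size_t need;    /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80)
            need = 0;                   /* plain ASCII */
        else if ((c & 0xE0) == 0xC0)
            need = 1;                   /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0)
            need = 2;                   /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0)
            need = 3;                   /* 11110xxx */
        else
            return 0;   /* stray continuation byte (e.g. 0x80) or bad lead */

        if (len - i - 1 < need)
            return 0;                   /* sequence is truncated */

        for (k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;               /* not a 10xxxxxx continuation */

        i += need + 1;
    }
    return 1;
}

/*
 * Hypothetical helper: encode each input byte as the code point with the
 * same value (i.e. Latin-1 to UTF-8). "out" must have room for 2 * len
 * bytes. Returns the number of bytes written.
 */
size_t
latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < len; i++)
    {
        if (in[i] < 0x80)
            out[n++] = in[i];
        else
        {
            out[n++] = (unsigned char) (0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char) (0x80 | (in[i] & 0x3F));
        }
    }
    return n;
}
```

Calling looks_like_utf8 on the single byte 0x80 returns 0, while running that byte through latin1_to_utf8 produces a two-byte buffer that passes the same check.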
In my view, this is clearly a bug in pl/perl on sql_ascii databases.

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/