Re: Careful PL/Perl Release Not Required
From | Alex Hunsaker |
---|---|
Subject | Re: Careful PL/Perl Release Not Required |
Date | |
Msg-id | AANLkTi=+EpO9XBwhP++WuBgTvQ4jE4ywSM=p5xvE1QH1@mail.gmail.com |
In response to | Re: Careful PL/Perl Release Not Required ("David E. Wheeler" <david@kineticode.com>) |
Responses | Re: Careful PL/Perl Release Not Required |
List | pgsql-hackers |
On Fri, Feb 11, 2011 at 11:07, David E. Wheeler <david@kineticode.com> wrote:

> I don't understand where the bug is. If a string is encoded in utf-8 Perl will not treat it as such unless the utf-8 flag is set.

Ok, so I think we agreed this is right:

$ perl -E 'use URI::Escape; my $str = uri_unescape("%C3%A9"); say sprintf("chr: %s hex: %s, len: %s", $str, unpack("H*", $str), length $str)'
chr: é hex: c3a9, len: 2

The key part here is len = 2, or 2 characters.

Let's try that in a postgres 9.0 utf8 database:

=> create or replace function uri_decode(txt text, in_decode int, out_decode int) returns text as $$
use URI::Escape;
my $str = shift;
utf8::decode($str) if(shift);
$str = uri_unescape($str);
utf8::decode($str) if(shift);
return $str;
$$ language plperlu;

-- For ease we are just going to look at the length as most terminals will have utf8 and latin1 mapped.
=> SELECT length(uri_decode('%c3%a9', 0, 0));
 length
--------
      2
(1 row)

Looks right. What happens if we decode after uri_unescape? We should get 1 character, no?

-- decode after uri_unescape
=> SELECT length(uri_decode('%c3%a9', 0, 1));
 length
--------
      1

Ok, that's right. What happens if we decode before? Nothing should, right? After all, '%c3%a9' is all ASCII. We should still get 2 characters.

=> SELECT length(uri_decode('%c3%a9', 1, 0));
 length
--------
      1

Whoa! 1? Does vanilla perl do that?

perl <<'perl'
use URI::Escape;
my $str = '%c3%a9';
utf8::decode($str);
$str = uri_unescape($str);
print sprintf("chr: %s hex: %s, len: %s\n", $str, unpack("H*", $str), length $str);
perl
chr: é hex: c3a9, len: 2

Nope, so postgres gets it wrong here. That's the problem.

In 9.1 it does "the right thing":

=> SELECT length(uri_decode(0, 0));
 length
--------
      2

Yay! 2!

=> SELECT length(uri_decode(1, 0));
CONTEXT: PL/Perl function "uri_decode"
 length
--------
      2

Yay! Also 2!

=> SELECT length(uri_decode(0, 1));
 length
--------
      1

Yay! 1
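
For anyone who wants to check the vanilla perl behaviour for all three flag combinations without a database, here is a minimal sketch. It assumes a hand-rolled percent_decode() helper (hypothetical, standing in for URI::Escape's uri_unescape so that only core perl is needed), and uri_decode_length() is just an illustrative name for the body of the PL/Perl function above that returns the length instead of the string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for uri_unescape(): turn each %XX into the
# character with that ordinal, using core perl only.
sub percent_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Same steps as the uri_decode() function above, but returning the
# length of the result rather than the string itself.
sub uri_decode_length {
    my ($txt, $in_decode, $out_decode) = @_;
    my $str = $txt;
    utf8::decode($str) if $in_decode;    # decode before unescaping
    $str = percent_decode($str);
    utf8::decode($str) if $out_decode;   # decode after unescaping
    return length $str;
}

# The three combinations exercised in the psql session.
for my $flags ([0, 0], [1, 0], [0, 1]) {
    printf "in_decode=%d out_decode=%d length=%d\n",
        @$flags, uri_decode_length('%c3%a9', @$flags);
}
```

Under those assumptions this should print lengths 2, 2 and 1, the same as the 9.1 results: decoding an all-ASCII string beforehand changes nothing, and only decoding after the unescape collapses the two bytes c3 a9 into one character.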