Re: Careful PL/Perl Release Not Required
From | David E. Wheeler |
---|---|
Subject | Re: Careful PL/Perl Release Not Required |
Date | |
Msg-id | 0DA44369-C0F1-4C9D-A158-48688D37A6CC@kineticode.com |
In response to | Re: Careful PL/Perl Release Not Required (Alex Hunsaker <badalex@gmail.com>) |
List | pgsql-hackers |
On Feb 11, 2011, at 9:44 AM, Alex Hunsaker wrote:

> It is decoded... the input string "%C3%A9" actually is the _same_
> string utf-8, latin1 and SQL_ASCII decoded or not. Those are all ascii
> characters. Calling utf8::decode("%C3%A9") is essentially a noop.

No, it's not decoded. It doesn't matter because they're ASCII bytes. But if the utf8 flag isn't set, it's not decoded. It's just byte soup as far as Perl is concerned.

Unless I grossly misunderstand something, which is entirely possible.

> Ok, I think i figured out why we seem to be talking past each other, we have:
> CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$
>   use strict;
>   use URI::Escape;
>   utf8::decode($_[0]);
>   return uri_unescape($_[0]);
> $$ LANGUAGE plperlu;
>
> That *looks* like it is decoding the input string, which it is, but
> actually that will double utf8 encode your string. It does not seem to
> in this case because we are dealing with all ascii input. The trick
> here is its also telling perl to decode/treat the *output* string as
> utf8.
>
> uri_unescape() returns the same string you passed in, which thanks to
> the utf8::decode() above has the utf8 flag set. Meaning we end up
> treating it as 1 character instead of two. Or basically that it has
> the same effect as calling utf8::decode() on the return value.
>
> The correct way to write that function pre 9.1 and post 9.1 would be
> (in a utf8 database):
> CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$
>   use strict;
>   use URI::Escape;
>   my $str = uri_unescape($_[0]);
>   utf8::decode($str);
>   return $str;
> $$ LANGUAGE plperlu;
>
> The last utf8::decode being optional (as we said, it might not be
> utf8), but granting the sought behavior by the op.

No. If the argument to PL/Perl has the utf8 flag set, then that's what you always get. The utf8::decode() isn't necessary because it's already decoded:

    > perl -MURI::Escape -MEncode -E 'say utf8::is_utf8(uri_unescape(Encode::decode_utf8("“hi”")))'
    1

Best,

David