Обсуждение: BUG #13638: Exception texts from plperl has bad encoding

Поиск
Список
Период
Сортировка

BUG #13638: Exception texts from plperl has bad encoding

От
lei@aswsyst.cz
Дата:
The following bug has been logged on the website:

Bug reference:      13638
Logged by:          Michal Leinweber
Email address:      lei@aswsyst.cz
PostgreSQL version: 9.4.4
Operating system:   Debian 8
Description:

I have UTF-8 database and using UTF-8 text in plpgsql and plperl functions.
Everything works ok, only exceptions from plperl functions have bad encoding
(maybe double encoded).

Compare output of these 2 functions:

create or replace function perl_test() returns text
language plperl as $$
  return "Český text ěščřžýáíé";
$$;

create or replace function perl_test_err() returns text
language plperl as $$
  elog(ERROR, "Česká chyba ěščřžýáíé");
$$;

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Tom Lane
Дата:
lei@aswsyst.cz writes:
> I have UTF-8 database and using UTF-8 text in plpgsql and plperl functions.
> Everything works ok, only exceptions from plperl functions have bad encoding
> (maybe double encoded).

> Compare output of these 2 functions:

> create or replace function perl_test() returns text
> language plperl as $$
>   return "Český text ěščřžýáíé";
> $$;

> create or replace function perl_test_err() returns text
> language plperl as $$
>   elog(ERROR, "Česká chyba ěščřžýáíé");
> $$;

I traced through this example to the extent of finding that:

* The string passed to elog() in do_util_elog() seems correctly
encoded.

* So does the string passed to croak() after elog does its longjmp,
which is unsurprising since elog.c doesn't really do anything to it.

* Back in plperl_call_perl_func, we use sv2cstr(ERRSV) to get the
error string.  sv2cstr calls "SvPVutf8(sv, len)", and that is what
is giving back the bogusly-encoded data.

I suspect the root problem is that instead of baldly doing

        croak("%s", edata->message);

in do_util_elog(), we need to do something to inform Perl what encoding
the message string is in.  This is beyond my Perl-fu, however.

            regards, tom lane

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Tom Lane
Дата:
I wrote:
> I suspect the root problem is that instead of baldly doing
>         croak("%s", edata->message);
> in do_util_elog(), we need to do something to inform Perl what encoding
> the message string is in.  This is beyond my Perl-fu, however.

BTW, if the answer to that involves turning the message back into an SV,
I think we should not do that but just use the SV that was passed in
originally.  Then the TRY/CATCH could go away entirely, ie the code
would become roughly like

    if (level < ERROR)
    {
        cmsg = sv2cstr(msg);
        elog(level, "%s", cmsg);
        pfree(cmsg);
    }
    else
    {
        croak-with-SV(msg);
    }

which knows slightly more about the meaning of "level" than before,
but otherwise seems much less entangled with anything.

It's still beyond my Perl-fu ...

            regards, tom lane

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Alvaro Herrera
Дата:
Tom Lane wrote:

> It's still beyond my Perl-fu ...

Maybe Alex or Tim can shed some light?  Please see:
http://www.postgresql.org/message-id/flat/20150925112345.26913.28407@wrigleys.postgresql.org

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Tom Lane
Дата:
Michal Leinweber <lei@aswsyst.cz> writes:
> The problem is not only with elog call, but it also fires if plpgsql
> code called from plperl function raises exception.

Hmm, yeah, there are a whole bunch of instances of that croak("%s",
edata->message) pattern, and most of them don't have an ancestral SV
to work from.  So we'll need a more general solution anyway, and it
may not be worth changing do_util_elog() to work in a fundamentally
different way than the rest.

            regards, tom lane

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Michal Leinweber
Дата:
The problem is not only with elog call, but it also fires if plpgsql
code called from plperl function raises exception.

Best Regards,
     Michal Leinweber

On 2015-09-25 21:33, Tom Lane wrote:
> I wrote:
>> I suspect the root problem is that instead of baldly doing
>>         croak("%s", edata->message);
>> in do_util_elog(), we need to do something to inform Perl what
>> encoding
>> the message string is in.  This is beyond my Perl-fu, however.
>
> BTW, if the answer to that involves turning the message back into an
> SV,
> I think we should not do that but just use the SV that was passed in
> originally.  Then the TRY/CATCH could go away entirely, ie the code
> would become roughly like
>
>     if (level < ERROR)
>     {
>         cmsg = sv2cstr(msg);
>         elog(level, "%s", cmsg);
>         pfree(cmsg);
>     }
>     else
>     {
>         croak-with-SV(msg);
>     }
>
> which knows slightly more about the meaning of "level" than before,
> but otherwise seems much less entangled with anything.
>
> It's still beyond my Perl-fu ...
>
>             regards, tom lane

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Tim Bunce
Дата:
On Fri, Sep 25, 2015 at 04:51:20PM -0300, Alvaro Herrera wrote:
> Tom Lane wrote:
>
> > It's still beyond my Perl-fu ...
>
> Maybe Alex or Tim can shed some light?  Please see:
> http://www.postgresql.org/message-id/flat/20150925112345.26913.28407@wrigleys.postgresql.org

I think you want to call croak_sv(SV) with an SV containing the error
message and flagged as UTF-8.

Yim.

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Tom Lane
Дата:
Tim Bunce <Tim.Bunce@pobox.com> writes:
> I think you want to call croak_sv(SV) with an SV containing the error
> message and flagged as UTF-8.

I found croak_sv() in the headers for Perl 5.18 (present on OS X Yosemite)
but not in Perl 5.10 (present on RHEL6), much less 5.8 which is the oldest
Perl version we claim to support.  This doesn't look like a workable
answer unless we want to move the compatibility goalposts a long way.

            regards, tom lane

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Alex Hunsaker
Дата:


On Sun, Sep 27, 2015 at 5:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Tim Bunce <Tim.Bunce@pobox.com> writes:
> I think you want to call croak_sv(SV) with an SV containing the error
> message and flagged as UTF-8.

I found croak_sv() in the headers for Perl 5.18 (present on OS X Yosemite)
but not in Perl 5.10 (present on RHEL6), much less 5.8 which is the oldest
Perl version we claim to support.  This doesn't look like a workable
answer unless we want to move the compatibility goalposts a long way.

Yeah, I was just looking at that. I couldn't find a good way to support 5.8. Doing:

const char *utf8_str = utf_e2u(str);
SV *sv = get_sv("@", TRUE);
sv_setpvf(sv, "%s", utf8_str);
SvUTF8_on(sv); 
croak(NULL);

Works, but screws up the error line numbers. For example, given the below test in plperl.out:

--
-- Test with a bad type
--
CREATE OR REPLACE FUNCTION perl_spi_prepared_bad(double precision) RETURNS double precision AS $$
  my $x = spi_prepare('SELECT 10.0 * $1 AS a', 'does_not_exist');
  my $q = spi_query_prepared($x,$_[0]);
  my $result;
  while (defined (my $y = spi_fetchrow($q))) {
      $result = $y->{a};
  }
  spi_freeplan($x);
  return $result;
$$ LANGUAGE plperl;

Instead of:
ERROR:  type "does_not_exist" does not exist at line 2.

We get:
ERROR:  type "does_not_exist" does not exist at -e line 56.

I poked at this for a bit and didn't see a way around that.

Attached is a draft patch that at least fixes the issue for 5.10 and up. I guess we don't want to commit the regression test unless we want to add 2 new plperl_elog.out files (one for 5.8 and one for the already existing alt file).

Another thing to note is sv2cstr() can elog(ERROR) if the message has an invalid byte sequence:

create or replace function perl_test_err() returns text
language plperl as $$
  elog(ERROR, "\0");
$$;

-- or
create or replace function perl_test_err() returns text
language plperl as $$
  elog(INFO, "\0");
$$;
select perl_test_err();
ERROR:  invalid byte sequence for encoding "UTF8": 0x00k

Without the PG_TRY block we can't punt these errors back to perl. Maybe that does not matter all that much. After-all we could still have an error converting that new error to utf8 for perl (realistically this probably falls into the impossible category...). Thoughts?

Attached is a patch that does /not/ take into account errors thrown by pg_any_to_server() along the lines of Tom's above suggestion. I added a croak_cstr() function (trying to be in the spirit of the other plperl_helper functions) and converted various croak("%s", e->message); over to it. Tested on 5.8.9 and 5.18.2 from 9.1-stable to 9.4 (9.5 doesn't seem to pass a standard make check on my machine atm, but I don't think there have been any plperl changes).

Note it does not apply cleaning to 9.1-9.3 (trivial rejects), I'm happy to do the leg work if we like this proposed patch.

Вложения

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Tom Lane
Дата:
Alex Hunsaker <badalex@gmail.com> writes:
> Yeah, I was just looking at that. I couldn't find a good way to support
> 5.8. Doing: ...
> Works, but screws up the error line numbers.

I looked into this and determined that the problem is that the location
info doesn't get appended to ERRSV until after Perl has popped the stack.

A possibly-entirely-wrong hack that seems to fix that is to use mess()
to construct the SV, *and* append the location info, before we croak.
I tried this implementation of croak_cstr:

    char       *utf8_str = utf_e2u(str);
    SV *ssv = mess("%s", utf8_str);
    SV *errsv = get_sv("@", GV_ADD);

    SvUTF8_on(ssv);
    pfree(utf8_str);

    sv_setsv(errsv, ssv);
    croak(NULL);

with Perl 5.8.9 and it passed your proposed regression test.  Obvious
questions include:

* I don't see mess() in the perlapi man page, so is it okay to use?

* Is there any leakage involved in this?  (I don't know what prompted
you to add sv_2mortal() to the other implementation, but maybe we need
it here too?)

* Is there some other reason why this is wrong or a bad idea?  Since we'd
only use this with versions lacking croak_sv(), future-proofing doesn't
seem like a good argument against it, but maybe there is another one.


BTW, I think we can't actually use this regression test case, because it
won't work as-is in non-UTF8 encodings.  But the use of UTF8 characters in
the string doesn't seem like an essential property of the test.  However,
other test cases may already provide sufficient coverage if we're not
going to test encoding conversion.

Also, I'm inclined to leave do_util_elog() alone other than replacing
croak("%s", edata->message) with croak_cstr(edata->message).  Since
we have to have croak_cstr() anyway, there seems little value in taking
any risk that we've missed some reason why the existing coding there
is important.

            regards, tom lane

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Alex Hunsaker
Дата:


On Mon, Sep 28, 2015 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alex Hunsaker <badalex@gmail.com> writes:
> Yeah, I was just looking at that. I couldn't find a good way to support
> 5.8. Doing: ...
> Works, but screws up the error line numbers.

I looked into this and determined that the problem is that the location
info doesn't get appended to ERRSV until after Perl has popped the stack.

 
I tried this implementation of croak_cstr: ...

Yeah that seems to work great!

 
* I don't see mess() in the perlapi man page, so is it okay to use?


Looks like they added it to perlapi in 5.12, given that it exists and seems to work in 5.8 and 5.10 I think we are ok to use it. I tested your above usage on 5.8, 5.10, 5.12.


* Is there any leakage involved in this?  (I don't know what prompted
you to add sv_2mortal() to the other implementation, but maybe we need
it here too?)

No, $@ is global and mess() also uses a global. (unless perl is destructing). 

In the croak_sv() case, we never use/assign what cstr2sv() returns anywhere in perl land, so it always has a refcount of 1. We have similar usage of sv_2mortal(cstr2sv()) albeit not on the same line in plperl.c.


* Is there some other reason why this is wrong or a bad idea?  Since we'd
only use this with versions lacking croak_sv(), future-proofing doesn't
seem like a good argument against it, but maybe there is another one. 

The only argument that comes to mind is AFAIK perl 5.8 and 5.10 are more or less unmaintained so it might not be worth the effort to make them work. And it's been this way forever.
 

BTW, I think we can't actually use this regression test case, [...]  However,
other test cases may already provide sufficient coverage if we're not
going to test encoding conversion.

Agreed. I'll attach it separately just for easy verification that everything is working as intended. 

 
Also, I'm inclined to leave do_util_elog() alone other than replacing
croak("%s", edata->message) with croak_cstr(edata->message). 

Done that way.

The attached was tested on 5.8.9, 5.10.1, 5.12.5, 5.14.4, 5.18.2 and 5.22.0.

Вложения

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Tom Lane
Дата:
Alex Hunsaker <badalex@gmail.com> writes:
> The attached was tested on 5.8.9, 5.10.1, 5.12.5, 5.14.4, 5.18.2 and 5.22.0.

Thanks!  Barring objections, I'll push this in the morning.

            regards, tom lane

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Tom Lane
Дата:
Alex Hunsaker <badalex@gmail.com> writes:
> On Mon, Sep 28, 2015 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, I think we can't actually use this regression test case, [...]

> Agreed. I'll attach it separately just for easy verification that
> everything is working as intended.

After awhile I remembered that we already had a similar test case in
plpython: it's testing with no-break space (U+A0) which has an equivalent
in most encodings.  So I adopted that strategy, and pushed this with
a test case.  We'll soon see if the buildfarm likes it.

            regards, tom lane

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Alex Hunsaker
Дата:
On Tue, Sep 29, 2015 at 8:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Alex Hunsaker <badalex@gmail.com> writes:
> > On Mon, Sep 28, 2015 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> BTW, I think we can't actually use this regression test case, [...]
>
> After awhile I remembered that we already had a similar test case in
> plpython: it's testing with no-break space (U+A0) which has an equivalent
> in most encodings.  So I adopted that strategy, and pushed this with
> a test case.  We'll soon see if the buildfarm likes it.
>
>                         regards, tom lane
>

I've been keeping a loose eye on the buildfarm, everything looks good so
far.

FYI I also ended up testing 5.16.3 and 5.20.3 which means it should be
tested on every stable version of perl.

Thanks!

Re: BUG #13638: Exception texts from plperl has bad encoding

От
Michal Leinweber
Дата:
Hello,

just reporting that the attached patch solves my problem. Thanks.

Best Regards
     Michal Leinweber

On 2015-09-29 05:51, Alex Hunsaker wrote:

> On Mon, Sep 28, 2015 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Alex Hunsaker <badalex@gmail.com> writes:
>>> Yeah, I was just looking at that. I couldn't find a good way to
>>> support
>>> 5.8. Doing: ...
>>> Works, but screws up the error line numbers.
>>
>> I looked into this and determined that the problem is that the
>> location
>> info doesn't get appended to ERRSV until after Perl has popped the
>> stack.
>
>> I tried this implementation of croak_cstr: ...
>
> Yeah that seems to work great!
>
>> * I don't see mess() in the perlapi man page, so is it okay to use?
>
> Looks like they added it to perlapi in 5.12, given that it exists and
> seems to work in 5.8 and 5.10 I think we are ok to use it. I tested
> your above usage on 5.8, 5.10, 5.12.
>
>> * Is there any leakage involved in this?  (I don't know what prompted
>> you to add sv_2mortal() to the other implementation, but maybe we need
>> it here too?)
>
> No, $@ is global and mess() also uses a global. (unless perl is
> destructing).
>
> In the croak_sv() case, we never use/assign what cstr2sv() returns
> anywhere in perl land, so it always has a refcount of 1. We have
> similar usage of sv_2mortal(cstr2sv()) albeit not on the same line in
> plperl.c.
>
>> * Is there some other reason why this is wrong or a bad idea?  Since
>> we'd
>> only use this with versions lacking croak_sv(), future-proofing
>> doesn't
>> seem like a good argument against it, but maybe there is another one.
>
> The only argument that comes to mind is AFAIK perl 5.8 and 5.10 are
> more or less unmaintained so it might not be worth the effort to make
> them work. And it's been this way forever.
>
>> BTW, I think we can't actually use this regression test case, [...]
>> However,
>
>> other test cases may already provide sufficient coverage if we're not
>> going to test encoding conversion.
>
> Agreed. I'll attach it separately just for easy verification that
> everything is working as intended.
>
>> Also, I'm inclined to leave do_util_elog() alone other than replacing
>> croak("%s", edata->message) with croak_cstr(edata->message).
>
> Done that way.
>
> The attached was tested on 5.8.9, 5.10.1, 5.12.5, 5.14.4, 5.18.2 and
> 5.22.0.