Re: plperlu problem with utf8
От | Alex Hunsaker |
---|---|
Тема | Re: plperlu problem with utf8 |
Дата | |
Msg-id | AANLkTimKvnkFirhsAtQZ6aw35vw8HfzycZ0pQeiN9_n_@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: plperlu problem with utf8 (David Christensen <david@endpoint.com>) |
Ответы |
Re: plperlu problem with utf8
(Alex Hunsaker <badalex@gmail.com>)
|
Список | pgsql-hackers |
On Sun, Dec 19, 2010 at 21:00, David Christensen <david@endpoint.com> wrote: > > On Dec 19, 2010, at 2:20 AM, Alex Hunsaker wrote: > >> With the attached we: >> - for function arguments, convert (using pg_do_encoding_conversion) to >> utf8 from the current database encoding. > > How does this deal with input records of composite type? Same way it worked before just encoded properly. :) >> - fix errors so they properly encode their message to the current >> database encoding > >[...] does it degrade to the current behavior, as opposed to fail or eat the error message without outputting? No it fails with an "invalid encoding message". I think thats the best we can do. It certainly seems wrong to send invalid bytes back to the client. >> - always do the utf8 hack so utf8.pm is loaded (fixes utf8 regexs in >> plperl). [...] > > This sounds good in general, but I think should be skipped if GetDatabaseEncoding() == SQL_ASCII. Why? What if you want to use a module that does utf8 things? >> [ PLs handling strings as utf8 in SQL_ASCII databases ] > I think this could/should be adequately handled by not calling the function when the DatabaseEncoding is SQL_ASCII. [...]> Also, I'd argue that pltcl and plpython should be fixed as well for the same reasons. I don't disagree, but im not volunteering for it either. Without having done any in depth analysis, the largest problem is they use strings for most arguments. Strings need an encoding and pl/python and pl/tcl want a utf8 string. So we need to convert from SQL_ASCII to UTF8. Which like you say is nonsensical. So really in the SQL_ASCII case we would need to switch to binary datatypes for those languages. Perl is a bit more flexible so it should be easier to 'fix'. >> This does *not* address the bytea issue where currently if you have >> bytea input or output we try to encode that the same as any string. I >> think thats going to be a bit more invasive and this patch should >> stands on its own. >> <plperl_fix_enc.patch.gz> > > Yeah, I'm not sure how invasive that will end up being, or if there are other datatypes which should skip the text processing. Well, I don't think it will be too bad. As for examples of other datatypes that we could skip text processing: int, float and numeric. Right now we treat them as strings, which works fine for perl... but its a tad inefficient. Ideally we could be using things like SvIV. > I noticed you moved the declaration of perl_sys_init_done; was that an independent bug fix, or did something in the patchrequire that? It just squashes an unused variable "perl_sys_init_done" warning when perl is compiled with its own malloc. Of course that added another warning "warning: ISO C90 forbids mixed declarations and code" :P In further review over caffeine this morning I noticed there are a few places I missed: plperl_build_tuple_result(), plperl_modify_tuple() and Util.XS. The first 2 pass perl hash keys for the column straight to SPI_fnumber() without doing any conversion. In theory this means we might not be able to look up columns with non ascii characters. I missed this because those hash keys are declared as char* instead of SV* and do not use any of the standard perl string macros. Anyway I hope to have a fresh patch tomorrow.
В списке pgsql-hackers по дате отправления:
Следующее
От: Fujii MasaoДата:
Сообщение: Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep