Discussion: BUG #3932: utf-8 and upper()/lower(): PANIC: ERRORDATA_STACK_SIZE exceeded

BUG #3932: utf-8 and upper()/lower(): PANIC: ERRORDATA_STACK_SIZE exceeded

From: "Florian Wunderlich"
Date:

The following bug has been logged online:

Bug reference:      3932
Logged by:          Florian Wunderlich
Email address:      fwunderlich@factor3.de
PostgreSQL version: 8.2.6
Operating system:   Debian unstable
Description:        utf-8 and upper()/lower(): PANIC: ERRORDATA_STACK_SIZE exceeded
Details:

- input file in encoding iso-8859-1:

set client_encoding='iso-8859-1';
select upper('ä'), lower('Ä');

(note: the argument to upper is a lower case a umlaut, while the argument to
lower is an upper case a umlaut)

- database "iso" with encoding iso-8859-1,
  database "utf" with encoding utf-8,
  both in a cluster with locale=de_DE


The command

  psql iso < input

yields the correct output (upper case a umlaut, lower case a umlaut).


The command

  psql utf < input

yields

PANIK: ERRORDATA_STACK_SIZE exceeded.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost


The log shows:

ERROR:  invalid byte sequence for encoding "UTF8": 0xe384
HINT:  This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".

This is followed by four more occurrences of the same error, this time for byte 0xfc.
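The byte values in the log can be related to the characters in the test input. The Python sketch below is used only for illustration and is not part of the report. It prints the ISO-8859-1 and UTF-8 encodings of 'ä' and 'Ä' and checks that the logged sequences 0xe3 0x84 and 0xfc are not valid UTF-8 on their own. Treating each logged value as a complete, standalone byte string is an assumption.

```python
# Relate the characters in the test to the byte values reported in the log.

def show_bytes(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as a hex string, e.g. 'c3a4'."""
    return text.encode(encoding).hex()


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as UTF-8 without error."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for ch in ("ä", "Ä"):
        print(ch, "iso-8859-1:", show_bytes(ch, "iso-8859-1"),
              "utf-8:", show_bytes(ch, "utf-8"))
    # Byte sequences taken from the server log (assumed to be standalone).
    for seq in (b"\xe3\x84", b"\xfc"):
        print(seq.hex(), "valid UTF-8:", is_valid_utf8(seq))
```

In ISO-8859-1, 0xfc is 'ü', a character that does not appear in the test input.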


Repeating the test with a UTF-8-encoded input file (and client_encoding set
accordingly) again works correctly against the iso database. Against the utf
database, however, upper() returns a lower case a umlaut and lower() returns
nothing.


Thus, it would seem that the upper() and lower() functions do not work at
all for databases with encoding utf-8 and non-US-ASCII input.
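One way to reproduce these symptoms, offered only as a hypothesis that the thread does not confirm, is to assume that the case conversion interprets the database's bytes using the locale's character set (ISO-8859-1 for de_DE) instead of the database encoding (UTF-8). The Python sketch below simulates that. Python's Unicode case mapping stands in for the C library's locale-dependent case functions (assumed to agree for these characters), and the function names `case_map_bytes` and `describe` are hypothetical.

```python
# Hypothetical simulation: the database stores UTF-8 bytes, but the case
# mapping decodes and re-encodes them with the locale's character set.

def case_map_bytes(db_bytes: bytes, locale_charset: str, upper: bool) -> bytes:
    """Decode with the locale charset, change case, re-encode with it."""
    text = db_bytes.decode(locale_charset)
    mapped = text.upper() if upper else text.lower()
    return mapped.encode(locale_charset)


def describe(result: bytes) -> str:
    """Show how the result bytes read when interpreted as UTF-8."""
    try:
        return repr(result.decode("utf-8"))
    except UnicodeDecodeError:
        return f"invalid UTF-8: 0x{result.hex()}"


if __name__ == "__main__":
    a_lower = "ä".encode("utf-8")  # b'\xc3\xa4' as stored in the utf database
    a_upper = "Ä".encode("utf-8")  # b'\xc3\x84'
    # iso-8859-1 models locale de_DE; utf-8 models locale de_DE.UTF-8.
    for charset in ("iso-8859-1", "utf-8"):
        up = case_map_bytes(a_lower, charset, upper=True)
        low = case_map_bytes(a_upper, charset, upper=False)
        print(charset, "upper('ä') ->", describe(up),
              "| lower('Ä') ->", describe(low))
```

Under the ISO-8859-1 assumption, upper('ä') comes back unchanged as 'ä' and lower('Ä') produces the bytes 0xe3 0x84, the same value as in the log. With a UTF-8 character set, both calls give the expected results.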

Re: BUG #3932: utf-8 and upper()/lower(): PANIC: ERRORDATA_STACK_SIZE exceeded

From: Alvaro Herrera
Date:

Florian Wunderlich wrote:

> - input file in encoding iso-8859-1:
>
> set client_encoding='iso-8859-1';
> select upper('ä'), lower('Ä');
>
> (note: the argument to upper is a lower case a umlaut, while the argument to
> lower is an upper case a umlaut)
>
> - database "iso" with encoding iso-8859-1,
>   database "utf" with encoding utf-8,
>   both in a cluster with locale=de_DE

I think this is just a case of a misconfigured server.  If you choose
locale de_DE, which supports only the iso-8859-1 encoding, it is an
error to build a database with utf8 encoding -- which is why 8.3 rejects
that combination.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
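The rule described above, that the locale's character set must match the database encoding, can be sketched as a simple check. This is a simplified Python illustration, not PostgreSQL's actual implementation. It reads a locale's codeset with `locale.nl_langinfo(locale.CODESET)` (Unix only) and compares names through `codecs.lookup`, which resolves aliases. The helper names are hypothetical, and a real check would also need to handle the C/POSIX locale, which this sketch ignores.

```python
import codecs
import locale


def locale_codeset(locale_name: str) -> str:
    """Return the character set of `locale_name` (raises locale.Error if missing)."""
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


def encoding_matches(codeset: str, db_encoding: str) -> bool:
    """Compare a locale codeset with a database encoding name.

    codecs.lookup() resolves aliases such as 'ISO-8859-1'/'LATIN1'
    and 'UTF-8'/'UTF8' to the same canonical codec name.
    """
    return codecs.lookup(codeset).name == codecs.lookup(db_encoding).name


if __name__ == "__main__":
    # de_DE in the report uses ISO-8859-1; de_DE.UTF-8 uses UTF-8.
    for codeset in ("ISO-8859-1", "UTF-8"):
        for db_encoding in ("LATIN1", "UTF8"):
            ok = encoding_matches(codeset, db_encoding)
            print(f"{codeset:10} + {db_encoding:6}: {'ok' if ok else 'reject'}")
```

With an ISO-8859-1 locale such as de_DE, this check rejects a UTF8 database and accepts a LATIN1 one; with a UTF-8 locale such as de_DE.UTF-8, the reverse holds.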

Re: BUG #3932: utf-8 and upper()/lower(): PANIC: ERRORDATA_STACK_SIZE exceeded

From: Florian Wunderlich
Date:

Alvaro Herrera wrote:
> Florian Wunderlich wrote:
>
>> - input file in encoding iso-8859-1:
>>
>> set client_encoding='iso-8859-1';
>> select upper('ä'), lower('Ä');
>>
>> (note: the argument to upper is a lower case a umlaut, while the argument to
>> lower is an upper case a umlaut)
>>
>> - database "iso" with encoding iso-8859-1,
>>   database "utf" with encoding utf-8,
>>   both in a cluster with locale=de_DE
>
> I think this is just a case of a misconfigured server.  If you choose
> locale de_DE, which supports only the iso-8859-1 encoding, it is an
> error to build a database with utf8 encoding -- which is why 8.3 rejects
> that combination.
>

You are correct; if I use de_DE.UTF-8 for initdb, the database with
encoding utf-8 works fine (and the database with iso-8859-1 doesn't).

Because 8.3 no longer allows such an invalid combination, the PANIC can no
longer occur, so the bug can be closed.