Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters

From: Frans
Subject: Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters
Msg-id: 49DA2984.3070904@geodan.nl
In response to: Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs

Tom Lane wrote:
> Frans <frans@geodan.nl> writes:
>
>> We have just discovered a problem with the soundex function in
>> PostgreSQL 8.3.7. The problem is easy to reproduce. The following query
>> returns the ASCII code of the soundex representation of the Greek letter Pi:
>>
>> select ascii (soundex('Π'));
>>
>> In PostgreSQL 8.2.6 the result would be 0 (character null). In
>> PostgreSQL 8.3.7 the return value is 944, which is the UTF-16 code of
>> this letter.
>>
>
> Hm, I take it you are working in database encoding utf8?
That is correct. I should have mentioned it. It is the default encoding
we use because we often deal with non-English languages. And it is
because of multilingualism that the fuzzystrmatch functions are handy.
> The fuzzystrmatch module doesn't really work with utf8 (nor any other
> multibyte encoding), because it depends on the <ctype.h> functions.
> What you'll probably get when applying it to non-ascii utf8 is
> an invalidly encoded string.
>
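
To illustrate the point above, here is a minimal Python sketch of how byte-at-a-time processing can turn UTF-8 input into an invalidly encoded string. It is not the fuzzystrmatch code: `bytewise_code` is a hypothetical stand-in that, like a routine built on single-byte <ctype.h> calls, looks only at individual bytes, keeps the first one and pads the result with ASCII '0'.

```python
def bytewise_code(text: str) -> bytes:
    """Hypothetical byte-oriented encoder (an assumption for illustration).

    Keeps only the first byte of the UTF-8 input and pads with ASCII '0'
    to four bytes, ignoring multibyte character boundaries.
    """
    raw = text.encode("utf-8")
    return raw[:1] + b"000"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Greek capital Pi is two bytes in UTF-8: 0xCE 0xA0.
pi_bytes = "Π".encode("utf-8")

# Cutting the character after its first byte leaves a lead byte
# without its continuation byte.
code = bytewise_code("Π")
valid = is_valid_utf8(code)

print(pi_bytes)  # b'\xce\xa0'
print(code)      # b'\xce000'
print(valid)     # False
```

The lead byte 0xCE announces a two-byte sequence, so following it with '0' (0x30) instead of a continuation byte makes the string invalid UTF-8.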
Well, in 8.2.6 the result for non-ASCII UTF-8 was an empty string (ASCII
code 0). You could argue that this is a valid way of expressing that the
input string could not be processed (especially if it were documented).
The nice thing about this approach is that the result is valid ASCII
(and UTF-8).
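
The behaviour I am describing could be sketched roughly as follows, in Python. This is only an assumption about how a guard might look, not what 8.2.6 actually did internally: `ascii_soundex` is a hypothetical helper with a simplified Soundex, and it is not guaranteed to match the output of PostgreSQL's soundex() for every input. The point is the first check, which returns an empty string for any input containing non-ASCII characters.

```python
# Letter-to-digit table of classic Soundex.
CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV",
        "2": "CGJKQSXZ",
        "3": "DT",
        "4": "L",
        "5": "MN",
        "6": "R",
    }.items()
    for letter in letters
}


def ascii_soundex(text: str) -> str:
    """Simplified Soundex that refuses non-ASCII input (hypothetical helper)."""
    # Guard: signal "cannot process" with an empty string, which is
    # valid in ASCII and in UTF-8.
    if not text.isascii():
        return ""
    letters = [c for c in text.upper() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = CODES.get(c, "")
        # Emit a digit only when it differs from the previous code.
        if digit and digit != prev:
            out += digit
        # H and W do not separate letters with the same code; vowels do.
        if c not in "HW":
            prev = digit
        if len(out) == 4:
            break
    # Pad short codes with '0' to the usual four characters.
    return out.ljust(4, "0")


print(ascii_soundex("Robert"))     # R163
print(ascii_soundex("Anne"))       # A500
print(repr(ascii_soundex("Π")))    # ''
```

With a guard like this the caller can always store or compare the result, and an empty string is easy to test for.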
> This is a known limitation that probably should be better documented.
> It was just as broken in 8.2 (and every previous version), though.
>
But it seems there has been a recent change in the handling of non-ASCII
strings. And the result of this change is that further handling or
storing of the function output has become more difficult.
>             regards, tom lane
>

Best regards,
Frans
