Re: A thought about regex versus multibyte character sets

From	Tom Lane
Subject	Re: A thought about regex versus multibyte character sets
Date	1 December 2009, 17:52:49
Msg-id	21003.1259704346@sss.pgh.pa.us
In reply to	Re: A thought about regex versus multibyte character sets (Alvaro Herrera <alvherre@commandprompt.com>)
List	pgsql-hackers

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> I just spent a bit of time considering what we might do to fix this.
>> The idea mentioned in the above thread was to switch over to using
>> wchar_t in the regex code, but that seems to have a number of problems.
>> One showstopper is that on some platforms wchar_t is only 16 bits and
>> can't represent the full range of Unicode characters.  I don't want to
>> fix case-folding only to break regexes for other uses.

> We have a TODO item about having a regex specific data type.  Would
> implementing that solve this problem?

No, not particularly --- the stumbling block here is really impedance
mismatch between our internal APIs and libc's standard locale support.
The TODO item that would fix it is implementing our own locale support;
but I ain't holding my breath for that one.
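
To make the mismatch concrete, here is a minimal sketch in C of what
happens when a code point has to pass through libc's wchar_t-based case
mapping. The helper names fits_in_wchar and fold_lower are hypothetical
and are not taken from the PostgreSQL source. The sketch also assumes
that wchar_t values are Unicode code points in the current locale, which
the C standard does not guarantee.

```c
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: returns 1 if a single wchar_t on this platform
 * can hold the given Unicode code point, 0 otherwise.  Where wchar_t is
 * 16 bits wide, WCHAR_MAX is 0xFFFF and everything above the Basic
 * Multilingual Plane is rejected.
 */
static int
fits_in_wchar(uint32_t cp)
{
    return cp <= 0x10FFFF && (uintmax_t) cp <= (uintmax_t) WCHAR_MAX;
}

/*
 * Hypothetical helper: lower-case a code point through libc's
 * towlower() when it is representable as wchar_t; otherwise return it
 * unchanged.  Assumes wchar_t holds Unicode code points.
 */
static uint32_t
fold_lower(uint32_t cp)
{
    if (!fits_in_wchar(cp))
        return cp;              /* cannot hand this to libc at all */
    return (uint32_t) towlower((wint_t) cp);
}
```

On a platform with a 16-bit wchar_t, fold_lower(0x1D11E) takes the first
branch and returns its input untouched, so any character outside the
16-bit range is invisible to the locale-aware code path.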

AFAIR the motivation for a regex data type was solely performance.

			regards, tom lane
