Re: A thought about regex versus multibyte character sets
From | Tom Lane |
---|---|
Subject | Re: A thought about regex versus multibyte character sets |
Date | |
Msg-id | 21003.1259704346@sss.pgh.pa.us |
In reply to | Re: A thought about regex versus multibyte character sets (Alvaro Herrera <alvherre@commandprompt.com>) |
List | pgsql-hackers |
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> I just spent a bit of time considering what we might do to fix this.
>> The idea mentioned in the above thread was to switch over to using
>> wchar_t in the regex code, but that seems to have a number of problems.
>> One showstopper is that on some platforms wchar_t is only 16 bits and
>> can't represent the full range of Unicode characters. I don't want to
>> fix case-folding only to break regexes for other uses.

> We have a TODO item about having a regex specific data type. Would
> implementing that solve this problem?

No, not particularly --- the stumbling block here is really impedance
mismatch between our internal APIs and libc's standard locale support.
The TODO item that would fix it is implementing our own locale support;
but I ain't holding my breath for that one.

AFAIR the motivation for a regex data type was solely performance.

regards, tom lane
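
The 16-bit wchar_t limitation raised in the quoted text can be illustrated with a minimal sketch. It is written in Python for brevity and is not PostgreSQL or libc code; the helper names `fits_in_wchar` and `utf16_code_units` are hypothetical and exist only for this illustration. The assumption is that a wchar_t of a given width stores one code point per element, so any code point above U+FFFF cannot fit in a 16-bit element and would need two UTF-16 code units instead.

```python
MAX_CODE_POINT = 0x10FFFF  # highest valid Unicode code point


def fits_in_wchar(code_point: int, wchar_bits: int) -> bool:
    """Return True if one element of the given bit width can hold the code point.

    Hypothetical helper: models a wchar_t that stores one code point per element.
    """
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return code_point < (1 << wchar_bits)


def utf16_code_units(code_point: int) -> int:
    """Return how many 16-bit units UTF-16 needs to encode the code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not encodable characters")
    # Code points beyond the Basic Multilingual Plane need a surrogate pair.
    return 1 if code_point <= 0xFFFF else 2


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x1F600):
        print(
            f"U+{cp:04X}: fits in 16 bits = {fits_in_wchar(cp, 16)}, "
            f"fits in 32 bits = {fits_in_wchar(cp, 32)}, "
            f"UTF-16 units = {utf16_code_units(cp)}"
        )
```

Under these assumptions, a regex engine that treats each 16-bit element as one character would see a character above U+FFFF as two separate units, which is the kind of breakage the quoted text warns about.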