Re: UPPER()/LOWER() and UTF-8
From | Alexey Mahotkin |
---|---|
Subject | Re: UPPER()/LOWER() and UTF-8 |
Date | |
Msg-id | 873cd2g8ae.fsf@dim.w-m.ru |
In reply to | Re: UPPER()/LOWER() and UTF-8 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: UPPER()/LOWER() and UTF-8 |
List | pgsql-hackers |
>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:

TL> writes: upper/lower aren't going to work desirably in any multi-byte character set encoding.

>> Can you please point me at their implementation? I do not understand why that's impossible.

TL> Because they use <ctype.h>'s toupper() and tolower() functions, which only work on single-byte characters.

Aha, that's in src/backend/utils/adt/formatting.c, right? Yes, I see: it goes byte by byte and uses toupper().

I believe we could look at the locale and, if it is UTF-8, use (or copy) e.g. g_utf8_strup()/g_utf8_strdown(), right?

http://developer.gnome.org/doc/API/2.0/glib/glib-Unicode-Manipulation.html#g-utf8-strup

I believe that patch could be written in a matter of hours.

TL> There has been some discussion of using <wctype.h> where available, but this has a number of issues, notably figuring out the correct mapping from the server string encoding (eg UTF-8) to unpacked wide characters. At minimum we'd need to know which charset the locale setting is expecting, and there doesn't seem to be a portable way to find that out.

TL> IIRC, Peter thinks we must abandon use of libc's locale functionality altogether and write our own locale layer before we can really have all the locale-specific functionality we want.

I believe that native Unicode strings (together with human-language handling) should be introduced as an (almost) separate data type that has nothing to do with locale, but maybe that's blue-sky.

--alexm