Re: Patch for collation using ICU
From | John Hansen |
---|---|
Subject | Re: Patch for collation using ICU |
Date | |
Msg-id | 5066E5A966339E42AA04BA10BA706AE50A9305@rodrick.geeknet.com.au |
In response to | Patch for collation using ICU (Palle Girgensohn <girgen@pingpong.net>) |
Responses |
Re: Patch for collation using ICU
Re: Patch for collation using ICU |
List | pgsql-hackers |
Bruce Momjian wrote:
>
> There are two reasons for that optimization --- first, some
> locale support is broken and Unicode encoding with a C locale
> crashes (not an issue for ICU), and second, it is an
> optimization for languages like Japanese that want to use
> unicode, but don't need a locale because upper/lower means
> nothing in those character sets.

No, upper/lower means nothing in those languages, so why would you need to optimize upper/lower if they're not used? And if they are used, it's obviously because the text contains characters from other languages (probably English), and as such they should behave correctly.

Did I mention that for Japanese and the like, ICU would also offer transliteration...

>
> So, the first issue doesn't apply for ICU, and the second
> might not depending on what characters you are using in the
> Unicode character set.
>
> I guess I am little confused how ICU can do upper() when the
> locale is C. What is it using to determine A is upper for a?
> Am I confused?

Simple: Unicode basically consists of a table of characters (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt).

Excerpt:

0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
...
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041

From this you can see that for 0041, which is capital letter A, there is a mapping to its lowercase counterpart, 0061. Likewise, there is a mapping for 0061 which says its uppercase counterpart is 0041.

There is also SpecialCasing.txt, which covers the mappings that are not 1-1, such as the German sharp s (ß), which uppercases to SS.

These mappings are fixed and independent of locale; only a few cases from SpecialCasing.txt depend on locale or context.
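To make the table lookup concrete, here is a minimal Python sketch that reads the two UnicodeData.txt lines quoted in the excerpt and builds locale-independent case maps from them. It is only an illustration, not how ICU or the patch implements case conversion: the function names (`parse_simple_case_mappings`, `simple_upper`, `simple_lower`) are hypothetical, and the sketch assumes the standard semicolon-separated layout in which the simple uppercase and lowercase mappings are fields 12 and 13 (counting from 0). SpecialCasing.txt is not handled.

```python
# Two lines copied from the UnicodeData.txt excerpt quoted in the message.
EXCERPT = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
"""

# 0-based positions of the simple case mapping fields in UnicodeData.txt.
UPPER_FIELD = 12
LOWER_FIELD = 13


def parse_simple_case_mappings(text):
    """Build (to_upper, to_lower) code point dicts from UnicodeData.txt lines."""
    to_upper, to_lower = {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        code = int(fields[0], 16)
        # An empty field means the character has no simple mapping of that kind.
        if fields[UPPER_FIELD]:
            to_upper[code] = int(fields[UPPER_FIELD], 16)
        if fields[LOWER_FIELD]:
            to_lower[code] = int(fields[LOWER_FIELD], 16)
    return to_upper, to_lower


def simple_upper(s, to_upper):
    """Uppercase via table lookup only; unmapped characters pass through."""
    return "".join(chr(to_upper.get(ord(ch), ord(ch))) for ch in s)


def simple_lower(s, to_lower):
    """Lowercase via table lookup only; unmapped characters pass through."""
    return "".join(chr(to_lower.get(ord(ch), ord(ch))) for ch in s)


TO_UPPER, TO_LOWER = parse_simple_case_mappings(EXCERPT)

if __name__ == "__main__":
    print(simple_upper("aA", TO_UPPER))  # AA
    print(simple_lower("aA", TO_LOWER))  # aa
```

No locale is consulted anywhere in the lookup, which is the point being made: the answer to "what makes A the uppercase of a" comes from the character table itself. With the full UnicodeData.txt file in place of the two-line excerpt, the same parsing would cover every character that has a simple 1-1 mapping.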