Re: Patch for collation using ICU
From | Bruce Momjian |
---|---|
Subject | Re: Patch for collation using ICU |
Date | |
Msg-id | 200505071414.j47EEfZ02040@candle.pha.pa.us |
In response to | Re: Patch for collation using ICU (Palle Girgensohn <girgen@pingpong.net>) |
List | pgsql-hackers |
Palle Girgensohn wrote:
> >> This is because in the standard postgres implementation, upper/lower is
> >> done one character at a time. A proper upper/lower cannot do it that
> >> way. Another known example is in Turkish, where an ? (?) should look
> >> different whether it is an initial letter or not. This fails in
> >> standard postgresql for all platforms.
> >
> > Uh, where do you see that?  Our code has:
> >
> >     workspace = texttowcs(string);
> >
> >     for (i = 0; workspace[i] != 0; i++)
> >         workspace[i] = towupper(workspace[i]);
>
> As you see, the loop runs towupper for one character at a time. It cannot
> consider whether the letter is the initial, as required in Turkish, and it
> cannot really convert one character into two ('ß' -> 'SS')

Oh, OK.  I thought texttowcs() would expand the string to allow such
conversions.

> >> > We have deprecated UNICODE in 8.1 in favor of UTF8 (no dash).  Does
> >> > that help?
> >>
> >> I'm aware of that. It might help for unicode, but there are a bunch of
> >> other encodings. IANA has decided that utf-8 has *no* aliases, hence
> >> only utf-8 (with dash, but case insensitive) is accepted. Perhaps ICU is
> >> forgiving, I don't remember/know, but I think we need the mappings,
> >> unfortunately.
> >
> > OK.  I guess I am just confused why the native implementations are OK.
>
> They're OK since they understand that UNICODE (or UTF8) is really utf-8.
> Problem is the strings used to describe them are not understood by ICU.
>
> BTW, the pg_enc2iananame_tbl is only used *from* internal representation
> *to* IANA, not the other way around. Maybe that fact lowers the rate of
> confusion? ;-)

OK, got it.  I am still a little confused why every native
implementation understands our existing names but ICU does not.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
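
The limitation discussed in the message above, that a loop calling towupper() on one wide character at a time cannot turn one character into two, can be sketched roughly as follows. This is only an illustration under stated assumptions: `upper_per_char` mirrors the quoted loop, while `upper_with_expansion` and `SHARP_S` are hypothetical names invented here, with a single hard-coded mapping for the sharp s. It is not PostgreSQL's code and not how ICU implements case mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* U+00DF LATIN SMALL LETTER SHARP S (hypothetical constant for this sketch) */
#define SHARP_S ((wchar_t) 0x00DF)

/*
 * Per-character upper-casing in the style of the quoted loop.
 * Each element is overwritten in place, so the result always has
 * exactly as many wide characters as the input.
 */
static void
upper_per_char(wchar_t *ws)
{
    assert(ws != NULL);
    for (size_t i = 0; ws[i] != L'\0'; i++)
        ws[i] = (wchar_t) towupper((wint_t) ws[i]);
}

/*
 * Hypothetical variant that writes into a separate buffer so the
 * output may be longer than the input. Only the sharp-s expansion is
 * handled; a real implementation would need full, locale-aware case
 * mapping tables. Returns the number of wide characters written (not
 * counting the terminator), or (size_t) -1 if dst is too small.
 */
static size_t
upper_with_expansion(const wchar_t *src, wchar_t *dst, size_t dstlen)
{
    size_t n = 0;

    if (dstlen == 0)
        return (size_t) -1;

    for (size_t i = 0; src[i] != L'\0'; i++)
    {
        if (src[i] == SHARP_S)
        {
            /* need room for two characters plus the terminator */
            if (n + 2 >= dstlen)
                return (size_t) -1;
            dst[n++] = L'S';
            dst[n++] = L'S';
        }
        else
        {
            /* need room for one character plus the terminator */
            if (n + 1 >= dstlen)
                return (size_t) -1;
            dst[n++] = (wchar_t) towupper((wint_t) src[i]);
        }
    }
    dst[n] = L'\0';
    return n;
}
```

Calling `upper_per_char` on a six-character word containing a sharp s leaves it six characters long, whatever towupper() returns for that character in the current locale. `upper_with_expansion` can return a seven-character result for the same input because the output buffer is sized independently of the input.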