Re: Unicode + LC_COLLATE
From | John Sidney-Woollett |
---|---|
Subject | Re: Unicode + LC_COLLATE |
Date | |
Msg-id | 3487.192.168.0.64.1082640418.squirrel@mercury.wardbrook.com |
In reply to | Re: Unicode + LC_COLLATE (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Unicode + LC_COLLATE |
List | pgsql-general |
Tom Lane said:
> C locale basically means "sort by the byte sequence values". It'll do
> something self-consistent, but maybe not what you'd like for UTF8
> characters.

OK, that explains that. I guess I will need to try it out to see what the effect is on extended character sets.

>> Our database is UNICODE with LC_COLLATE=en_US.iso885915.
> Does that sort rationally at all? I should think you'd need to specify
> an LC_COLLATE setting that's designed for UTF8 encoding, not 8859-15.

Er..., actually the LC_COLLATE for the DB in question is C - I was looking at the wrong database (wrong telnet session)! So your comments above apply in this case.

> If you only ever store characters that are in 7-bit ASCII then none of
> this will affect you, and you can get away with broken combinations of
> encoding and locale. But if you'd like to sort characters outside the
> minimal ASCII set then you need to get it right ...

Tom, thanks for the answers above.

I guess if I have some time I should build some different DBs with different combinations of encoding and collations and summarise my findings using different types of data and sort/search commands, in case anyone else has the same level of confusion that I do...

John Sidney-Woollett
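Tom's two points, that the C locale "sorts by the byte sequence values" and that 7-bit ASCII data hides a mismatched encoding/locale pair, can be illustrated outside the database. The following is a minimal Python sketch, not PostgreSQL code: the helper names `c_locale_sort` and `encodings_agree` are hypothetical, and the sort only emulates C collation by comparing raw UTF-8 bytes. The second helper checks whether a string has the same bytes in UTF-8 and ISO-8859-15, which holds for ASCII but not for characters such as é or €.

```python
def c_locale_sort(words):
    """Emulate a "C" collation: order strings by their raw UTF-8 bytes."""
    return sorted(words, key=lambda s: s.encode("utf-8"))


def encodings_agree(text):
    """True if text is encoded to identical bytes in UTF-8 and ISO-8859-15."""
    return text.encode("utf-8") == text.encode("iso8859_15")


if __name__ == "__main__":
    # Byte order: uppercase sorts before lowercase, accented letters after 'z'.
    print(c_locale_sort(["zebra", "Zebra", "apple", "éclair", "eclair"]))
    # ['Zebra', 'apple', 'eclair', 'zebra', 'éclair']

    # ASCII bytes are identical in both encodings; é and € are not
    # (é is b'\xc3\xa9' in UTF-8 but b'\xe9' in ISO-8859-15).
    for text in ["abc", "é", "€"]:
        print(text, encodings_agree(text))
```

Under pure byte ordering "éclair" lands after "zebra"; a language-aware collation such as en_US would typically place it next to "eclair" instead, which is the kind of difference worth checking when comparing databases built with different encoding/collation combinations.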