Re: another seemingly simple encoding question
| From | John D. Burger |
|---|---|
| Subject | Re: another seemingly simple encoding question |
| Date | |
| Msg-id | fff3e8bce85a8c49e8d81ea4b45e367e@mitre.org |
| In reply to | Re: another seemingly simple encoding question (joseph <kmh496@kornet.net>) |
| List | pgsql-general |
This doesn't sound like your problem, but I'll explain the normalization issue using Korean as an example, since that seems to be your data.

Unicode has code points both for precomposed Hangul syllables and for the individual conjoining Jamo, so a Hangul syllable can be represented either by its single precomposed code point or by a sequence of two or three Jamo code points. A Unicode font should display the two alternatives identically. In any Unicode encoding, including UTF-8, however, the two strings are not byte-for-byte identical. The four Unicode normalization forms (NFC, NFD, NFKC and NFKD) are algorithms that convert such strings to a common representation, so that they do compare identically.

Anyway, it sounds like you have the opposite problem: two strings that compare equal when you think they shouldn't. I don't know that anyone can help you unless you post an actual example of two such strings.

- John D. Burger
MITRE
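To make the Hangul/Jamo case concrete, here is a minimal sketch in Python using the standard-library `unicodedata` module. It is not from the original message; the helper `same_text` and the choice of the syllable 한 (HAN) are illustrative assumptions. It shows that the precomposed and Jamo spellings differ in code points and UTF-8 bytes, yet compare equal once both sides are normalized to the same form.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The syllable HAN as one precomposed code point (U+D55C) ...
precomposed = "\ud55c"
# ... and as three conjoining Jamo: initial HIEUH (U+1112),
# medial A (U+1161), final NIEUN (U+11AB).
decomposed = "\u1112\u1161\u11ab"

# Different code points, so the UTF-8 bytes differ (3 bytes vs. 9 bytes)
# and a plain comparison reports them as unequal.
print(precomposed.encode("utf-8"))      # b'\xed\x95\x9c'
print(len(decomposed.encode("utf-8")))  # 9
print(precomposed == decomposed)        # False

# Under each of the four normalization forms the two spellings compare equal.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, same_text(precomposed, decomposed, form))  # True for each
```

NFC composes the Jamo into the single syllable code point and NFD decomposes the syllable into Jamo; either works for comparison, as long as both strings are normalized to the same form first.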