Re: New to PostgreSQL, performance considerations
From | Daniel van Ham Colchete |
---|---|
Subject | Re: New to PostgreSQL, performance considerations |
Date | |
Msg-id | 8a0c7af10612110232y2fb416ffpdfa70f1b492388ae@mail.gmail.com |
In reply to | Re: New to PostgreSQL, performance considerations (Alexander Staubo <alex@purefiction.net>) |
List | pgsql-performance |
On 12/11/06, Alexander Staubo <alex@purefiction.net> wrote:
> On Dec 11, 2006, at 02:47 , Daniel van Ham Colchete wrote:
>
> > I never understood what's the matter between the ASCII/ISO-8859-1/UTF8
> > charsets to a database. They're all simple C strings that don't have
> > the zero-byte in the middle (like UTF16 would) and that don't
> > require any different processing unless you are doing case insensitive
> > search (then you would have a problem).
>
> That's not the whole story. UTF-8 and other variable-width encodings
> don't provide a 1:1 mapping of logical characters to single bytes; in
> particular, combination characters open the possibility of multiple
> different byte sequences mapping to the same code point; therefore,
> string comparison in such encodings generally cannot be done at the
> byte level (unless, of course, you first ascertain that the strings
> involved are all normalized to an unambiguous subset of your encoding).
>
> PostgreSQL's use of strings is not limited to string comparison.
> Substring extraction, concatenation, regular expression matching, up/
> downcasing, tokenization and so on are all part of PostgreSQL's small
> library of text manipulation functions, and all deal with logical
> characters, meaning they must be Unicode-aware.
>
> Alexander.
>

You're right. I was thinking only about my own cases, which take
Unicode normalization for granted and don't use regexp/tokenization/...

Thanks
Best
Daniel
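
For anyone reading this thread later, here is a minimal sketch of the point about normalization, written in Python rather than anything PostgreSQL-specific. It assumes the standard library's `unicodedata` module; the helper names `bytes_equal` and `normalized_equal` are made up for illustration. It shows the same visible character encoded two ways, why a byte-level comparison misses the match, and that byte length and character length differ in UTF-8.

```python
import unicodedata

# The same visible character, written two ways:
composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent


def bytes_equal(a: str, b: str) -> bool:
    """Compare two strings by their raw UTF-8 bytes (hypothetical helper)."""
    return a.encode("utf-8") == b.encode("utf-8")


def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after Unicode normalization (hypothetical helper)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Byte-level comparison treats the two spellings as different strings.
raw_match = bytes_equal(composed, decomposed)

# After normalizing both sides to the same form, they compare equal.
norm_match = normalized_equal(composed, decomposed)

# One logical character, but more than one byte in UTF-8.
byte_len = len(composed.encode("utf-8"))
char_len = len(composed)

print(raw_match, norm_match, byte_len, char_len)
```

If every string is normalized to one form on the way in, byte-level comparison becomes safe for equality, which is the case described in the reply above. Substring extraction, case conversion and similar functions still have to work on characters rather than bytes.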