RE: [INTERFACES] JDBC and character sets
From | Peter Mount |
---|---|
Subject | RE: [INTERFACES] JDBC and character sets |
Date | |
Msg-id | 1B3D5E532D18D311861A00600865478C9FC9@exchange1.nt.maidstone.gov.uk |
List | pgsql-interfaces |
If I understand this correctly, then as long as the driver converts strings into UTF-8 (in the correct methods), Unicode support will work? I ask because I haven't delved into Unicode with the driver yet. If that is the case, it will be a simple thing to implement.

Peter

--
Peter Mount
Enterprise Support
Maidstone Borough Council
Any views stated are my own, and not those of Maidstone Borough Council.

-----Original Message-----
From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
Sent: Tuesday, June 22, 1999 3:14 PM
To: David Warnock
Cc: t-ishii@sra.co.jp; pgsql-interfaces@postgreSQL.org
Subject: Re: [INTERFACES] JDBC and character sets

> I think I saw from the list of developers that you wrote a lot of the
> multi-byte code. Is that correct? If so, my grateful thanks.

Yes, I'm responsible for the multi-byte code.

> Are there any limitations or gotchas about using Unicode everywhere?
>
> Specifically:
>
> 1. Column length. Is this measured in Unicode characters, or do I need to
> increase the length of varchars? I.e., is a varchar(10) certain to hold 10
> Unicode characters?

When you define varchar(n), n is counted in bytes, not characters. We assume Unicode is input in UTF-8 encoding. In UTF-8, 10 ASCII characters take 10 bytes, so varchar(10) will hold 10 Unicode characters if they are all ASCII. However, ISO 8859 letters outside ASCII take 2 bytes each, and Kanji take 3 bytes each. You can use octet_length() to measure the size of a Unicode string in bytes.

> 2. Indexing. What sort order will I get from an index or an ORDER BY for
> Unicode characters?

It will be sorted in Unicode code point order.

> Can this be customised?

Currently, no.

> Generally I try to do any really important sorting in Java, where I can use
> the correct sort order for the locale.
>
> 3. Upper/lowercase. I have been using separate columns for uppercase
> versions of names etc., again so that the case changes can be done by the
> client, which will know the correct rules for the locale where the data
> is entered. What do the upper/lower case functions in PostgreSQL do with
> Unicode?

I think it depends on the locale. I'm not sure, but I've heard of a Unicode locale. If it really exists, you could run

    configure --with-mb=UNICODE --with-locale

so that upper/lower work for Unicode.

> 4. Are there any limitations on what I use to write triggers? Can all
> the different ways work reliably with Unicode?

I'm not sure, but it should work with triggers.
--
Tatsuo Ishii
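Tatsuo's byte arithmetic for point 1, and David's practice of sorting and case-mapping on the client (points 2 and 3), can be tried out from Java. This is only a sketch: the class and method names (`Utf8Notes`, `utf8Length`, `fitsByteLimitedVarchar`, `codePointOrder`, `localeOrder`) are hypothetical, it assumes Java 11 or later, and `fitsByteLimitedVarchar` simply encodes the byte-counting rule for varchar(n) as stated in this thread. Comparing UTF-8 bytes as unsigned values gives the same order as comparing Unicode code points, which is the server-side order Tatsuo describes; `java.text.Collator` gives the locale's own order.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class illustrating the points discussed in this thread.
final class Utf8Notes {

    // Bytes the string occupies once encoded as UTF-8: 1 per ASCII character,
    // 2 for letters such as U+00E9 (e-acute), 3 for most Kanji.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Assumption: varchar(n) limits the UTF-8 byte count, as stated in this thread.
    static boolean fitsByteLimitedVarchar(String s, int n) {
        return utf8Length(s) <= n;
    }

    // Sort by UTF-8 bytes compared as unsigned values; for valid UTF-8 this
    // is the same as sorting by Unicode code point.
    static List<String> codePointOrder(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort((a, b) -> Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    // Sort with the collation rules of the given locale, on the client side.
    static List<String> localeOrder(List<String> words, Locale locale) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("abcdefghij")); // 10
        System.out.println(utf8Length("\u00e9"));      // 2
        System.out.println(utf8Length("\u6f22"));      // 3

        // Five e-acute letters are 10 bytes and fit; six (12 bytes) do not.
        System.out.println(fitsByteLimitedVarchar("\u00e9".repeat(5), 10)); // true
        System.out.println(fitsByteLimitedVarchar("\u00e9".repeat(6), 10)); // false

        // Code point order puts "Zebra" first, because 'Z' (U+005A) < 'a' (U+0061).
        List<String> words = List.of("apple", "Zebra", "\u00c4pfel");
        System.out.println(codePointOrder(words).get(0)); // Zebra
        System.out.println(localeOrder(words, Locale.GERMAN));

        // Case mapping done by the client with the user's locale (Turkish here).
        String upper = "title".toUpperCase(Locale.forLanguageTag("tr"));
        System.out.println(upper.length() + " chars, " + utf8Length(upper) + " bytes"); // 5 chars, 6 bytes
    }
}
```

The Turkish example shows why the byte rule matters for client-side case changes: `"title"` upper-cases to `T\u0130TLE`, still 5 characters but 6 UTF-8 bytes, so it would no longer fit a byte-counted varchar(5).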