Re: Character sets (Re: Re: Big 7.1 open items)
From | Tatsuo Ishii |
---|---|
Subject | Re: Character sets (Re: Re: Big 7.1 open items) |
Date | |
Msg-id | 20000621151917D.t-ishii@sra.co.jp |
In reply to | Character sets (Re: Re: Big 7.1 open items) (Peter Eisentraut <peter_e@gmx.net>) |
List | pgsql-hackers |
> But how are you going to tell a genuine "type" from a character set? And
> you might have to have three types for each charset. There'd be a lot of
> redundancy and confusion regarding the input and output functions and
> other pg_type attributes. No doubt there's something to be learned from
> the type system, but character sets have different properties -- like
> characters(!), collation rules, encoding "translations" and what not.
> There is no doubt also a need for different error handling. So I think that
> just dumping every character set into pg_type is not a good idea. That's
> almost equivalent to having separate types for char(6), char(7), etc.
>
> Instead, I'd suggest that character sets become separate objects. A
> character entity would carry around its character set in its header
> somehow. Consider a string concatenation function, being invoked with two
> arguments of the same exotic character set. Using the type system only
> you'd have to either provide a function signature for all combinations of
> character sets or you'd have to cast them up to SQL_TEXT, concatenate
> them and cast them back to the original charset. A smarter concatenation
> function instead might notice that both arguments are of the same
> character set and simply paste them together right there.

Interesting idea. But what about collations? SQL allows a collation
different from the default one to be assigned to a character set on the
fly. Should we make collations separate objects as well?

> Here are a couple of "items" I keep wondering about:
>
> * To what extent would we be able to use the operating system's locale
> facilities? Besides the fact that some systems are deficient or broken one
> way or another, POSIX really doesn't provide much besides "given two
> strings, which one is greater", and then only on a per-process basis.
> We'd really need more than that, see also LIKE indexing issues, and indexing in
> general.

Correct. I'd suggest getting rid of the OS's locale completely.
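Peter's concatenation argument can be illustrated with a minimal C sketch. It assumes a hypothetical `tagged_text` value whose header records its character set; the `charset_id` enum (only LATIN1 and UTF-8) and all function names are made up for illustration and do not correspond to PostgreSQL's actual encoding IDs or APIs. UTF-8 stands in for SQL_TEXT as the common form. When both arguments carry the same character set the bytes are pasted together directly; otherwise both are converted to UTF-8 first. Collations are not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical character-set tags (not PostgreSQL's real encoding IDs). */
typedef enum { CS_LATIN1, CS_UTF8 } charset_id;

/* A string value that carries its character set in its header. */
typedef struct {
    charset_id charset;
    size_t len;   /* byte length, excluding the trailing NUL */
    char *data;   /* NUL-terminated for convenience */
} tagged_text;

/* malloc that exits on failure, to keep the sketch short. */
static char *xmalloc(size_t n)
{
    char *p = malloc(n);
    if (p == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Build a tagged value from a NUL-terminated byte string. */
static tagged_text make_text(charset_id cs, const char *bytes)
{
    tagged_text t = { cs, strlen(bytes), NULL };
    t.data = xmalloc(t.len + 1);
    memcpy(t.data, bytes, t.len + 1);
    return t;
}

/* Concatenate the raw bytes of a and b and label the result with cs. */
static tagged_text paste(charset_id cs, const tagged_text *a, const tagged_text *b)
{
    tagged_text r = { cs, a->len + b->len, NULL };
    r.data = xmalloc(r.len + 1);
    memcpy(r.data, a->data, a->len);
    memcpy(r.data + a->len, b->data, b->len);
    r.data[r.len] = '\0';
    return r;
}

/* Return a UTF-8 copy of t; Latin-1 bytes >= 0x80 become two-byte sequences. */
static tagged_text to_utf8(const tagged_text *t)
{
    tagged_text r = { CS_UTF8, 0, xmalloc(t->len * 2 + 1) };
    if (t->charset == CS_UTF8) {
        memcpy(r.data, t->data, t->len);
        r.len = t->len;
    } else {
        for (size_t i = 0; i < t->len; i++) {
            unsigned char c = (unsigned char) t->data[i];
            if (c < 0x80) {
                r.data[r.len++] = (char) c;
            } else {
                r.data[r.len++] = (char) (0xC0 | (c >> 6));
                r.data[r.len++] = (char) (0x80 | (c & 0x3F));
            }
        }
    }
    r.data[r.len] = '\0';
    return r;
}

/* Fast path when the charsets match; otherwise convert both to UTF-8 first. */
static tagged_text charset_concat(const tagged_text *a, const tagged_text *b)
{
    assert(a != NULL && b != NULL);
    if (a->charset == b->charset)
        return paste(a->charset, a, b);   /* no conversion needed */

    tagged_text ua = to_utf8(a);
    tagged_text ub = to_utf8(b);
    tagged_text r = paste(CS_UTF8, &ua, &ub);
    free(ua.data);
    free(ub.data);
    return r;
}
```

In the mixed case this sketch returns UTF-8 instead of casting back, since there is no single "original" charset to return to. A fuller design would also have to decide which collation the result carries, which is exactly the open question raised in the reply.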
> * Client support: A lot of language environments provide pretty smooth
> Unicode support these days, e.g., Java, Perl 5.6, and I think that C99 has
> also made some strides. So while "we can store stuff in any character set
> you want" is great, it's really no good if it doesn't work transparently
> with the client interfaces. At least something to keep in mind.

Do you suggest that we should convert everything into Unicode and store it
in the DB?
--
Tatsuo Ishii