Re: Re: LIKE gripes
From | Thomas Lockhart
---|---
Subject | Re: Re: LIKE gripes
Date |
Msg-id | 39940A27.EA03F12D@alumni.caltech.edu
In reply to | Re: Re: LIKE gripes (Thomas Lockhart <lockhart@alumni.caltech.edu>)
Responses | Re: Re: LIKE gripes (Tatsuo Ishii <t-ishii@sra.co.jp>)
List | pgsql-hackers
> > I think I have a solution for the current code; could someone test its
> > behavior with MB enabled? It is now committed to the source tree; I know
> > it compiles, but afaik am not equipped to test it :(
>
> It passed the MB test, but fails the string test. Yes, I know it fails
> because ILIKE for MB is not implemented (yet). I'm looking forward to
> implementing the missing part. Is it ok for you, Thomas?

Whew! I'm glad "fails the string test" is because of the ILIKE/tolower() issue; I was afraid you would say "... because Thomas' bad code dumps core..." :)

Yes, feel free to implement the missing parts. I'm not even sure how to do it! Do you think it would be best in the meantime to disable the ILIKE tests, or perhaps to separate that out into a different test?

> Please note that the existing MB implementation does not need such an
> extra conversion cost, except for some MB-aware functions (text_length
> etc.), regex, LIKE, and the input/output stage. Also, MB stores native
> encodings 'as is' on disk.

Yes. I am probably getting a skewed view of MB, since the LIKE code is an edge case which illustrates the difficulties of handling character sets in general, no matter what solution is used.

> Anyway, it looks like MB would eventually be merged into/deprecated by
> your new implementation of multiple-encodings support.

I've started writing up a description of my plans (based on our previous discussions), and as I do so I appreciate more and more your current solution ;) imho you have solved several issues, such as storage format, client/server communication, and mixed-encoding comparison and manipulation, which would all need to be solved by a "new implementation". My current thought is to leave MB intact, and to start implementing "character sets" as distinct types (I know you have said that this is a lot of work, and I agree that is true for the complete set).
Once I have done one or a few character sets (perhaps using a Latin subset of Unicode, so I can test it by converting between ASCII and Unicode using character sets I know how to read ;), then we can start implementing a "complete solution" for those character sets, including character- and string-comparison building blocks like "<", ">", and "tolower()", full comparison functions, and conversion routines between different character sets. But that by itself does not solve, for example, client/server encoding issues, so let's think about that again once we have some "type-full" character sets to play with. The default solution will of course use MB to handle this.

> BTW, Thomas, do you have a plan to support collation functions?

Yes, that is something I hope will come out naturally from a combination of SQL9x language features and use of the type system to handle character sets. Then, for example (hmm, examples might be better in Japanese since you have such a rich mix of encodings ;),

  CREATE TABLE t1 (name TEXT COLLATE francais);

will (or might ;) result in using the "francais" data type for the name column.

  SELECT * FROM t1 WHERE name < _FRANCAIS 'merci';

will use the "francais" data type for the string literal. And

  CREATE TABLE t1 (name VARCHAR(10) CHARACTER SET latin1 COLLATE francais);

will (might?) use, say, the "latin1_francais" data type. Each of these data types would be a loadable module (which could be installed into template1 to make them available to every new database), and each could reuse underlying support routines to avoid as much duplicate code as possible. Maybe a default encoding would be defined for each type, say "latin1" for "francais", so that the backend or some external scripts can help set these up.
There is a good chance we will need (yet another) system table to tie these types into character sets and collations; otherwise Postgres might not be able to recognize that a type is implementing these language features, and, for example, pg_dump might not be able to reconstruct the correct table-creation syntax.

I notice that SQL99 has *a lot* of new specifics on character set support, prescribing things like CREATE COLLATION... and DROP COLLATION... This means there is less thinking involved in the syntax, but more work to make those exact commands fit into Postgres. SQL92 left most of this as an exercise for the reader. I'd be happier if we knew this stuff *could* be implemented by seeing another DB implement it. Are you aware of any that do (besides our own, of course)?

- Thomas
In the pgsql-hackers list, by date sent:

Previous
From: Tom Lane
Message: Re: Identified a problem in pg_dump with serial data type and mixed case

Next
From: Tom Lane
Message: Re: CREATE INDEX test_idx ON test (UPPER(varchar_field)) doesn't work...