Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text
From | Bart Samwel |
---|---|
Subject | Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text |
Date | |
Msg-id | 442DD6E2.5070500@samwel.tk |
In response to | Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text (Marc Herbert <Marc.Herbert@continuent.com>) |
List | pgsql-odbc |
Marc Herbert wrote:
> Johann Zuschlag <zuschlag2@online.de> writes:
>
>> I've read about the problems with the NULL bytes on Unix machines.
>
> This problem is not related to Unix at all but to the programming
> language used. Most standard C functions use the zero byte convention
> as a string terminator, so it becomes a forbidden character in C.
>
> On the other hand String objects in C++ and Java use a separate length
> field, and having NULLs inside a string is a no brainer there.
>
> The ODBC API has been designed for C and Cobol. Cobol does not forbid
> zero as a character either. When browsing the ODBC spec you'll notice
> it carefully caters for the two ways.
>
>
> Guess which programming language is used PostgreSQL.

C++ even introduced a special alternative character type "wchar_t" for
this, just so that people could handle both 8-bit char* and 16-bit
wchar_t* strings. In wchar_t* strings, 8-bit NULs are not a problem
because only 16-bit NULs count (and AFAIK the Unicode standard does
allow this to be interpreted as a NUL, aka end-of-string). The downside
of this solution is that no application actually uses it, and everybody
is stuck with 8-bit ASCII plus a random local codepage unless special
support is added.

Why didn't they just upgrade chars to 32 bits and be done with it... :-/

Cheers,
Bart
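For readers who want to see the two conventions side by side, here is a minimal C sketch. It is only an illustration under stated assumptions: the `counted_str` type, the `raw_bytes` buffer, and the three helper functions are hypothetical names made up for this example and are not part of ODBC, psqlODBC, or PostgreSQL.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical counted-string type (an assumption for this sketch):
   the length travels with the data, so a zero byte inside the buffer
   is ordinary payload rather than an end marker. */
struct counted_str {
    const char *data;
    size_t      len;
};

/* Five payload bytes 'a','b',0,'c','d'; the compiler appends one more 0. */
const char raw_bytes[] = "ab\0cd";

/* Terminator convention: scanning stops at the first zero byte. */
size_t terminated_length(const char *s)
{
    return strlen(s);
}

/* Length convention: the caller states how many bytes are valid. */
struct counted_str make_counted(const char *data, size_t len)
{
    struct counted_str cs = { data, len };
    return cs;
}

/* Wide strings end only at a zero-valued wchar_t element; a zero byte
   inside a non-zero element does not terminate the string. */
size_t wide_length(const wchar_t *ws)
{
    return wcslen(ws);
}
```

With these helpers, `terminated_length(raw_bytes)` sees only the two bytes before the embedded zero, while `make_counted(raw_bytes, sizeof raw_bytes - 1)` keeps all five. `wide_length(L"AB")` counts two elements even though each element contains zero bytes in its in-memory representation. Note that the width of `wchar_t` is implementation-defined: it is 16 bits on Windows but commonly 32 bits on Unix-like systems.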