Re: BUG #4622: xpath only work in utf-8 server encoding
From | eshkinkot |
---|---|
Subject | Re: BUG #4622: xpath only work in utf-8 server encoding |
Date | |
Msg-id | 9ea8622b0902072142u76c86c30q8b433182e8cb0800@mail.gmail.com |
In reply to | Re: BUG #4622: xpath only work in utf-8 server encoding (Peter Eisentraut <peter_e@gmx.net>) |
List | pgsql-bugs |
On 23 January 2009 at 0:58, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Thursday 22 January 2009 15:39:00 Sergey Burladyan wrote:
>> seb=# select xpath('/русский/text()', v::xml) from (select
>> xml('<русский>язык</русский>')) as x(v);
>> ERROR: could not parse XML data
>> DETAIL: Entity: line 1: parser error : Input is not proper UTF-8, indicate
>> encoding !
>> Bytes: 0xF0 0xF3 0xF1 0xF1
>> <x><русский>язык</русский></x>
>> ^
> This raises the question: What are the rules about encoding the characters in
> XPath expressions themselves? I haven't found anything about that in the
> standard. Anyone know?

PostgreSQL does not use libxml2's internal encoding support, and it strips the encoding declaration from the XML body, so I think there is no choice: by default, libxml2 needs the data in its internal encoding, UTF-8, anyway.

I am not sure about the XML standard, but maybe the libxml2 documentation can help resolve this? See http://xmlsoft.org/encoding.html:

"What does this mean in practice for the libxml2 user:
* xmlChar, the libxml2 data type is a byte, those bytes must be assembled as UTF-8 valid strings. The proper way to terminate an xmlChar * string is simply to append 0 byte, as usual.
* One just need to make sure that when using chars outside the ASCII set, the values has been properly converted to UTF-8"

I understand this as: all xmlChar strings must be in UTF-8, no matter what the encoding of the XML body is.

I have tried to fix this issue for the xpath function; see the attached patch.

By the way, contrib/xml2 also has this issue.
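To illustrate the point about xmlChar strings, here is a minimal Python sketch, not the attached patch. It assumes the server encoding in the report was Windows-1251 (the message does not say so, but the bytes 0xF0 0xF3 0xF1 0xF1 from the error detail are consistent with "русс" in that encoding). The helper names `is_valid_utf8` and `to_utf8` are made up for this example; the idea is only that text in the server encoding has to be converted to UTF-8 before it reaches a parser that expects UTF-8.

```python
# Bytes reported in the error detail: "Bytes: 0xF0 0xF3 0xF1 0xF1".
RAW = bytes([0xF0, 0xF3, 0xF1, 0xF1])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, server_encoding: str) -> bytes:
    """Re-encode bytes from the (assumed) server encoding into UTF-8."""
    return data.decode(server_encoding).encode("utf-8")


if __name__ == "__main__":
    # The raw bytes are rejected as UTF-8, which matches the parser error.
    print(is_valid_utf8(RAW))
    # After conversion (assuming Windows-1251) they form valid UTF-8.
    converted = to_utf8(RAW, "cp1251")
    print(is_valid_utf8(converted), converted.decode("utf-8"))
```

The same conversion would need to apply to both the XML body and the XPath expression, since both end up as xmlChar strings.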
Attachments