Re: Bug in UTF8-Validation Code?
From | Peter Eisentraut |
---|---|
Subject | Re: Bug in UTF8-Validation Code? |
Date | |
Msg-id | 200704041757.58574.peter_e@gmx.net |
In reply to | Re: Bug in UTF8-Validation Code? (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |

On Wednesday, 4 April 2007 16:22, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Right -- IMHO what we should be doing is reject any input to chr() which
> > is beyond plain ASCII (or maybe > 255), and create a separate function
> > (unicode_char() sounds good) to get an Unicode character from a code
> > point, converted to the local client_encoding per conversion_procs.
>
> Hm, I hadn't thought of that approach, but another idea is that the
> argument of chr() is *always* a unicode code point, and it converts
> to the current encoding. Do we really need a separate function?

The SQL standard has a "Unicode character string literal", which looks
like this:

U&'The price is 100 \20AC.'

This is similar in spirit to our current escape mechanism available via
E'...', which, however, produces bytes. It has the advantage over a
chr()-based mechanism that the composition of strings doesn't require an
ugly chain of literals, functions, and concatenations.

Implementing this would, however, be a bit tricky because you don't have
access to the encoding conversion functions in the lexer. You would
probably have to map that to a function call and evaluate it later.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
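
To make the two-stage idea above concrete, here is a rough Python sketch, not PostgreSQL code. It assumes the standard's escape forms (`\XXXX` for four hex digits, `\+XXXXXX` for six, and a doubled backslash for a literal backslash) with the default escape character. The function names `decode_unicode_escapes` and `to_target_encoding` are made up for illustration. The first step needs no encoding knowledge, so it could in principle run at lexing time. The second step stands in for the deferred function call that converts to the actual encoding.

```python
import re

# A backslash followed by one of the recognised escape bodies.
# The group is optional so that a stray backslash is caught and reported.
_ESCAPE = re.compile(r"\\(\\|\+[0-9A-Fa-f]{6}|[0-9A-Fa-f]{4})?")


def decode_unicode_escapes(body: str) -> str:
    """Turn the body of a U&'...' literal into a string of code points."""

    def replace(match: re.Match) -> str:
        token = match.group(1)
        if token is None:
            raise ValueError(f"invalid Unicode escape at offset {match.start()}")
        if token == "\\":
            return "\\"  # doubled escape character means a literal backslash
        # Drop the '+' of the six-digit form, then parse the hex digits.
        return chr(int(token.lstrip("+"), 16))

    return _ESCAPE.sub(replace, body)


def to_target_encoding(text: str, encoding: str) -> bytes:
    """Deferred step: convert code points to bytes in the chosen encoding.

    Raises UnicodeEncodeError if the encoding cannot represent a character.
    """
    return text.encode(encoding)


if __name__ == "__main__":
    decoded = decode_unicode_escapes(r"The price is 100 \20AC.")
    print(decoded)
    print(to_target_encoding(decoded, "utf-8"))
    print(to_target_encoding(decoded, "iso8859-15"))
```

The split matters because the same literal yields different bytes depending on the encoding (the euro sign is three bytes in UTF-8 and one byte in ISO 8859-15), and fails outright in an encoding such as ISO 8859-1 that has no euro sign. This sketch does not handle a custom escape character or surrogate pairs.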