Re: Bug in UTF8-Validation Code?

From	Peter Eisentraut
Subject	Re: Bug in UTF8-Validation Code?
Date	4 April 2007, 12:58:06
Msg-id	200704041757.58574.peter_e@gmx.net
In reply to	Re: Bug in UTF8-Validation Code? (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On Wednesday, 4 April 2007 16:22, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Right -- IMHO what we should be doing is reject any input to chr() which
> > is beyond plain ASCII (or maybe > 255), and create a separate function
> > (unicode_char() sounds good) to get an Unicode character from a code
> > point, converted to the local client_encoding per conversion_procs.
>
> Hm, I hadn't thought of that approach, but another idea is that the
> argument of chr() is *always* a unicode code point, and it converts
> to the current encoding.  Do we really need a separate function?

The SQL standard has a "Unicode character string literal", which looks like 
this:

U&'The price is 100 \20AC.'

This is similar in spirit to our current escape mechanism available via E'...' 
which, however, produces bytes.  It has the advantage over a chr()-based 
mechanism that the composition of strings doesn't require an ugly chain of 
literals, functions, and concatenations.
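
To illustrate, here is a minimal Python sketch of how the escapes in such a literal could be decoded. It assumes the default escape character (backslash) and the escape forms \XXXX (four hex digits), \+XXXXXX (six hex digits), and \\ for a literal backslash. The function name decode_unicode_literal is hypothetical, and unrecognized backslash sequences are simply left untouched rather than rejected.

```python
import re

# Escape forms assumed here (default escape character is a backslash):
#   \\        -> a literal backslash
#   \XXXX     -> code point given by 4 hex digits
#   \+XXXXXX  -> code point given by 6 hex digits
_ESCAPE = re.compile(r"\\(?:(\\)|([0-9A-Fa-f]{4})|\+([0-9A-Fa-f]{6}))")


def decode_unicode_literal(body: str) -> str:
    """Decode the text between the quotes of U&'...' (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        if match.group(1) is not None:
            return "\\"  # doubled escape character
        hex_digits = match.group(2) or match.group(3)
        return chr(int(hex_digits, 16))

    # Sequences that match none of the forms above are left as-is.
    return _ESCAPE.sub(replace, body)


print(decode_unicode_literal(r"The price is 100 \20AC."))
```

The chr()-based equivalent would be something like 'The price is 100 ' || chr(8364) || '.', which is the chain of literals, functions, and concatenations that the literal syntax avoids.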

Implementing this would, however, be a bit tricky because you don't have 
access to the encoding conversion functions in the lexer.  You would probably 
have to map that to a function call and evaluate it later.
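
A rough Python sketch of that two-phase idea follows; it is not a proposal for the actual implementation. The lexer step only turns the literal into a list of code points, which needs no encoding knowledge, and a later evaluation step converts them to the target encoding. All names (DeferredUnicodeLiteral, lex_unicode_literal, evaluate) are hypothetical, and the sketch assumes for simplicity that the non-escaped characters are already available as code points.

```python
import string
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DeferredUnicodeLiteral:
    """Hypothetical parse-tree node holding code points, not encoded bytes."""
    code_points: Tuple[int, ...]


def _hex_value(digits: str, expected_len: int) -> int:
    # Reject truncated or non-hex escapes instead of guessing.
    if len(digits) != expected_len or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"invalid Unicode escape digits: {digits!r}")
    return int(digits, 16)


def lex_unicode_literal(body: str) -> DeferredUnicodeLiteral:
    """Lexer phase: resolve escapes to code points, no encoding conversion."""
    points = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            points.append(ord(body[i]))
            i += 1
        elif body.startswith("\\\\", i):
            points.append(ord("\\"))
            i += 2
        elif body.startswith("\\+", i):
            points.append(_hex_value(body[i + 2:i + 8], 6))
            i += 8
        else:
            points.append(_hex_value(body[i + 1:i + 5], 4))
            i += 5
    return DeferredUnicodeLiteral(tuple(points))


def evaluate(node: DeferredUnicodeLiteral, target_encoding: str) -> bytes:
    """Later phase: the encoding is known, so the conversion can happen here."""
    text = "".join(chr(cp) for cp in node.code_points)
    return text.encode(target_encoding)  # raises if a character is unrepresentable


node = lex_unicode_literal(r"100 \20AC")
print(evaluate(node, "utf-8"))
print(evaluate(node, "iso-8859-15"))
```

Deferring the conversion also gives a natural place to report an error when a code point has no representation in the target encoding.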

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
