Re: proposal: UTF8 to_ascii function
From | Andrew Dunstan |
---|---|
Subject | Re: proposal: UTF8 to_ascii function |
Date | |
Msg-id | 48A041CC.1090703@dunslane.net |
In response to | Re: proposal: UTF8 to_ascii function (Jan Urbański <j.urbanski@students.mimuw.edu.pl>) |
Responses |
Re: proposal: UTF8 to_ascii function
Re: proposal: UTF8 to_ascii function |
List | pgsql-hackers |
Jan Urbański wrote:
> Andrew Dunstan wrote:
>>
>>
>> Pavel Stehule wrote:
>>>
>>>
>>> One note - convert_to is correct. But we have to use to_ascii without
>>> decode functions. It has same behave - convert from bytea to text.
>>> Text in "incorrect" encoding is dafacto bytea. So correct to_ascii
>>> function prototypes are:
>>>
>>> to_ascii(text)
>>> to_ascii(bytea, integer);
>>> to_ascii(bytea, name);
>>>
>>>
>>>>
>>
>> What you have not said is how you propose to convert UTF8 to ASCII.
>>
>> Currently to_ascii() converts a small number of single byte charsets
>> to ASCII by folding the chars with high bits set, so what we get is a
>> pure ASCII result which is safe in any server encoding, as they are
>> all ASCII supersets.
>>
>> But what conversion rule will you use for the gazillions of Unicode
>> characters?
>>
>> I honestly do not understand the use case for this at all.
>
> I do. Often clients want their searches to be
> accented-or-language-specific letters insensitive. So searching for
> 'łódź' returns 'lodz'. So the use case is there (in fact, the lack of
> such facility made me consider not upgrading particular client to
> 8.3...).
> Or maybe there's a better way to do it?

Well, my first question would be "Why aren't you using a database encoding that supports to_ascii()?"

However, I suppose that your use case would support this signature:

to_ascii(bytea, name)

where it would just error out if the encoding name were something other than LATIN1, LATIN2, LATIN9, or WIN1250.

But what would be the meaning of this?:

to_ascii(bytea, integer)

cheers

andrew
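For illustration only, the folding question raised in this thread can be sketched in Python. This is not how PostgreSQL's to_ascii() works and not a proposal from anyone in the thread. The function name `fold_to_ascii` and the table `EXTRA_FOLDS` are hypothetical. The sketch assumes that stripping combining marks after NFKD normalization is an acceptable rule, and it shows why that rule alone is not enough: 'ł' has no Unicode decomposition, so it needs a hand-maintained entry. Anything without a rule raises an error, in the same spirit as erroring out on an unsupported encoding name.

```python
import unicodedata

# Hypothetical hand-maintained table for letters that have no Unicode
# decomposition (assumption: illustrative only, far from complete).
EXTRA_FOLDS = {"ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "đ": "d", "Đ": "D", "ß": "ss"}


def fold_to_ascii(text: str) -> str:
    """Fold accented letters to plain ASCII; raise on anything unmapped."""
    out = []
    for ch in text:
        if ch in EXTRA_FOLDS:
            out.append(EXTRA_FOLDS[ch])
            continue
        # NFKD splits e.g. 'ó' into 'o' + U+0301 (combining acute accent).
        for part in unicodedata.normalize("NFKD", ch):
            if unicodedata.combining(part):
                continue  # drop the combining mark
            if ord(part) > 127:
                raise ValueError(f"no ASCII folding rule for {ch!r}")
            out.append(part)
    return "".join(out)


print(fold_to_ascii("łódź"))  # prints: lodz
```

Characters outside the table and without a decomposition, such as CJK ideographs, raise `ValueError` here, which is one possible answer to the "what conversion rule" question rather than a settled one.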