Re: regexp_replace and UTF8
От | Jasen Betts |
---|---|
Тема | Re: regexp_replace and UTF8 |
Дата | |
Msg-id | gm1doi$914$1@reversiblemaps.ath.cx обсуждение исходный текст |
Ответ на | regexp_replace and UTF8 ("Bart Degryse" <Bart.Degryse@indicator.be>) |
Ответы |
Re: regexp_replace and UTF8
|
Список | pgsql-sql |
On 2009-01-30, Bart Degryse <Bart.Degryse@indicator.be> wrote: > > --=__Part8EA648F8.0__= > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: quoted-printable > > Hi, > I have a text field with data like this: 'de patiënt niet' > (without the quotes). > I would like to convert this string to look like this: 'de patiënt > niet' > Basically what I need to do (I think) is > - get rid of the &, # and ; > - convert the number to hex > - make a UTF8 from that (thus: \xEB) that is not UTF8. the UTF8 representation for 'ë' in SQL is e'\xC3\xAB' or chr(235) your input appears to be encoded in LATIN-1 (or possibly 8859-13) I think you'll need to write a function. here, where I have database encoding UTF8 this appears to work as you desire. CREATE or replace FUNCTION htmlent(inp text) returns text as $f$ DECLARE str text; BEGINstr= regexp_replace(quote_literal( inp) ,$$(\d+);$$ ,$$'||chr(\1)||'$$ ,'g');execute'select '||str into str;return str; END $f$ LANGUAGE PLPGSQL; select htmlent('de patiënt niet'); probably the above should be expanded to handle named entities like '&' too. characters outside of the LATIN-1 space are not handled but aparently this is what you want.
В списке pgsql-sql по дате отправления: