Re: How to find freak UTF-8 character?
From | pasman pasmański |
---|---|
Subject | Re: How to find freak UTF-8 character? |
Date | |
Msg-id | CAOWY8=ZXtzGrshy8Fe3v3RD0TwFsQiq_eSCY+3HWTt5EbN9irw@mail.gmail.com |
In response to | Re: How to find freak UTF-8 character? (Leif Biberg Kristensen <leif@solumslekt.org>) |
Responses | Re: How to find freak UTF-8 character? (Leif Biberg Kristensen <leif@solumslekt.org>) |
List | pgsql-general |
It's simple to remove strange chars with regexp_replace.

2011/10/1, Leif Biberg Kristensen <leif@solumslekt.org>:
> On Saturday 1. October 2011 21.29.45 Andrew Sullivan wrote:
>> I see you found it, but note that it's _not_ a spurious UTF-8
>> character: it's a right-to-left mark, and is a perfectly ok UTF-8 code
>> point.
>
> Andrew,
> thank you for your reply. Yes I know that this is a perfectly legal UTF-8
> character. It crept into my database as a result of a copy-and-paste job
> from a web site. The point is that it doesn't have a counterpart in
> ISO-8859-1 to which I regularly have to export the data.
>
> The offending character came from this URL:
> <http://www.soge.kviteseid.no/individual.php?pid=I2914&ged=Kviteseid.GED&tab=0>
>
> and the text that I copied and pasted from the page looks like this in the
> source code:
>
> Aslaug Steinarsdotter Fjågesund (I2914)
>
> I'm going to write to the webmaster of the site and ask why that character,
> represented in the HTML as the entity, has to appear in a Norwegian web
> site which never should have to display text in anything but left-to-right
> order.
>
>> If you need a subset of the UTF-8 character set, you want to make sure
>> you have some sort of constraint in your application or your database
>> that prevents insertion of anything at all in UTF-8. This is a need
>> people often forget when working in an internationalized setting,
>> because there's a lot of crap that comes from the client side in a
>> UTF-8 setting that might not come in other settings (like LATIN1).
>
> I don't want any constraint of that sort. I'm perfectly happy with UTF-8.
> And now that I've found out how to spot problematic characters that will
> crash my export script, it's really not an issue anymore. The character
> didn't print neither in psql nor in my PHP frontend, so I just removed the
> problematic text and re-entered it by hand. Problem solved.
>
> But thank you for the idea, I think that I will strip out at least any
> entities from text entered into the database.
>
> By the way, is there a setting in psql that will output unprintable
> characters as question marks or something?
>
> regards, Leif.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
------------
pasman
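
For anyone who would rather catch such characters in the export script than in the database, here is a minimal sketch of the same regex idea in Python (not the PostgreSQL function mentioned above). It assumes "strange" means "anything outside the ISO-8859-1 range U+0000..U+00FF". The function names and the sample string are made up for illustration, and the position of the right-to-left mark in the sample is a guess, not taken from the original page.

```python
import re

# Assumption: a character is "problematic" if ISO-8859-1 cannot represent it,
# i.e. its code point is above U+00FF.
NON_LATIN1 = re.compile(r"[^\u0000-\u00ff]")


def find_non_latin1(text):
    """Return (index, character, code point) for every character outside Latin-1."""
    return [
        (m.start(), m.group(), "U+%04X" % ord(m.group()))
        for m in NON_LATIN1.finditer(text)
    ]


def strip_non_latin1(text):
    """Remove every character that has no ISO-8859-1 counterpart."""
    return NON_LATIN1.sub("", text)


# Hypothetical sample: the name from the thread with a right-to-left mark
# (U+200F) placed after the surname purely for illustration.
sample = "Aslaug Steinarsdotter Fj\u00e5gesund\u200f (I2914)"

hits = find_non_latin1(sample)
cleaned = strip_non_latin1(sample)

# The cleaned text now encodes without raising UnicodeEncodeError.
encoded = cleaned.encode("iso-8859-1")

for index, char, codepoint in hits:
    print(index, codepoint)
```

Reporting the hits before stripping them is deliberate: silently deleting characters can hide a data-entry problem, so it may be better to log what was found and decide case by case.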