Re: Upcoming PG re-releases
From | Martijn van Oosterhout |
---|---|
Subject | Re: Upcoming PG re-releases |
Date | |
Msg-id | 20051209161733.GC20352@svana.org |
In reply to | Re: Upcoming PG re-releases (Gregory Maxwell <gmaxwell@gmail.com>) |
Responses | Re: Upcoming PG re-releases |
List | pgsql-hackers |
On Thu, Dec 08, 2005 at 05:54:35PM -0500, Gregory Maxwell wrote:
> No, what is needed for people who care about fixing their data is a
> loadable strip_invalid_utf8() that works in older versions.. then just
> select * from bar where foo != strip_invalid_utf8(foo); The function
> would be useful in general, for example, if you have an application
> which doesn't already have much utf8 logic, you want to use a text
> field, and stripping is the behaviour you want. For example, lots of
> simple web applications.

Would something like the following work? It's written in pl/pgsql and does (AFAICS) the same checking as the backend in recent releases. Except the backend only supports up to 4-byte UTF-8, whereas this function checks up to six bytes. For a six-byte UTF-8 character, who is wrong?

In any case, people should be able to do something like:

SELECT field FROM table WHERE NOT utf8_verify(field,4);

to check conformance with PostgreSQL 8.1.

Note, I don't have large chunks of UTF-8 to test with, but it works for the characters I tried. Tested with 7.4.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
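
The pl/pgsql function referred to in the message is not reproduced here. As a rough illustration only, the kind of check it describes (walk the bytes, derive the sequence length from the lead byte, reject sequences longer than a caller-supplied maximum) might look like the following Python sketch. This is not the author's code: the name `utf8_verify` and the second argument are taken from the query in the message, while the byte-string input, the default of 4, and the exact rules are assumptions. It checks structure only and does not reject overlong encodings or surrogate code points.

```python
def utf8_verify(data: bytes, max_len: int = 4) -> bool:
    """Return True if data is structurally valid UTF-8 whose sequences
    are at most max_len bytes long (assumed semantics, see lead-in)."""
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        # Derive the sequence length from the lead byte (original
        # 1..6-byte UTF-8 scheme).
        if lead < 0x80:
            seq_len = 1
        elif lead < 0xC0:
            return False  # stray continuation byte used as a lead byte
        elif lead < 0xE0:
            seq_len = 2
        elif lead < 0xF0:
            seq_len = 3
        elif lead < 0xF8:
            seq_len = 4
        elif lead < 0xFC:
            seq_len = 5
        elif lead < 0xFE:
            seq_len = 6
        else:
            return False  # 0xFE and 0xFF never appear in UTF-8
        # Reject sequences longer than the caller allows, or truncated ones.
        if seq_len > max_len or i + seq_len > n:
            return False
        # Every following byte must be a continuation byte (10xxxxxx).
        for b in data[i + 1:i + seq_len]:
            if b & 0xC0 != 0x80:
                return False
        i += seq_len
    return True
```

Under these assumptions, calling the sketch with a maximum of 4 corresponds to the `WHERE NOT utf8_verify(field,4)` query above: a value is flagged when the function returns false, which includes any five- or six-byte sequence.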