Re: converting databases form SQL_ASCII to UTF8
От | Steve Crawford |
---|---|
Тема | Re: converting databases form SQL_ASCII to UTF8 |
Дата | |
Msg-id | 4DB1CA80.1050103@pinpointresearch.com обсуждение исходный текст |
Ответ на | converting databases form SQL_ASCII to UTF8 (Geoffrey Myers <lists@serioustechnology.com>) |
Список | pgsql-general |
On 04/22/2011 08:00 AM, Geoffrey Myers wrote: > We are moving our databases to new hardware soon, so we felt it would > be a good time to get the encoding correct. Our databases are > currently SQL_ASCII and we plan to move them to UTF8. We are in the same boat, fortunately only on one older server we are upgrading and fortunately for internal apps.... > > > So, as previously noted, there are certain characters that won't load > into a UTF8 database from a dump of the SQL_ASCII database. > > Here's our problem. We planned on moving databases a few at a time. > Problem is, there is a process that pushes data from one database to > another. If this process attempts to push data from a SQL_ASCII > database to a new UTF8 database and it has one of these characters > mentioned above, the process fails. > > So, now the question is, is this effort even worth our effort? > > What is the harm in leaving our databases SQL_ASCII encoded? SQL_ASCII is a synonym for "no encoding." You put in a stream of bytes and that's what you get out. That's OK if the byte-stream has exactly the same meaning to every application and user. If that's not the case then you have bytes in your database but you don't know what those bytes are supposed to represent. In a way, it's like having a generic integer column but depending on the user or the application, that column might represent a unix epoch timestamp, an age in years, a salary, a weight in grams, furlongs per fortnight, etc. And there is no indicator to say which it is. Not good. We are in the final stages of cleaning up our last bit of non-utf8 data and the above some what silly example is actually not far from the truth. Due to data that arrived from web-inputs, spreadsheet imports, command-line, internal-apps, etc. we have been faced with cleaning tables where one row has only basic ASCII data, another has UTF8, while others have various different Microsoft encodings. With data like that it's pretty much impossible to guarantee that even a simple web-report will display fully correctly. The longer you wait, the worse it gets. Even though it is only a tiny percentage of our data, cleaning it is still a pain. Is it worth it? Dunno - you have to calculate the cost/benefit. For us it was a no-brainer to bite-the-bullet and do it. Cheers, Steve
В списке pgsql-general по дате отправления: