Re: Hex characters in COPY input
От | Adrian Klaver |
---|---|
Тема | Re: Hex characters in COPY input |
Дата | |
Msg-id | 54F0964F.4030003@aklaver.com обсуждение исходный текст |
Ответ на | Re: Hex characters in COPY input (Melvin Call <melvincall979@gmail.com>) |
Список | pgsql-general |
On 02/27/2015 07:55 AM, Melvin Call wrote: > On Fri, Feb 27, 2015 at 9:03 AM, Adrian Klaver > <adrian.klaver@aklaver.com> wrote: >> On 02/27/2015 06:39 AM, Melvin Call wrote: >>> >>> On 2/26/15, Vick Khera <vivek@khera.org> wrote: >>>> >>>> On Thu, Feb 26, 2015 at 9:50 PM, Melvin Call <melvincall979@gmail.com> >>>> wrote: >>>> >>>>> Montreal where the e is an accented e. The output ends up in the text >>>>> file >>>>> as >>>>> Montr\xe9al, where the xe9 is a single character. When I try to copy >>>>> that >>>>> into >>>>> my PostgreSQL table, I get an error "ERROR: invalid byte sequence for >>>>> encoding >>>>> >>>> >>>> Character code E9 is not UTF8. Don't tell Postgres you're importing UTF8 >>>> if >>>> you're not. >>>> >>> Thank you Vic, adding the ENCODING 'latin1' option to the COPY command >>> worked >>> perfectly. >>> >>> If you don't mind a follow up to your reply, I have tried to understand >>> the >>> different character sets and collations, but I guess I still have a lot to >>> learn. Your suggestion did not even come close to crossing my mind because >>> the >>> MySQL table and database are encoded in UTF8. I assume the conversion to >>> latin1 >>> happened because I was putting the MySQL query output into a locally >>> stored >>> text file? Regardless, can you point me to some reading that would have >>> clued >>> me in that e9 is not a UTF8 character? Or is the clue the fact that it was >>> not >>> preceeded with 0x00? >> >> >> For UTF8 characters see here: >> >> http://www.utf8-chartable.de/ > > Thank you for the link. Bookmarked. > >> >> >> For the MySQL part, you are going to detail how you got the data out? > > This is in preparation of moving away from MySQL. I inherited this MySQL > database and it is in horrible shape, no referential integrity, no constraints > other than arbitrarily chosen VARCHAR lengths, no indexes, and inconsistent > entity and attribute naming. I have to pull the fields out through a query that > is being redirected to a local file, as opposed to a dump, because I'm having > to filter out a lot of useless rows (TestCity in a production system!?). I just > realized I could have put the usable data into similar tables and then used > mysqldump with the encoding specified, but no need now. I have my extraction > and import done, so this should be the last I need to touch the MySQL system. Gotcha, I recently did something similar. > >> >>> >>> Regards, >>> Melvin >>> >>> >> >> >> -- >> Adrian Klaver >> adrian.klaver@aklaver.com > > -- Adrian Klaver adrian.klaver@aklaver.com
В списке pgsql-general по дате отправления: