BUG #2010: COPY command does not recognise UTF-8 text files with leading BOM
From | Roddi Walker |
---|---|
Subject | BUG #2010: COPY command does not recognise UTF-8 text files with leading BOM |
Date | |
Msg-id | 20051031023400.9563CF0BAB@svr2.postgresql.org |
Responses | Re: BUG #2010: COPY command does not recognise UTF-8 text files with leading BOM |
List | pgsql-bugs |
The following bug has been logged online:

Bug reference:      2010
Logged by:          Roddi Walker
Email address:      roddiwalker@yahoo.com
PostgreSQL version: 8.1 beta 4
Operating system:   Win 2000 Professional
Description:        COPY command does not recognise UTF-8 text files with leading BOM
Details:

1) Created a UTF-8 database "foo", with a table "bar":

    CREATE TABLE bar ( mycol text );

2) Used Notepad to create a UTF-8 "bar.txt" text file with just the word "fred" in it. When writing a UTF-8 file, Notepad writes a 3-byte Byte Order Mark (BOM) header of hex EF BB BF. So the file's 7 hex bytes were: EF BB BF 66 72 65 64. This BOM header is legal (see http://www.unicode.org/faq/utf_bom.html#BOM) but is probably used only on Windows.

3) In psql, populated table "bar" from file "bar.txt" using:

    copy bar from 'c:\\bar.txt';

4) THE BUG: PostgreSQL doesn't recognise the EF BB BF bytes as a BOM header and skip them. Instead it treats the 3 bytes as a Unicode character, which pgAdminIII renders as a hollow square when the table data is viewed. That is, the table data is rendered as "[]fred" (where "[]" is the hollow box).

5) SUGGESTED SOLUTION: I'm not a Unicode expert, so I don't know if the BOM can be safely skipped in all cases (although it probably can for UTF-8 text files). But at least a COPY option SKIPBOM (or some such) would help.
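As a client-side workaround, the BOM can be removed from the file contents before they are handed to COPY. The following is only a minimal sketch in Python, assuming the file is small enough to hold in memory; the helper name strip_utf8_bom is hypothetical and is not part of PostgreSQL or any of its tools.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# The 7 bytes described in the report: the BOM followed by "fred".
raw = bytes.fromhex("EF BB BF 66 72 65 64")
cleaned = strip_utf8_bom(raw)
print(cleaned)
```

The cleaned bytes can then be written to a new file and loaded with COPY as in step 3. When reading the file as text rather than bytes, Python's "utf-8-sig" codec drops a leading BOM in the same way.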