Re: Support UTF-8 files with BOM in COPY FROM
From | Brar Piening |
---|---|
Subject | Re: Support UTF-8 files with BOM in COPY FROM |
Date | |
Msg-id | 4E816406.1050001@gmx.de |
In reply to | Re: Support UTF-8 files with BOM in COPY FROM (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Support UTF-8 files with BOM in COPY FROM
|
List | pgsql-hackers |
<span id="IDstID">Tom Lane wrote:</span><blockquote cite="mid:29877.1317066533@sss.pgh.pa.us" type="cite"><pre wrap=""> Note that the reference to byte order betrays the implicit context assumption: that we're talking about UTF16 or UTF32 representation.</pre></blockquote> Note that there is no implicit context assumption in the Unicode FAQ. It covers UTF-8, UTF-16 and UTF-32 equally.<br /> Another quote:<br /> Q: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If yes, then can I still assume the remaining UTF-8 bytes are in big-endian order?<br /> A: Yes, UTF-8 can contain a BOM. However, it makes <i>no</i> difference as to the endianness of the byte stream. UTF-8 always has the same byte order. An initial BOM is <i>only</i> used as a signature, an indication that an otherwise unmarked text file is in UTF-8. Note that some recipients of UTF-8 encoded data do not expect a BOM. Where UTF-8 is used <i>transparently</i> in 8-bit environments, the use of a BOM will interfere with any protocol or file format that expects specific ASCII characters at the beginning, such as the use of "#!" at the beginning of Unix shell scripts.<br /><blockquote cite="mid:29877.1317066533@sss.pgh.pa.us" type="cite"><pre wrap=""> BOM is useless in UTF8, no matter what Microsoft thinks. 
Any tool that relies on it to detect UTF8 data has to have a workaround for overriding that detection, or it's broken to the point of uselessness.</pre></blockquote> This kind of brokenness currently exists the other way around (see my reference to the Perl script I'm using to work around it).<br /><br /> Note also that I'm not citing a Microsoft FAQ but the Unicode FAQ.<br /> I'm also not trying to convert Postgres into a Microsoft tool (I'm pretty happy it isn't), but I'm pointing to existing compatibility issues on a platform that others have decided to support.<br /> Belonging to the huge group of users who have little or no choice in what OS they are using, and being from a country where plain ASCII isn't enough to cover all existing characters, this is probably fair.<br /><br /> It's a pity that the Unicode standard actually allows something that can cause problems, but blaming the non-platform again doesn't solve the existing issues.<br /><br /> Regards,<br /><br /> Brar<br />
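For readers who have not seen the workaround pattern this message alludes to, here is a minimal sketch of stripping a UTF-8 signature before handing a file to COPY FROM. It is an assumption-laden illustration in Python, not the Perl script mentioned in the message and not anything PostgreSQL does itself; the names `UTF8_BOM` and `strip_utf8_bom` are hypothetical. The only fact it relies on is that the UTF-8 encoding of U+FEFF is the three bytes EF BB BF.

```python
import io

# UTF-8 encoding of U+FEFF; used only as a signature, never as a byte-order hint.
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(src, dst, chunk_size=65536):
    """Copy the binary stream src to dst, dropping a leading UTF-8 BOM.

    Hypothetical helper for illustration. Returns True if a BOM was
    found and removed, False if the data was passed through unchanged.
    """
    head = src.read(len(UTF8_BOM))
    had_bom = head == UTF8_BOM
    if not had_bom:
        # No signature: the bytes we peeked at are real data, keep them.
        dst.write(head)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
    return had_bom


if __name__ == "__main__":
    # Simulate a CSV file saved with a signature by a Windows editor.
    source = io.BytesIO(UTF8_BOM + "1,Müller\n".encode("utf-8"))
    cleaned = io.BytesIO()
    removed = strip_utf8_bom(source, cleaned)
    print("BOM removed:", removed)
    print("bytes passed on:", cleaned.getvalue())
```

The cleaned stream could then be fed to `COPY ... FROM STDIN`. Only the first three bytes are inspected, so a U+FEFF appearing later in the data is left untouched, and files without a signature pass through byte for byte.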