Re: Support UTF-8 files with BOM in COPY FROM

From	Brar Piening
Subject	Re: Support UTF-8 files with BOM in COPY FROM
Date	27 September 2011, 02:50:12
Msg-id	4E816406.1050001@gmx.de
In reply to	Re: Support UTF-8 files with BOM in COPY FROM (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Support UTF-8 files with BOM in COPY FROM
List	pgsql-hackers

<span id="IDstID">Tom Lane wrote:</span><blockquote cite="mid:29877.1317066533@sss.pgh.pa.us" type="cite"><pre
wrap="">
Note that the reference to byte order betrays the implicit context
assumption: that we're talking about UTF16 or UTF32 representation.</pre></blockquote> Note that there is no implicit
context assumption in the Unicode FAQ. It covers UTF-8, UTF-16 and UTF-32 equally.<br /> Another quote:<br /> Q: Can
a UTF-8 data stream contain the BOM character (in UTF-8 form)? If yes, then can I still assume the remaining UTF-8 bytes
are in big-endian order?<br /> A: Yes, UTF-8 can contain a BOM. However, it makes <i>no</i> difference as to the
endianness of the byte stream. UTF-8 always has the same byte order. An initial BOM is <i>only</i> used as a signature:
an indication that an otherwise unmarked text file is in UTF-8. Note that some recipients of UTF-8 encoded data do not
expect a BOM. Where UTF-8 is used <i>transparently</i> in 8-bit environments, the use of a BOM will interfere with any
protocol or file format that expects specific ASCII characters at the beginning, such as the use of "#!" at the
beginning of Unix shell scripts.<br /><blockquote cite="mid:29877.1317066533@sss.pgh.pa.us" type="cite"><pre wrap="">
 

BOM is useless in UTF8, no matter what Microsoft thinks.  Any tool that
relies on it to detect UTF8 data has to have a workaround for overriding
that detection, or it's broken to the point of uselessness.</pre></blockquote> This kind of brokenness currently
exists the other way around (see my reference to the Perl script I'm using to work around it).<br /><br /> Note also
that I'm not citing a Microsoft FAQ but the Unicode FAQ.<br /> I'm also not trying to turn Postgres into a Microsoft
tool (I'm pretty happy it isn't), but I'm pointing to existing compatibility issues on a platform that others have
decided to support.<br /> Belonging to the huge group of users who have little or no choice in what OS they are using,
and being from a country where plain ASCII isn't enough to cover all existing characters, I think this is probably fair.<br /><br />
It's a pity that the Unicode standard actually allows something that can cause problems, but blaming the non-platform
again doesn't solve the existing issues.<br /><br /> Regards,<br /><br /> Brar<br />
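
For illustration, here is a minimal sketch of the kind of client-side workaround discussed in this message: dropping a leading UTF-8 signature (the bytes EF BB BF) before the data is handed to COPY FROM. It is not the Perl script mentioned above and not anything PostgreSQL does itself. The function name strip_utf8_bom is hypothetical; codecs.BOM_UTF8 and the "utf-8-sig" codec come from the Python standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present.

    Hypothetical helper for illustration only. Only a signature at the very
    start is removed; any later U+FEFF in the stream is left untouched.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A CSV export as some Windows tools write it: signature first, then the data.
with_bom = codecs.BOM_UTF8 + "id,name\n1,Jürgen\n".encode("utf-8")
without_bom = "id,name\n1,Jürgen\n".encode("utf-8")

cleaned = strip_utf8_bom(with_bom)

# The standard "utf-8-sig" codec behaves the same way when decoding:
# it skips an initial BOM if there is one and otherwise acts like "utf-8".
text = with_bom.decode("utf-8-sig")
```

The cleaned bytes could then be sent to the server, for example through COPY ... FROM STDIN. Input that carries no signature passes through unchanged, so the same path works for files from either kind of tool.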
