Re: Support UTF-8 files with BOM in COPY FROM
From | Andrew Dunstan |
---|---|
Subject | Re: Support UTF-8 files with BOM in COPY FROM |
Date | |
Msg-id | 4E806AB2.5090200@dunslane.net |
In reply to | Re: Support UTF-8 files with BOM in COPY FROM (Magnus Hagander <magnus@hagander.net>) |
List | pgsql-hackers |
On 09/26/2011 07:12 AM, Magnus Hagander wrote:
> On Mon, Sep 26, 2011 at 06:58, Itagaki Takahiro
> <itagaki.takahiro@gmail.com> wrote:
>> Hi,
>>
>> I'd like to support UTF-8 text or csv files that has BOM (byte order mark)
>> in COPY FROM command. BOM will be automatically detected and ignored
>> if the file encoding is UTF-8. WIP patch attached.
>>
>> I'm thinking about only COPY FROM for reads, but if someone wants to add
>> BOM in COPY TO, we might also support COPY TO WITH BOM for writes.
>>
>> Comments welcome.
> I like it in general. But if we're looking at the BOM, shouldn't we
> also look and *reject* the file if it's a BOM for a non-UTF8 file? Say
> if the BOM claims it's UTF16?
>

It should be rejected as invalidly encoded anyway, as a non-utf8 BOM is
not valid utf-8. We shouldn't check in non-unicode cases where the
sequence might be valid in those encodings (e.g. ISO-8859-1).

cheers

andrew