Re: UTF8 with BOM support in psql

From	Peter Eisentraut
Subject	Re: UTF8 with BOM support in psql
Date	20 October 2009 06:54:55
Msg-id	1256032481.9382.19.camel@fsopti579.F-Secure.com
In reply to	UTF8 with BOM support in psql (Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
List	pgsql-hackers

On Tue, 2009-10-20 at 14:41 +0900, Itagaki Takahiro wrote:
> UTF8 encoding text files with BOM (Byte Order Mark) are commonly
> used in Windows, though BOM was designed for UTF16 text originally.
> However, psql cannot read such format even if we set client encoding
> to UTF8. Is it worth supporting that format in psql?

psql doesn't have a problem, but the backend's lexer doesn't parse the
BOM as whitespace.  Since the lexer is byte-based, it will presumably
have problems with anything outside of ASCII that Unicode considers
whitespace.
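To illustrate why a byte-based lexer misses the BOM, here is a minimal C sketch. The whitespace set below (space, tab, newline, carriage return, form feed) is an assumption standing in for the backend lexer's definition, and `is_ascii_sql_space` and `skip_leading_space` are hypothetical names, not PostgreSQL functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the ASCII-only whitespace set assumed here for a
 * byte-based SQL lexer (space, tab, newline, carriage return, form feed). */
static bool is_ascii_sql_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
}

/* Return the offset of the first byte that is not ASCII whitespace.
 * Because it inspects one byte at a time, it cannot recognize a
 * multi-byte UTF-8 sequence such as the BOM (EF BB BF) as whitespace. */
static size_t skip_leading_space(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len && is_ascii_sql_space(buf[i]))
        i++;
    return i;
}
```

Given the bytes `EF BB BF` followed by `SELECT`, `skip_leading_space` returns 0: the first BOM byte, 0xEF, is not in the ASCII set, so the BOM reaches the parser as if it were part of the first token.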

> When psql opens a file with -f or \i, it checks the first 3 bytes of the
> file. If they are a BOM, it discards the 3 bytes and changes the client
> encoding to UTF8 automatically.
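A minimal C sketch of the proposed check, assuming the first bytes of the file have already been read into a buffer. `utf8_bom_length` is a hypothetical helper, not the code from the patch; it only reports whether to skip the three-byte UTF-8 encoding of U+FEFF and whether to switch to UTF8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The UTF-8 encoding of U+FEFF (the byte order mark). */
static const unsigned char UTF8_BOM[3] = {0xEF, 0xBB, 0xBF};

/* Hypothetical helper, not psql's actual code: if buf starts with a
 * UTF-8 BOM, return the number of bytes to skip (3) and set *is_utf8 so
 * the caller can switch the client encoding to UTF8. Otherwise return 0
 * and clear *is_utf8, leaving the input untouched. */
static size_t utf8_bom_length(const unsigned char *buf, size_t len,
                              bool *is_utf8)
{
    if (len >= sizeof UTF8_BOM &&
        memcmp(buf, UTF8_BOM, sizeof UTF8_BOM) == 0)
    {
        *is_utf8 = true;
        return sizeof UTF8_BOM;
    }
    *is_utf8 = false;
    return 0;
}
```

A caller would advance its read position by the returned length and, when `is_utf8` is set, change the client encoding (for example with libpq's `PQsetClientEncoding(conn, "UTF8")`). Requiring all three bytes to match leaves files that merely begin with 0xEF alone.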

While I see that the Unicode standard supports using a UTF-8 encoded BOM
as UTF-8 signature, I wonder if those bytes can usefully appear in a
leading position in other encodings.
