Re: UTF8 with BOM support in psql
From | Peter Eisentraut |
---|---|
Subject | Re: UTF8 with BOM support in psql |
Date | |
Msg-id | 1256032481.9382.19.camel@fsopti579.F-Secure.com |
In response to | UTF8 with BOM support in psql (Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>) |
List | pgsql-hackers |
On Tue, 2009-10-20 at 14:41 +0900, Itagaki Takahiro wrote:
> UTF8 encoding text files with BOM (Byte Order Mark) are commonly
> used in Windows, though BOM was designed for UTF16 text originally.
> However, psql cannot read such format even if we set client encoding
> to UTF8. Is it worth supporting those format in psql?

psql doesn't have a problem, but the backend's lexer doesn't parse the BOM as whitespace. Since the lexer is byte-based, it will presumably have problems with anything outside of ASCII that Unicode considers whitespace.

> When psql opens a file with -f or \i, it checks first 3 bytes of the
> file. If they are BOM, discard the 3 bytes and change client encoding
> to UTF8 automatically.

While I see that the Unicode standard supports using a UTF-8 encoded BOM as UTF-8 signature, I wonder if those bytes can usefully appear in a leading position in other encodings.
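For illustration only, here is a minimal sketch of the check described in the quoted proposal, together with the two concerns raised in this reply. It is written in Python for brevity and is not psql or backend code; the names `strip_utf8_bom`, `ASCII_SPACE`, and `is_ascii_space` are hypothetical, and the ASCII-only whitespace set is an assumption about how a byte-based lexer would typically behave.

```python
import codecs
from typing import Tuple

# The UTF-8 encoding of U+FEFF, i.e. the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def strip_utf8_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return (data without a leading UTF-8 BOM, whether a BOM was found).

    Hypothetical helper mirroring the proposal: look at the first
    3 bytes of the file and discard them if they are the BOM.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):], True
    return data, False


# Assumed whitespace set for a byte-based lexer: ASCII only.
ASCII_SPACE = b" \t\n\r\f"


def is_ascii_space(byte: int) -> bool:
    """True if a single byte is one of the assumed ASCII whitespace bytes."""
    return byte in ASCII_SPACE


script = UTF8_BOM + b"SELECT 1;"
body, had_bom = strip_utf8_bom(script)

# None of the BOM bytes look like whitespace to a byte-based check.
bom_seen_as_space = any(is_ascii_space(b) for b in UTF8_BOM)

# The same three bytes are also valid text in a single-byte encoding
# such as ISO-8859-1 (Latin-1), where they decode to three characters.
bom_as_latin1 = UTF8_BOM.decode("latin-1")
```

In this sketch `body` is the script with the signature removed, `bom_seen_as_space` is false, and `bom_as_latin1` is a three-character Latin-1 string. That last point is why changing the client encoding automatically is a judgment call: the bytes are legal, if unlikely, at the start of a file in another encoding.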