Re: UTF8 with BOM support in psql
From | Peter Eisentraut |
---|---|
Subject | Re: UTF8 with BOM support in psql |
Date | |
Msg-id | 1258441337.10724.13.camel@fsopti579.F-Secure.com |
In response to | Re: UTF8 with BOM support in psql (Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>) |
Responses |
Re: UTF8 with BOM support in psql
Re: UTF8 with BOM support in psql |
List | pgsql-hackers |
On Tue, 2009-11-17 at 14:19 +0900, Itagaki Takahiro wrote:
> The attached patch is a new proposal of the feature.
> When we found BOM at beginning of file, set "expected_encoding" to UTF8.
> Before every execution of query, if pset.encoding is not UTF8, we check the
> query string not to contain any non-ASCII characters and throw an error if
> found. Encoding declarations are typically written only in ascii characters,
> so we can postpone encoding checking until non-ascii characters appear.
>
> Since the default value of expected_encoding is SQL_ASCII, that pass
> through all characters, so the patch does nothing to scripts without BOM.
> (There are no codes to set expected_encoding except BOM.)
> If client encoding is UTF8, it skips BOM and no effect to the script body.
> BOMs are skipped even if client encoding is not set to UTF8, but can throw
> an error if there are no explicit encoding declaration.

I think I could support using the presence of the BOM as a fall-back
indicator of encoding in the absence of any other declaration. It seems to
me, however, that the description above ignores the existence of
encodings other than SQL_ASCII and UTF8.

Also, when the proposed patch to set the encoding from the locale
appears, we need to make this logic more precise. Something like:

1. set client_encoding or \encoding, otherwise
2. if BOM found, then UTF8, otherwise
3. by locale environment, otherwise
4. SQL_ASCII (= server encoding, effectively)

Also, I'm not sure if we need this logic only when we send a query. It
might be better to do this in the lexer when we find a non-ASCII
character and we don't have a client encoding != SQL_ASCII set yet.
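
For illustration only, the fall-back order listed in the message could be sketched roughly as follows. This is not psql code and not the proposed patch: the function names, the parameters `explicit_encoding` and `locale_encoding`, and the choice to always strip a leading BOM (taken from the quoted proposal) are all assumptions made for this sketch. The second function mimics the idea of deferring any encoding check until the first non-ASCII byte is seen.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def resolve_client_encoding(script_bytes, explicit_encoding=None, locale_encoding=None):
    """Return (encoding, body) using the fall-back order from the message.

    Hypothetical parameters, for illustration only:
      explicit_encoding - stands in for an explicit client_encoding setting
      locale_encoding   - stands in for an encoding derived from the locale
    """
    has_bom = script_bytes.startswith(UTF8_BOM)
    # Assumption from the quoted proposal: a leading BOM is always skipped.
    body = script_bytes[len(UTF8_BOM):] if has_bom else script_bytes

    if explicit_encoding is not None:    # 1. explicit declaration wins
        encoding = explicit_encoding
    elif has_bom:                        # 2. BOM found, so assume UTF8
        encoding = "UTF8"
    elif locale_encoding is not None:    # 3. locale environment
        encoding = locale_encoding
    else:                                # 4. nothing known
        encoding = "SQL_ASCII"
    return encoding, body


def first_non_ascii_offset(data):
    """Return the offset of the first byte >= 0x80, or None if all ASCII."""
    for offset, byte in enumerate(data):
        if byte >= 0x80:
            return offset
    return None
```

In this sketch a script that starts with a BOM and carries no explicit declaration resolves to UTF8, while the same script with an explicit declaration keeps the declared encoding. A lexer-style caller could use `first_non_ascii_offset` to decide when an encoding actually has to be known.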