Re: Add ENCODING option to COPY
From | Hitoshi Harada |
---|---|
Subject | Re: Add ENCODING option to COPY |
Date | |
Msg-id | AANLkTi=eAtrf06WLCRTyM=KZsL41R=UoVT4QDECc7G+V@mail.gmail.com |
In reply to | Re: Add ENCODING option to COPY (Itagaki Takahiro <itagaki.takahiro@gmail.com>) |
Responses | Re: Add ENCODING option to COPY |
List | pgsql-hackers |
2011/1/25 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
> On Sat, Jan 15, 2011 at 02:25, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
>> The patch overrides client_encoding by the added ENCODING option, and
>> restores it as soon as copy is done.
>
> We cannot do that because error messages should be encoded in the original
> encoding even during COPY commands with encoding option. Error messages
> could contain non-ASCII characters if lc_messages is set.

Agreed.

>> I see some complaints ask to use
>> pg_do_encoding_conversion() instead of
>> pg_client_to_server/server_to_client(), but the former will surely add
>> slight overhead per reading line
>
> If we want to reduce the overhead, we should cache the conversion procedure
> in CopyState. How about adding something like "FmgrInfo file_to_server_covv"
> into it?

I looked into the code and found that we cannot pass FmgrInfo * to any
of the functions defined in pg_wchar.h, since that header file is shared
with libpq, too.

For the record, I also tried pg_do_encoding_conversion() instead of
pg_client_to_server/server_to_client(), and a simple benchmark shows it
is too slow.

With 3000000 lines of 3 columns (~22MB TSV), COPY FROM:

* utf8 -> utf8 (no conversion)
  13428.233ms
  13322.832ms
  15661.093ms
* euc_jp -> utf8 (client_encoding)
  17527.470ms
  16457.452ms
  16522.337ms
* euc_jp -> utf8 (pg_do_encoding_conversion)
  20550.983ms
  21425.313ms
  20774.323ms

I'll check the code further to see whether we have better alternatives.

Regards,

--
Hitoshi Harada
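
For reference, the three runs per case can be averaged to compare the cases more directly. The following is a small Python sketch and is not part of the original message: the timings are copied from the figures above, while the helper name `summarize` and the choice of the no-conversion case as the baseline are illustrative assumptions.

```python
from statistics import mean

# Timings in milliseconds, copied from the benchmark in the message
# (three COPY FROM runs per case).
timings_ms = {
    "utf8 -> utf8 (no conversion)": [13428.233, 13322.832, 15661.093],
    "euc_jp -> utf8 (client_encoding)": [17527.470, 16457.452, 16522.337],
    "euc_jp -> utf8 (pg_do_encoding_conversion)": [20550.983, 21425.313, 20774.323],
}


def summarize(timings):
    """Return {label: (mean_ms, ratio_to_first_case)}.

    The first entry of the dict is treated as the baseline; that choice
    is an assumption made for this sketch, not something the message states.
    """
    means = {label: mean(runs) for label, runs in timings.items()}
    baseline = next(iter(means.values()))
    return {label: (m, m / baseline) for label, m in means.items()}


summary = summarize(timings_ms)
for label, (mean_ms, ratio) in summary.items():
    print(f"{label}: mean {mean_ms:.0f} ms, {ratio:.2f}x baseline")
```

On these figures the pg_do_encoding_conversion() case averages roughly 1.48x the no-conversion baseline, against roughly 1.19x for the client_encoding path. With only three runs per case, and one noticeably slower run in the baseline, these ratios should be read as rough indications only.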