Re: Add ENCODING option to COPY

From	Hitoshi Harada
Subject	Re: Add ENCODING option to COPY
Date	25 January 2011 11:24:36
Msg-id	AANLkTi=eAtrf06WLCRTyM=KZsL41R=UoVT4QDECc7G+V@mail.gmail.com
In reply to	Re: Add ENCODING option to COPY (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
Responses	Re: Add ENCODING option to COPY
List	pgsql-hackers

2011/1/25 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
> On Sat, Jan 15, 2011 at 02:25, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
>> The patch overrides client_encoding with the added ENCODING option, and
>> restores it as soon as the copy is done.
>
> We cannot do that because error messages should be encoded in the original
> encoding even during COPY commands with encoding option. Error messages
> could contain non-ASCII characters if lc_messages is set.

Agreed.
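A minimal sketch of why overriding client_encoding is a problem, written in Python for brevity. The `session` dictionary, `report_error()`, and both `copy_*` functions are hypothetical stand-ins, not PostgreSQL code; Python codecs play the role of encoding conversion, and the Japanese string stands in for a translated error message when lc_messages is set. An error raised while the override is active comes out in the file's encoding instead of the client's:

```python
# Hypothetical session state: only the client encoding matters here.
session = {"client_encoding": "utf_8"}

# Stand-in for a translated error message (lc_messages set to Japanese).
MSG = "無効な入力構文"


def report_error(msg: str) -> bytes:
    """Encode an error message in the current client encoding."""
    return msg.encode(session["client_encoding"])


def copy_overriding_client_encoding(file_encoding: str) -> bytes:
    """Approach in the patch: swap client_encoding for the whole COPY."""
    saved = session["client_encoding"]
    session["client_encoding"] = file_encoding
    try:
        return report_error(MSG)  # an error raised mid-COPY
    finally:
        session["client_encoding"] = saved  # restored only when COPY ends


def copy_with_separate_file_encoding(file_encoding: str) -> bytes:
    """Alternative: file_encoding would convert data lines only (unused here)."""
    return report_error(MSG)
```

Keeping the file encoding in per-COPY state leaves the message path untouched, so errors stay in the client's encoding as the quoted reply requires.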

>> I see some complaints asking to use
>> pg_do_encoding_conversion() instead of
>> pg_client_to_server/server_to_client(), but the former will surely add
>> slight overhead for every line read
>
> If we want to reduce the overhead, we should cache the conversion procedure
> in CopyState. How about adding something like "FmgrInfo file_to_server_conv"
> into it?

I looked into the code and found that we cannot pass FmgrInfo * to
any of the functions declared in pg_wchar.h, since that header file is
shared with libpq, too.
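The caching idea from the quoted suggestion can be sketched as follows, again in Python with codecs standing in for conversion procedures. `CopyStateSketch`, its `file_to_server_conv` field, and `convert_line_uncached()` are hypothetical names, and a UTF8 server encoding is assumed. The converter is resolved once when the COPY state is created and reused for every line, instead of being resolved on each call. The sketch does not address the pg_wchar.h restriction above; it only illustrates the structure.

```python
import codecs
from dataclasses import dataclass, field
from typing import Callable, Tuple


@dataclass
class CopyStateSketch:
    """Hypothetical stand-in for CopyState holding a cached converter."""
    file_encoding: str
    # Resolved once at COPY start, in the spirit of "FmgrInfo file_to_server_conv".
    file_to_server_conv: Callable[[bytes], Tuple[str, int]] = field(init=False)

    def __post_init__(self) -> None:
        self.file_to_server_conv = codecs.lookup(self.file_encoding).decode

    def convert_line(self, raw: bytes) -> bytes:
        text, _consumed = self.file_to_server_conv(raw)
        return text.encode("utf_8")  # assumed server encoding: UTF8


def convert_line_uncached(raw: bytes, file_encoding: str) -> bytes:
    """Resolves the conversion again for every line it is called on."""
    return codecs.lookup(file_encoding).decode(raw)[0].encode("utf_8")
```

Python already caches codec lookups internally, so this shows where the cached converter would live, not the timing difference measured below.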

For the record, I also tried pg_do_encoding_conversion() instead of
pg_client_to_server/server_to_client(), and a simple benchmark shows
it is too slow:

COPY FROM with 3,000,000 lines of 3 columns (~22MB TSV):

*utf8 -> utf8 (no conversion)
13428.233ms
13322.832ms
15661.093ms

*euc_jp -> utf8 (client_encoding)
17527.470ms
16457.452ms
16522.337ms

*euc_jp -> utf8 (pg_do_encoding_conversion)
20550.983ms
21425.313ms
20774.323ms
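To put the three configurations side by side, a short Python helper that averages each set of runs and compares it with the no-conversion baseline. The timings are copied verbatim from above; `summarize()` and `RUNS` are just illustrative names.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Timings in ms, copied from the runs above.
RUNS: Dict[str, List[float]] = {
    "utf8 -> utf8 (no conversion)": [13428.233, 13322.832, 15661.093],
    "euc_jp -> utf8 (client_encoding)": [17527.470, 16457.452, 16522.337],
    "euc_jp -> utf8 (pg_do_encoding_conversion)": [20550.983, 21425.313, 20774.323],
}


def summarize(runs: Dict[str, List[float]], baseline: str) -> Dict[str, Tuple[float, float]]:
    """Return (mean ms, ratio of that mean to the baseline mean) per configuration."""
    base = mean(runs[baseline])
    return {label: (mean(times), mean(times) / base) for label, times in runs.items()}


for label, (avg, ratio) in summarize(RUNS, "utf8 -> utf8 (no conversion)").items():
    print(f"{label}: mean {avg:.1f} ms, {ratio:.2f}x baseline")
```

With these numbers the client_encoding path averages about 1.19x the baseline and the pg_do_encoding_conversion() path about 1.48x, i.e. roughly 24% slower than the client_encoding path.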

I'll look into the code further to see whether there are better alternatives.

Regards,


-- 
Hitoshi Harada
