Re: Perform COPY FROM encoding conversions in larger chunks

From: Heikki Linnakangas
Subject: Re: Perform COPY FROM encoding conversions in larger chunks
Msg-id: 02da25ef-b579-2236-d3cd-0d07819cce98@iki.fi
In reply to: Re: Perform COPY FROM encoding conversions in larger chunks (John Naylor <john.naylor@enterprisedb.com>)
Replies: Re: Perform COPY FROM encoding conversions in larger chunks (Heikki Linnakangas <hlinnaka@iki.fi>)
List: pgsql-hackers
On 28/01/2021 01:23, John Naylor wrote:
> Hi Heikki,
> 
> 0001 through 0003 are straightforward, and I think they can be committed 
> now if you like.
> 
> 0004 is also pretty straightforward. The check you proposed upthread for 
> pg_upgrade seems like the best solution to make that workable. I'll take 
> a look at 0005 soon.
> 
> I measured the conversions that were rewritten in 0003, and there is 
> indeed a noticeable speedup:
> 
> Big5 to EUC-TW:
> 
> head    196ms
> 0001-3  152ms
> 
> EUC-TW to Big5:
> 
> head    190ms
> 0001-3  144ms
> 
> I've attached the driver function for reference. Example use:
> 
> select drive_conversion(
>    1000, 'euc_tw'::name, 'big5'::name,
>    convert('a few kB of utf8 text here', 'utf8', 'euc_tw')
> );
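For reference, the quoted timings work out to roughly a 1.3x speedup in both directions, or about a quarter less time spent converting. The small Python sketch below shows that arithmetic. It is illustrative only: `speedup` is a hypothetical helper and is not part of the attached driver function.

```python
# Timings quoted above (milliseconds): head vs. head + patches 0001-0003.
timings = {
    "Big5 -> EUC-TW": (196, 152),
    "EUC-TW -> Big5": (190, 144),
}


def speedup(before_ms, after_ms):
    """Return (speedup ratio, fraction of time saved) for two timings."""
    return before_ms / after_ms, (before_ms - after_ms) / before_ms


if __name__ == "__main__":
    for name, (head, patched) in timings.items():
        ratio, saved = speedup(head, patched)
        # Prints the ratio and the relative reduction in run time.
        print(f"{name}: {ratio:.2f}x faster, {saved:.0%} less time")
```

This yields about 1.29x (22% less time) for Big5 to EUC-TW and about 1.32x (24% less time) for EUC-TW to Big5.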

Thanks! I have committed patches 0001 and 0003 in this series, with 
minor comment fixes. Next I'm going to write the pg_upgrade check for 
patch 0004, to get that into a committable state too.

> I took a look at the test suite also, and the only thing to note is a 
> couple places where the comment doesn't match the code:
> 
> +  -- JIS X 0201: 2-byte encoded chars starting with 0x8e (SS2)
> +  byte1 = hex('0e');
> +  for byte2 in hex('a1')..hex('df') loop
> +    return next b(byte1, byte2);
> +  end loop;
> +
> +  -- JIS X 0212: 3-byte encoded chars, starting with 0x8f (SS3)
> +  byte1 = hex('0f');
> +  for byte2 in hex('a1')..hex('fe') loop
> +    for byte3 in hex('a1')..hex('fe') loop
> +      return next b(byte1, byte2, byte3);
> +    end loop;
> +  end loop;
> 
> Not sure if it matters, but thought I'd mention it anyway.

Good catch! The comments were correct and the tests were wrong: they 
weren't testing those 2- and 3-byte encoded characters as intended. It 
doesn't matter for testing this patch; I only included those 
euc_jis_2004 tests for the sake of completeness. But if someone finds 
this test suite in the archives and wants to use it for something real, 
make sure you fix that first.
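Fixing the two quoted loops means using 0x8e (SS2) and 0x8f (SS3) as the lead bytes instead of 0x0e and 0x0f. Because the suite's `b()` and `hex()` helpers aren't shown here, the minimal sketch below is written in Python rather than PL/pgSQL. The generator names are hypothetical. It enumerates the intended byte sequences and uses Python's built-in `euc_jis_2004` codec only to sanity-check that the 2-byte SS2 sequences decode to the half-width katakana block (U+FF61 to U+FF9F).

```python
# Hypothetical Python analogue of the two quoted test-suite loops,
# with the lead bytes the comments describe (0x8e / 0x8f).
SS2 = 0x8E  # single shift 2: introduces a 2-byte JIS X 0201 katakana char
SS3 = 0x8F  # single shift 3: introduces a 3-byte char


def ss2_sequences():
    """Yield 2-byte sequences: SS2 followed by a byte in 0xA1..0xDF."""
    for byte2 in range(0xA1, 0xDF + 1):
        yield bytes([SS2, byte2])


def ss3_sequences():
    """Yield 3-byte sequences: SS3 followed by two bytes in 0xA1..0xFE."""
    for byte2 in range(0xA1, 0xFE + 1):
        for byte3 in range(0xA1, 0xFE + 1):
            yield bytes([SS3, byte2, byte3])


if __name__ == "__main__":
    two = list(ss2_sequences())
    three = list(ss3_sequences())
    print(len(two), len(three))  # 63 8836
    # Sanity check: every SS2 sequence decodes to one half-width char.
    decoded = [s.decode("euc_jis_2004") for s in two]
    print(all(len(ch) == 1 for ch in decoded))
```

Only the SS2 range is decoded here. Not every 3-byte SS3 sequence maps to an assigned character, so the sketch just enumerates those.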

- Heikki
