Re: pglz performance
From: Tels
Subject: Re: pglz performance
Date:
Msg-id: d56c85b989a3bd8c0a98d79553276b0e@bloodgate.com
In reply to: Re: pglz performance (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses: Re: pglz performance
List: pgsql-hackers
Hello Andrey,

On 2019-11-02 12:30, Andrey Borodin wrote:
>> On 1 Nov 2019, at 18:48, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> PFA two patches:
> v4-0001-Use-memcpy-in-pglz-decompression.patch (known as 'hacked' in
> test_pglz extension)
> v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch (known
> as 'hacked8')

Looking at the patches, it seems only the case of a match is changed. But when we observe a literal byte, it is still copied byte by byte with:

    else
    {
        /*
         * An unset control bit means LITERAL BYTE. So we just
         * copy one from INPUT to OUTPUT.
         */
        *dp++ = *sp++;
    }

Maybe we can optimize this, too. For instance, you could just increase a counter:

    else
    {
        /*
         * An unset control bit means LITERAL BYTE. We count
         * these and copy them later.
         */
        literal_bytes++;
    }

and then, in the match case:

    if (ctrl & 1)
    {
        /* First copy all the literal bytes */
        if (literal_bytes > 0)
        {
            memcpy(dp, sp, literal_bytes);
            sp += literal_bytes;
            dp += literal_bytes;
            literal_bytes = 0;
        }
        /* ... then handle the match as before ... */
    }

(Code untested!)

The same flush would need to be done at the very end, if the input ends without any new control byte.

Whether that gains us anything depends on how common literal bytes are. Highly compressible input might have almost none, while input that mixes incompressible and compressible strings might have longer stretches of literals. One example would be something like a SHA-256 hash that is repeated twice: the first instance would be incompressible, the second one would be just a copy. This might not happen that often in practical inputs, though.

I wonder if you agree, and what would happen if you try this variant on your corpus tests.

Best regards,

Tels
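For illustration, here is a minimal, self-contained sketch of the deferred-literal-copy bookkeeping described above. It is not the actual pglz_decompress(); the function name decompress_sketch, the parameter names, and the two-byte match encoding are assumptions made only for this example. It shows the three places where the pending literals have to be flushed: before decoding a match, before reading the next control byte (which interrupts the literal run in the input), and once at the end of the input.

    /*
     * Sketch only: simplified stand-in for pglz decompression, assuming
     * well-formed input. Not the real pglz_decompress().
     */
    #include <stddef.h>
    #include <string.h>

    static void
    decompress_sketch(const unsigned char *sp, const unsigned char *srcend,
                      unsigned char *dp)
    {
        size_t  literal_bytes = 0;  /* pending literals, starting at sp */

        /* The next unread input byte is always sp + literal_bytes. */
        while (sp + literal_bytes < srcend)
        {
            unsigned char ctrl;
            int         ctrlc;

            /*
             * A control byte interrupts any literal run in the input, so
             * pending literals must be flushed at group boundaries.
             */
            if (literal_bytes > 0)
            {
                memcpy(dp, sp, literal_bytes);
                dp += literal_bytes;
                sp += literal_bytes;
                literal_bytes = 0;
            }

            ctrl = *sp++;

            for (ctrlc = 0; ctrlc < 8 && sp + literal_bytes < srcend; ctrlc++)
            {
                if (ctrl & 1)
                {
                    size_t  len;
                    size_t  off;
                    const unsigned char *cp;

                    /* Flush the pending literals with a single memcpy. */
                    if (literal_bytes > 0)
                    {
                        memcpy(dp, sp, literal_bytes);
                        dp += literal_bytes;
                        sp += literal_bytes;
                        literal_bytes = 0;
                    }

                    /* Decode the (simplified) match: off 1..4095, len 3..18. */
                    len = (sp[0] & 0x0f) + 3;
                    off = ((size_t) (sp[0] & 0xf0) << 4) | sp[1];
                    sp += 2;

                    /* Match source and destination may overlap: copy bytewise. */
                    cp = dp - off;
                    while (len--)
                        *dp++ = *cp++;
                }
                else
                {
                    /*
                     * An unset control bit means LITERAL BYTE. Only count it;
                     * the byte itself still sits in the input at sp and is
                     * picked up by the next flush.
                     */
                    literal_bytes++;
                }
                ctrl >>= 1;
            }
        }

        /* Input may end in literals without another control byte: flush them. */
        if (literal_bytes > 0)
        {
            memcpy(dp, sp, literal_bytes);
            dp += literal_bytes;
            sp += literal_bytes;
        }
    }

One caveat of this bookkeeping: since pglz emits a control byte for every eight items, a literal run is broken in the input at each group boundary, so each such memcpy can cover at most eight bytes. Whether that is enough for memcpy to beat the byte-by-byte loop is exactly the question the corpus tests would have to answer.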