Re: Performance Improvement by reducing WAL for Update Operation
From: Heikki Linnakangas
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Msg-id: 508C2E9B.5070201@vmware.com
In reply to: Re: Performance Improvement by reducing WAL for Update Operation (Amit Kapila <amit.kapila@huawei.com>)
Responses: Re: Performance Improvement by reducing WAL for Update Operation
List: pgsql-hackers
On 27.10.2012 14:27, Amit Kapila wrote:
> On Saturday, October 27, 2012 4:03 AM Noah Misch wrote:
>> In my previous review, I said:
>>
>> Given [not relying on the executor to know which columns changed],
>> why not treat the tuple as an opaque series of bytes and not worry
>> about datum boundaries? When several narrow columns change together,
>> say a sequence of sixteen smallint columns, you will use fewer binary
>> delta commands by representing the change with a single 32-byte
>> substitution. If an UPDATE changes just part of a long datum, the
>> delta encoding algorithm will still be able to save considerable
>> space. That case arises in many forms: changing one word in a long
>> string, changing one element in a long array, changing one field of
>> a composite-typed column. Granted, this makes the choice of delta
>> encoding algorithm more important.
>>
>> We may be leaving considerable savings on the table by assuming that
>> column boundaries are the only modified-range boundaries worth
>> recognizing. What is your willingness to explore general algorithms
>> for choosing such boundaries? Such an investigation may, of course,
>> be a dead end.
>
> For this patch I am interested in going with the delta encoding
> approach based on column boundaries.
>
> However, I shall try the general approach separately, and if it gives
> positive results I will share them with hackers. I will try VCDiff
> first, or let me know if you have any other algorithm in mind.

One idea is to use the LZ format in the WAL record, but use your
memcmp() code to construct it. I believe the slow part of LZ
compression is locating matches in the "history", so if you replace
that with your code, which is aware of the column boundaries and uses
a simple memcmp() to detect which parts changed, you could produce
LZ-compressed output just as quickly as the custom encoded format. It
would leave the door open for making the encoding smarter, or for
doing actual compression in the future, without changing the format or
the code that decodes it.

- Heikki
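To make that suggestion concrete, here is a minimal sketch of such an
encoder. The two-tag copy/literal stream layout and every identifier
below are hypothetical illustrations, not PostgreSQL's actual pglz
format: unchanged columns become "copy from the old tuple" instructions
found with memcmp(), changed columns become literal runs, and a decoder
that understands just these two tags can rebuild the new tuple from the
old one.

/*
 * Hypothetical tag format (for illustration only):
 *   0x01 <off:u16> <len:u16>   copy <len> bytes from old tuple at <off>
 *   0x02 <len:u16> <bytes...>  literal run of <len> new-tuple bytes
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint8_t *
emit_u16(uint8_t *p, uint16_t v)
{
    *p++ = v & 0xff;
    *p++ = v >> 8;
    return p;
}

/*
 * Encode the delta between old and new tuple images.  col_off[] holds
 * the byte offset of each column boundary, with col_off[ncols] equal
 * to the tuple length; columns are assumed to line up byte-for-byte
 * in both images.
 */
static size_t
encode_delta(const uint8_t *olddata, const uint8_t *newdata,
             const uint16_t *col_off, int ncols, uint8_t *out)
{
    uint8_t *p = out;

    for (int i = 0; i < ncols; i++)
    {
        uint16_t off = col_off[i];
        uint16_t len = col_off[i + 1] - off;

        if (memcmp(olddata + off, newdata + off, len) == 0)
        {
            /* unchanged column: copy it from the old tuple ("history") */
            *p++ = 0x01;
            p = emit_u16(p, off);
            p = emit_u16(p, len);
        }
        else
        {
            /* changed column: store the new bytes as a literal run */
            *p++ = 0x02;
            p = emit_u16(p, len);
            memcpy(p, newdata + off, len);
            p += len;
        }
    }
    return p - out;
}

int
main(void)
{
    /* two columns: a 24-byte key (unchanged) and an 8-byte value (changed) */
    uint8_t  oldtup[32] = "0123456789abcdefghijklmn_old_val";
    uint8_t  newtup[32] = "0123456789abcdefghijklmn_NEW_VAL";
    uint16_t col_off[] = {0, 24, 32};
    uint8_t  buf[64];

    size_t n = encode_delta(oldtup, newtup, col_off, 2, buf);
    printf("delta is %zu bytes for a 32-byte tuple\n", n);  /* 16 bytes */
    return 0;
}

Because the decoder interprets only copy and literal tags, this
memcmp()-based encoder could later be swapped for a real LZ match
search, or for true compression, without touching the decode path;
that is the forward compatibility Heikki describes.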