Re: pglz compression performance, take two
From | Andrey Borodin |
---|---|
Subject | Re: pglz compression performance, take two |
Date | |
Msg-id | 88C049C6-1B86-4EF7-940D-585B07FEB81B@yandex-team.ru |
In reply to | Re: pglz compression performance, take two (Mark Dilger <mark.dilger@enterprisedb.com>) |
Replies | Re: pglz compression performance, take two |
List | pgsql-hackers |
> On 20 March 2021, at 00:35, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
>> On Jan 21, 2021, at 6:48 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:
>>
>> @cfbot: rebased
>> <0001-Reorganize-pglz-compression-code.patch>
>
> Review comments.

Thanks for the review, Mark! And sorry for such a long delay; I've been trying to figure out a way to make things less platform-dependent. Here's what I've come up with.

We use pglz_read32() differently from the way xxhash and lz4 do: we do not really need to get a 4-byte value, we only need to compare 4 bytes at once. So, essentially, we need to compare two implementations of a 4-byte comparison:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    bool cmp_a(const void *ptr1, const void *ptr2)
    {
        return *(const uint32_t *) ptr1 == *(const uint32_t *) ptr2;
    }

    bool cmp_b(const void *ptr1, const void *ptr2)
    {
        return memcmp(ptr1, ptr2, 4) == 0;
    }

Variant B is more portable. Inspecting it in Godbolt's Compiler Explorer, I've found that GCC 7.1+ generates assembly without a memcmp() call. For x86-64 and ARM64, the assembly of cmp_b is identical to that of cmp_a. So I think we could just stick with cmp_b instead of optimising for ARM6 and similar architectures like Arduino.

I've benchmarked the patch with "REINDEX table pgbench_accounts" on a database initialized with pgbench -i at scale 100. wal_compression was on; other settings were default. Without the patch it takes ~11055.077 ms on my machine; with the patch it takes ~9512.411 ms, i.e. about 14% less time overall.

PFA v5.

Thanks!

Best regards, Andrey Borodin.
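A minimal C sketch of how the portable variant B can be used to extend a match four bytes at a time, falling back to byte-by-byte comparison at the end. `match_length` is a hypothetical helper written only for illustration; it is not the code from the v5 patch or from pglz itself, and it assumes both buffers are readable for at least `max_len` bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Variant B from the message: compare 4 bytes without assuming alignment.
 * A fixed-size memcmp like this is typically lowered by optimizing compilers
 * to a single load-and-compare rather than a library call. */
static bool cmp_b(const void *ptr1, const void *ptr2)
{
    return memcmp(ptr1, ptr2, 4) == 0;
}

/* Hypothetical helper (illustration only): length of the common prefix of
 * a and b, capped at max_len. Steps 4 bytes at a time while whole 4-byte
 * chunks match, then finishes the tail one byte at a time. */
static size_t match_length(const unsigned char *a, const unsigned char *b,
                           size_t max_len)
{
    size_t len = 0;

    assert(a != NULL && b != NULL);

    /* Fast path: whole 4-byte chunks that fit within max_len. */
    while (len + 4 <= max_len && cmp_b(a + len, b + len))
        len += 4;

    /* Slow path: the remaining (or first mismatching) chunk, byte by byte. */
    while (len < max_len && a[len] == b[len])
        len++;

    return len;
}
```

For example, `match_length` on "abcdefghij" and "abcdefgXij" with `max_len` 10 returns 7: the first 4-byte step succeeds, the second fails on "efgh" vs "efgX", and the byte loop then accepts three more matching bytes.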
Attachments