Re: Optimize partial TOAST decompression
From | Andrey Borodin |
---|---|
Subject | Re: Optimize partial TOAST decompression |
Date | |
Msg-id | 123EF56B-F8EC-4868-B49B-095795095E7A@yandex-team.ru |
In response to | Re: Optimize partial TOAST decompression (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses | Re: Optimize partial TOAST decompression |
List | pgsql-hackers |
> On 30 Sep 2019, at 22:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote:
>>
>>> On 30 Sep 2019, at 20:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>
>>> I mean this:
>>>
>>>     /*
>>>      * Use int64 to prevent overflow during calculation.
>>>      */
>>>     compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;
>>>
>>> I'm not very familiar with pglz internals, but I'm a bit puzzled by
>>> this. My first instinct was to compare it to this:
>>>
>>>     #define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4)
>>>
>>> but clearly that's a very different (much simpler) formula. So why
>>> shouldn't pglz_maximum_compressed_size simply use this macro?
>>
>> compressed_size accounts for possible increase of size during
>> compression. pglz can consume up to 1 control byte for each 8 bytes of
>> data in worst case.
>
> OK, but does that actually translate in to the formula? We essentially
> need to count 8-byte chunks in raw data, and multiply that by 9. Which
> gives us something like
>
>     nchunks = ((rawsize + 7) / 8) * 9;
>
> which is not quite what the patch does.

I'm afraid neither formula is exactly correct, but these are hair-splitting differences.

Your formula does not account for the fact that we may not need all bytes from the last chunk.
Consider a desired decompressed size of 3 bytes. We may need 1 control byte and 3 literals, 4 bytes total.
But nchunks = 9.

Binguo's formula adds 1 control bit per data byte and one extra control byte.
Consider size = 8 bytes. We need 1 control byte and 8 literals, 9 bytes total.
But compressed_size = 10.

The mathematically correct formula is

    compressed_size = (int32) ((int64) rawsize * 9 + 7) / 8;

Here we take one control bit for each data byte, plus 7 control bits to round up to a whole control byte.

But these equations make no big difference: each formula is safe, because none of them returns less than the exact worst case. I'd pick the one which is easier to understand and document (IMO, it's nchunks = ((rawsize + 7) / 8) * 9).

Thanks!
--
Andrey Borodin
Open source RDBMS development team leader
Yandex.Cloud
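
The three formulas discussed in the message can be compared side by side. The following is only a sketch, not code from the patch: the function names are hypothetical, it assumes (as the message does) that the worst case is a stream of literals with one control byte per started group of 8, and it does the division in 64-bit before narrowing, which is a choice of this sketch rather than the exact expression quoted above. It also assumes rawsize is small enough for the result to fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Exact worst case as described in the message: every raw byte is stored
 * as a literal, plus one control byte per started group of 8 literals.
 * Equivalent to rawsize + ceil(rawsize / 8).
 */
int32_t
worst_case_exact(int32_t rawsize)
{
    return (int32_t) (((int64_t) rawsize * 9 + 7) / 8);
}

/*
 * Formula from the patch under review: one control bit per raw byte
 * plus one extra control byte.
 */
int32_t
worst_case_patch(int32_t rawsize)
{
    return (int32_t) (((int64_t) rawsize * 9 + 8) / 8);
}

/*
 * Chunk-based formula: count started 8-byte chunks and charge 9 bytes
 * for each, even if the last chunk is only partly needed.
 */
int32_t
worst_case_chunks(int32_t rawsize)
{
    return (int32_t) ((((int64_t) rawsize + 7) / 8) * 9);
}

/*
 * Check that the two looser formulas never return less than the exact
 * bound for any rawsize in [0, limit].
 */
void
verify_upper_bounds(int32_t limit)
{
    for (int32_t n = 0; n <= limit; n++)
    {
        assert(worst_case_patch(n) >= worst_case_exact(n));
        assert(worst_case_chunks(n) >= worst_case_exact(n));
    }
}
```

For rawsize = 3 these functions return 4, 4 and 9; for rawsize = 8 they return 9, 10 and 9, which matches the two examples given in the message. The overestimates stay within a few bytes, which is why any of the three is usable as a safe upper bound.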