Re: Add LZ4 compression in pg_dump
From | Tomas Vondra |
---|---|
Subject | Re: Add LZ4 compression in pg_dump |
Date | |
Msg-id | 09b37949-cfe5-29bb-cbb1-498ee5700b61@enterprisedb.com |
In response to | Re: Add LZ4 compression in pg_dump (Justin Pryzby <pryzby@telsasoft.com>) |
Responses | Re: Add LZ4 compression in pg_dump |
List | pgsql-hackers |
On 2/27/23 05:49, Justin Pryzby wrote:
> On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote:
>> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
>>> I have some fixes (attached) and questions while polishing the patch for
>>> zstd compression. The fixes are small and could be integrated with the
>>> patch for zstd, but could be applied independently.
>>
>> One more - WriteDataToArchiveGzip() says:
>
> One more again.
>
> The LZ4 path is using non-streaming mode, which compresses each block
> without persistent state, giving poor compression for -Fc compared with
> -Fp. If the data is highly compressible, the difference can be orders
> of magnitude.
>
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c
> 12351763
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
> 21890708
>
> That's not true for gzip:
>
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c
> 2118869
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c
> 2115832
>
> The function ought to at least use streaming mode, so each block/row
> isn't compressed in isolation. 003 is a simple patch to use
> streaming mode, which improves the -Fc case:
>
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
> 15178283
>
> However, that still flushes the compression buffer, writing a block
> header, for every row. With a single-column table, pg_dump -Fc -Z lz4
> still outputs ~10% *more* data than with no compression at all. And
> that's for compressible data.
>
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c
> 12890296
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c
> 11890296
>
> I think this should use the LZ4F API with frames, which are buffered to
> avoid outputting a header for every single row. The LZ4F format isn't
> compatible with the LZ4 format, so (unlike changing to the streaming
> API) that's not something we can change in a bugfix release. I consider
> this an Opened Item.
>
> With the LZ4F API in 004, -Fp and -Fc are essentially the same size
> (like gzip). (Oh, and the output is three times smaller, too.)
>
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c
> 4155448
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c
> 4156548
>

Thanks. Those are definitely interesting improvements/optimizations!

I suggest we track them as a separate patch series - please add them to
the CF app (I guess you'll have to add them to 2023-07 at this point,
but we can get them in, I think).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
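
The size differences discussed in the quoted message come from three ways of feeding rows to a compressor: compressing every row in isolation, keeping compressor state across rows but flushing after each one, and buffering rows into one continuous stream. The sketch below is only an illustration of that general effect. It uses Python's standard-library zlib as a stand-in, not LZ4 and not the pg_dump code, and the function names and sample rows are hypothetical. Exact sizes will differ from the figures in the thread.

```python
import zlib


def size_independent(rows):
    """Compress each row with a fresh compressor, so no history is shared."""
    return sum(len(zlib.compress(row)) for row in rows)


def size_stream_flush_per_row(rows):
    """Share one compressor across rows, but force a flush after every row."""
    comp = zlib.compressobj()
    total = 0
    for row in rows:
        total += len(comp.compress(row))
        # Z_SYNC_FLUSH emits all pending output, adding a small marker per row.
        total += len(comp.flush(zlib.Z_SYNC_FLUSH))
    total += len(comp.flush())  # finish the stream
    return total


def size_buffered_stream(rows):
    """Share one compressor and let it buffer rows until the end."""
    comp = zlib.compressobj()
    total = sum(len(comp.compress(row)) for row in rows)
    total += len(comp.flush())  # finish the stream
    return total


# Hypothetical sample data: many short, similar rows, like a one-column table.
rows = [b"%d\tsome repetitive payload\n" % i for i in range(10000)]
raw_size = sum(len(row) for row in rows)

independent = size_independent(rows)
flushed = size_stream_flush_per_row(rows)
buffered = size_buffered_stream(rows)

print("uncompressed:         ", raw_size)
print("independent rows:     ", independent)
print("stream, flush per row:", flushed)
print("buffered stream:      ", buffered)
```

With short rows like these, the per-row fixed overhead dominates in the independent case, which is the same kind of effect as compressed output ending up larger than uncompressed output.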