Re: Add LZ4 compression in pg_dump
From | Justin Pryzby |
---|---|
Subject | Re: Add LZ4 compression in pg_dump |
Date | |
Msg-id | 20230227044910.GO1653@telsasoft.com |
In reply to | Re: Add LZ4 compression in pg_dump (Justin Pryzby <pryzby@telsasoft.com>) |
Responses | Re: Add LZ4 compression in pg_dump |
List | pgsql-hackers |
On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote:
> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> > I have some fixes (attached) and questions while polishing the patch for
> > zstd compression. The fixes are small and could be integrated with the
> > patch for zstd, but could be applied independently.
>
> One more - WriteDataToArchiveGzip() says:

One more again.

The LZ4 path is using non-streaming mode, which compresses each block
without persistent state, giving poor compression for -Fc compared with
-Fp. If the data is highly compressible, the difference can be orders of
magnitude.

    $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c
    12351763
    $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
    21890708

That's not true for gzip:

    $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c
    2118869
    $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c
    2115832

The function ought to at least use streaming mode, so each block/row isn't
compressed in isolation. 003 is a simple patch to use streaming mode,
which improves the -Fc case:

    $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
    15178283

However, that still flushes the compression buffer, writing a block
header, for every row. With a single-column table, pg_dump -Fc -Z lz4
still outputs ~10% *more* data than with no compression at all. And
that's for compressible data.

    $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c
    12890296
    $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c
    11890296

I think this should use the LZ4F API with frames, which are buffered to
avoid outputting a header for every single row. The LZ4F format isn't
compatible with the LZ4 format, so (unlike changing to the streaming
API) that's not something we can change in a bugfix release. I consider
this an open item.

With the LZ4F API in 004, -Fp and -Fc are essentially the same size
(like gzip).
(Oh, and the output is three times smaller, too.)

    $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c
    4155448
    $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c
    4156548

-- Justin