Обсуждение: wal_compression=zstd

Поиск
Список
Период
Сортировка

wal_compression=zstd

От
Justin Pryzby
Дата:
Since 4035cd5d4, wal_compression=lz4 is supported.  I think pg15 should also
support wal_compression=zstd.  There are free bits in the WAL header.

The last message on that thread includes a patch doing just that, which I've
rebased.
https://www.postgresql.org/message-id/YNqWd2GSMrnqWIfx@paquier.xyz

It might be nice if to also support wal_compression=zstd-level, but that seems
optional and could wait for pg16...

In [0], I showed a case where lz4 is just as fast as uncompressed data, and
writes ~60% as much.  zstd writes half as much as LZ4 and 12% slower.
[0] https://www.postgresql.org/message-id/20210614012412.GA31772%40telsasoft.com

I am not claiming that zstd is generally better for WAL.  Rather, if we're
going to support alternate compression methods, it's nice to give a couple
options (in addition to pglz).  Some use cases would certainly suffer from
slower WAL.  But providing the option will benefit some others.  Note that a
superuser can set wal_compression for a given session - this would probably
benefit bulk-loading in an otherwise OLTP environment.

As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress
level is 6).

0001 adds support for zstd
0002 makes more efficient our use of bits in the WAL header
0003 makes it the default compression, to exercise on CI (windows will fail).
0004 adds support for zstd-level
0005-6 are needed to allow recovery tests to pass when wal compression is enabled;
0007 (not included) also adds support for zlib.  I'm of the impression nobody
     cares about this, otherwise it would've been included 5-10 years ago.

Only 0001 should be reviewed for pg15 - the others are optional/future work.

Вложения

Re: wal_compression=zstd

От
Michael Paquier
Дата:
On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:
> I am not claiming that zstd is generally better for WAL.  Rather, if we're
> going to support alternate compression methods, it's nice to give a couple
> options (in addition to pglz).  Some use cases would certainly suffer from
> slower WAL.  But providing the option will benefit some others.  Note that a
> superuser can set wal_compression for a given session - this would probably
> benefit bulk-loading in an otherwise OLTP environment.

Well, I cannot say which one is better either as it depends on your
workload, mostly, but we know for sure that both have benefits, so I
don't mind adding it now.  What you are proposing is built on top of
the existing code, making it a very simple change.

> As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress
> level is 6).

Why?  ZSTD using this default has its reasons, no?  And it would be
consistent to do the same for ZSTD as for the other two methods.

-        The supported methods are <literal>pglz</literal> and
-        <literal>lz4</literal> (if <productname>PostgreSQL</productname> was
-        compiled with <option>--with-lz4</option>). The default value is
-        <literal>off</literal>. Only superusers can change this setting.
+        The supported methods are <literal>pglz</literal>, and
+        (if configured when <productname>PostgreSQL</productname>was built)
+        <literal>lz4</literal> and zstd.
+        The default value is <literal>off</literal>.
+        Only superusers can change this setting.

This is making the docs less verbose.  I think that you should keep
the mention to respectively --with-lz4 and --with-zstd after each
value.

       <para>
        Build with <productname>ZSTD</productname> compression support.
+       This enables use of <productname>ZSTD</productname> for
+       compression of WAL data.

This addition is not necessary, see d7a9786.

Not related to your patch, but we have more reasons to add an
check in the block of BKPIMAGE_COMPRESSED() in RestoreBlockImage() of
xlogreader.c to make sure that only one bit is set for the compression
type.  We could use a pg_popcount() == 1 for that, returning
report_invalid_record() when we see some corrupted data.

As a whole, 0001 looks pretty good to me after a detailed read (not
tested, though).

> Only 0001 should be reviewed for pg15 - the others are optional/future work.

That's wiser for v15.  FWIW, I have the impression that we don't need
most of what's proposed in 0002~.

 /* compression methods supported */
-#define BKPIMAGE_COMPRESS_PGLZ 0x04
-#define BKPIMAGE_COMPRESS_LZ4  0x08
-#define BKPIMAGE_COMPRESS_ZSTD 0x10
-
+#define BKPIMAGE_COMPRESS_METHOD1      0x04    /* bits to encode  compression method */
+#define BKPIMAGE_COMPRESS_METHOD2      0x08    /* 0=none, 1=pglz, 2=lz4, 3=zstd */

As of 0002.  We still have some room for this data and this makes the
code harder to follow, so I'd live this part of the code as it is
proposed in 0001.

0003, defaulting to zstd, and 0004 to extend compression to support a
level would require a lot of benchmarking to be justified.  I have
argued against making the code more complicated for such things in the
past, with reloptions.  The footprint on the code is much smaller,
here, still..

0007, for ZLIB, does not make sense once one can choose between LZ4
and ZSTD.
--
Michael

Вложения

Re: wal_compression=zstd

От
Justin Pryzby
Дата:
On Fri, Mar 04, 2022 at 04:19:32PM +0900, Michael Paquier wrote:
> On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:
> 
> > As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress
> > level is 6).
> 
> Why?  ZSTD using this default has its reasons, no?  And it would be
> consistent to do the same for ZSTD as for the other two methods.

In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
cost.

postgres=# SET wal_compression= 'zstd-6';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal');
begin;CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
 
Duración: 2729,017 ms (00:02,729)
wal_bytes        | 6102403

postgres=# SET wal_compression= 'zstd-1';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal');
begin;CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
 
Duración: 2090,459 ms (00:02,090)
wal_bytes        | 6330269

It may be relevant that we're only compressing 8k [0].
It would probably pay off to preserve a compression "dictionary" to be re-used
across FPIs.

> -        The supported methods are <literal>pglz</literal> and
> -        <literal>lz4</literal> (if <productname>PostgreSQL</productname> was
> -        compiled with <option>--with-lz4</option>). The default value is
> -        <literal>off</literal>. Only superusers can change this setting.
> +        The supported methods are <literal>pglz</literal>, and
> +        (if configured when <productname>PostgreSQL</productname>was built)
> +        <literal>lz4</literal> and zstd.
> +        The default value is <literal>off</literal>.
> +        Only superusers can change this setting.
> 
> This is making the docs less verbose.  I think that you should keep
> the mention to respectively --with-lz4 and --with-zstd after each
> value.

I changed this relative to your latest zstd patch in July since it reads better
to me.  YMMV.

On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:
>> 0001 adds support for zstd
>> 0002 makes more efficient our use of bits in the WAL header
>> 0003 makes it the default compression, to exercise on CI (windows will fail).
>> 0004 adds support for zstd-level
>> 0005-6 are needed to allow recovery tests to pass when wal compression is enabled;
>> 0007 (not included) also adds support for zlib.  I'm of the impression nobody
>>      cares about this, otherwise it would've been included 5-10 years ago.

> 0003, defaulting to zstd, would require a lot of benchmarking to be
> justified.  

Maybe you didn't see that I wrote that it was for CI ?
(In any case, it's impossible to do that without first taking care of 005-6).

> and 0004 to extend compression to support a level 
> I have argued against making the code more complicated for such things in the
> past, with reloptions.

I suppose that you're referring to reloptions for toast compression, which was
about controlling LZ4 compression level.

https://www.postgresql.org/message-id/flat/CAFiTN-vMHU_yakMgNo90SUih_6LtvmqZ5hpQb2iTgQtVyOjyFA@mail.gmail.com

I think you're right that it's not interesting to control the compression level
of LZ4 - but there's no reason to say the same thing of zstd.  We're already
debating which is the "right" level to use (1 or 6).  I don't think there is a
"right" level - some people will want to trade more CPU for disk writes and
some people won't.



Re: wal_compression=zstd

От
Robert Haas
Дата:
On Fri, Mar 4, 2022 at 6:44 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > Why?  ZSTD using this default has its reasons, no?  And it would be
> > consistent to do the same for ZSTD as for the other two methods.
>
> In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
> cost.

I agree with Michael. Your 1-off test is exactly that, and the results
will have depended on the data you used for the test. I'm not saying
we could never decide to default to a compression level other than the
library's default, but I do not think we should do it casually or as
the result of any small number of tests. There should be a strong
presumption that the authors of the library have a good idea what is
sensible in general unless we can explain compellingly why our use
case is different from typical ones.

There's an ease-of-use concern here too. It's not going to make things
any easier for users to grok if zstd is available in different parts
of the system but has different defaults in each place. It wouldn't be
the end of the world if that happened, but neither would it be ideal.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: wal_compression=zstd

От
Andrew Dunstan
Дата:
On 2/22/22 18:19, Justin Pryzby wrote:
> As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress
> level is 6).


I think this choice needs to be supported by some benchmarks.


>
> 0001 adds support for zstd
> 0002 makes more efficient our use of bits in the WAL header
> 0003 makes it the default compression, to exercise on CI (windows will fail).
> 0004 adds support for zstd-level
> 0005-6 are needed to allow recovery tests to pass when wal compression is enabled;
> 0007 (not included) also adds support for zlib.  I'm of the impression nobody
>      cares about this, otherwise it would've been included 5-10 years ago.
>
> Only 0001 should be reviewed for pg15 - the others are optional/future work.



I don't see why patch 5 shouldn't be applied forthwith.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: wal_compression=zstd

От
Michael Paquier
Дата:
On Fri, Mar 04, 2022 at 08:08:03AM -0500, Robert Haas wrote:
> On Fri, Mar 4, 2022 at 6:44 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>> In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
>> cost.

Hmm, it may be good to start afresh and compile numbers in a single
chart.  I did that here with some numbers on the user and system CPU:
https://www.postgresql.org/message-id/YMmlvyVyAFlxZ+/H@paquier.xyz

For this test, regarding ZSTD, the lowest level did not have much
difference with the default level, and at the highest level the user
CPU spiked for little gain in compression.  All of them compressed
more than LZ4, with more CPU used in each case, but the default or a
level value lower than the default gives me the impression that it
won't matter much in terms of compression gains and CPU usage.

> I agree with Michael. Your 1-off test is exactly that, and the results
> will have depended on the data you used for the test. I'm not saying
> we could never decide to default to a compression level other than the
> library's default, but I do not think we should do it casually or as
> the result of any small number of tests. There should be a strong
> presumption that the authors of the library have a good idea what is
> sensible in general unless we can explain compellingly why our use
> case is different from typical ones.
>
> There's an ease-of-use concern here too. It's not going to make things
> any easier for users to grok if zstd is available in different parts
> of the system but has different defaults in each place. It wouldn't be
> the end of the world if that happened, but neither would it be ideal.

I'd like to believe that anybody who writes his/her own compression
algorithm have a good idea of the default behavior they want to show,
so we could remain simple, and believe in them.  Now, I would not
object to see some fresh numbers, and assuming that all FPIs have the
same page size, we could go down to designing a couple of test cases
that produce a fixed number of FPIs and measure the compressability in
a single session.

Repeatability and randomness of data counts, we could have for example
one case with a set of 5~7 int attributes, a second with text values
that include random data, up to 10~12 bytes each to count on the tuple
header to be able to compress some data, and a third with more
repeatable data, like one attribute with an int column populate
with generate_series().  Just to give an idea.
--
Michael

Вложения

Re: wal_compression=zstd

От
Michael Paquier
Дата:
On Fri, Mar 04, 2022 at 08:10:35AM -0500, Andrew Dunstan wrote:
> I don't see why patch 5 shouldn't be applied forthwith.

Only applying 0005 would result in a failure in the TAP test for a
problem whose fix is attempted in 0006.  This is an issue unrelated to
this thread.

FWIW, I am a bit disturbed by EnsureTopTransactionIdLogged() and its
design in 0006, where we'd finish by using a XLogFlush() call within
two SQL functions, but I have not really looked at the problem to see
if it is a viable solution or not.
--
Michael

Вложения

Re: wal_compression=zstd

От
Michael Paquier
Дата:
On Sat, Mar 05, 2022 at 07:26:39PM +0900, Michael Paquier wrote:
> Repeatability and randomness of data counts, we could have for example
> one case with a set of 5~7 int attributes, a second with text values
> that include random data, up to 10~12 bytes each to count on the tuple
> header to be able to compress some data, and a third with more
> repeatable data, like one attribute with an int column populate
> with generate_series().  Just to give an idea.

And that's what I did as of the attached set of test:
- Cluster on tmpfs.
- max_wal_size, min_wal_size at 2GB and shared_buffers at 1GB, aka
large enough to include the full data set in memory.
- Rather than using Justin's full patch set, I have just patched the
code in xloginsert.c to switch the level.
- One case with table that uses one int attribute, with rather
repetitive data worth 484MB.
- One case with table using (int, text), where the text data is made
of 10~11 bytes of random data, worth ~340MB.
- Use pg_prewarm to load the data into shared buffers.  With the
cluster mounted on a tmpfs that should not matter though.
- Both tables have a fillfactor at 50 to give room to the updates.

I have measured the CPU usage with a toy extension, also attached,
called pg_rusage() that is a simple wrapper to upstream's pg_rusage.c
to initialize a rusage snapshot and print its data with two SQL
functions that are called just before and after the FPIs are generated
(aka the UPDATE query that rewrites the whole table in the script
attached).

The quickly-hacked test script and the results are in test.tar.gz, for
reference.  The toy extension is pg_rusage.tar.gz.

Here are the results I compiled, as of results_format.sql in the
tarball attached:
             descr             | rel_size | fpi_size | time_s
-------------------------------+----------+----------+--------
 int column no compression     | 429 MB   | 727 MB   |  13.15
 int column ztsd default level | 429 MB   | 523 MB   |  14.23
 int column zstd level 1       | 429 MB   | 524 MB   |  13.94
 int column zstd level 10      | 429 MB   | 523 MB   |  23.46
 int column zstd level 19      | 429 MB   | 523 MB   | 103.71
 int column lz4 default level  | 429 MB   | 575 MB   |  13.37
 int/text no compression       | 344 MB   | 558 MB   |  10.08
 int/text lz4 default level    | 344 MB   | 463 MB   |  10.29
 int/text zstd default level   | 344 MB   | 415 MB   |  11.48
 int/text zstd level 1         | 344 MB   | 418 MB   |  11.25
 int/text zstd level 10        | 344 MB   | 415 MB   |  20.59
 int/text zstd level 19        | 344 MB   | 413 MB   |  62.64
(12 rows)

I did not expect zstd to be this slow at a level of 10~ actually.  The
runtime (elapsed CPU time) got severely impacted at level 19, that I
ran just for fun to see how that it would become compared to a level
of 10.  There is a slight difference between the default level and a
level of 1, and the compression size does not change much, nor does
the CPU usage really change.

Attached is an updated patch, while on it, that I have tweaked before
running my own tests.

At the end, I'd still like to think that we'd better stick with the
default level for this parameter, and that's the suggestion of
upstream.  So I would like to move on with that for this patch.
--
Michael

Вложения

Re: wal_compression=zstd

От
Justin Pryzby
Дата:
On Fri, Mar 04, 2022 at 05:44:06AM -0600, Justin Pryzby wrote:
> On Fri, Mar 04, 2022 at 04:19:32PM +0900, Michael Paquier wrote:
> > On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:
> > 
> > > As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress
> > > level is 6).
> > 
> > Why?  ZSTD using this default has its reasons, no?  And it would be
> > consistent to do the same for ZSTD as for the other two methods.
> 
> In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
> cost.

Actually, my test used zstd-6, rather than the correct default of 3.

The comparison should have been:

postgres=# SET wal_compression='zstd-1';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal');
begin;CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
 
Time: 2074.046 ms (00:02.074)
        2763 |    2758 |   6343591 |                0 |         5 |        5 |              0 |             0 |
2022-03-0505:04:08.599867-06
 


vs

postgres=# SET wal_compression='zstd-3';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal');
begin;CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
 
Time: 2471.552 ms (00:02.472)
 wal_records | wal_fpi | wal_bytes | wal_buffers_full | wal_write | wal_sync | wal_write_time | wal_sync_time |
stats_reset
 

-------------+---------+-----------+------------------+-----------+----------+----------------+---------------+-------------------------------
        2762 |    2746 |   6396890 |              274 |       274 |        0 |              0 |             0 |
2022-03-0505:04:31.283432-06
 

=> zstd-1 actually wrote less than zstd-3 (which is odd) but by an
insignificant amount.  It's no surprise that zstd-1 is faster than zstd-3, but
(of course) by a smaller amount than zstd-6.

Anyway there's no compelling reason to not use the default.  If we were to use
a non-default default, we'd have to choose between 1 and 2 (or some negative
compression level).  My thinking was that zstd-1 would give the lowest-hanging
fruits for zstd, while minimizing performance tradeoff, since WAL affects
interactivity.  But choosing between 1 and 2 seems like bikeshedding.



Re: wal_compression=zstd

От
Michael Paquier
Дата:
On Wed, Mar 09, 2022 at 07:14:11AM -0600, Justin Pryzby wrote:
> Anyway there's no compelling reason to not use the default.  If we were to use
> a non-default default, we'd have to choose between 1 and 2 (or some negative
> compression level).  My thinking was that zstd-1 would give the lowest-hanging
> fruits for zstd, while minimizing performance tradeoff, since WAL affects
> interactivity.  But choosing between 1 and 2 seems like bikeshedding.

Yeah, I have looked again at the patch today, and I saw no reason to
not apply it to give more options to the user as zstd or lz4 are both
good in their own ways.  So done, with the default level used.
--
Michael

Вложения

Re: wal_compression=zstd

От
Justin Pryzby
Дата:
On Fri, Mar 11, 2022 at 12:23:59PM +0900, Michael Paquier wrote:
> On Wed, Mar 09, 2022 at 07:14:11AM -0600, Justin Pryzby wrote:
> > Anyway there's no compelling reason to not use the default.  If we were to use
> > a non-default default, we'd have to choose between 1 and 2 (or some negative
> > compression level).  My thinking was that zstd-1 would give the lowest-hanging
> > fruits for zstd, while minimizing performance tradeoff, since WAL affects
> > interactivity.  But choosing between 1 and 2 seems like bikeshedding.
> 
> Yeah, I have looked again at the patch today, and I saw no reason to
> not apply it to give more options to the user as zstd or lz4 are both
> good in their own ways.  So done, with the default level used.

It's great news - thanks.

-- 
Justin



Re: wal_compression=zstd

От
Justin Pryzby
Дата:
While rebasing, I realized this should have bumped XLOG_PAGE_MAGIC.

Also, there's a dangling "and".

+        The supported methods are <literal>pglz</literal>,
+        <literal>lz4</literal> (if <productname>PostgreSQL</productname>
+        was compiled with <option>--with-lz4</option>) and
+        <literal>zstd</literal> (if <productname>PostgreSQL</productname>
+        was compiled with <option>--with-zstd</option>) and
+        The default value is <literal>off</literal>.
+        Only superusers can change this setting.

-- 
Justin



Re: wal_compression=zstd

От
Michael Paquier
Дата:
On Fri, Mar 11, 2022 at 03:49:00PM -0600, Justin Pryzby wrote:
> While rebasing, I realized this should have bumped XLOG_PAGE_MAGIC.
>
> Also, there's a dangling "and".

Right.  I'll address that a bit later today.  Thanks!
--
Michael

Вложения