Thread: XLog size reductions: smaller XLRec block header for PG17


XLog size reductions: smaller XLRec block header for PG17

From: Matthias van de Meent
Date:
Hi all,

As previously discussed in the thread surrounding [1]: currently, any
block registration in WAL takes at least 8 bytes of xlog overhead,
regardless of how much data is included in that block: 1 byte for the
block ID, 1 byte for fork and flags, 2 bytes for the block data
length, and 4 bytes for the blockNo. (Usually another 12 bytes are
used for a RelFileLocator, but that may be omitted for some blocks in
the record when conditions apply.)

Attached is a patch that reduces this overhead by up to 2 bytes by
encoding how large the block data length field is into the block ID,
and thus optionally reducing the block data's length field to 0 bytes.
Examples: cross-page update records will now be 2 bytes shorter,
because the record never registers any data for the new block of the
update; pgbench transactions are now either 6 or 8 bytes smaller
depending on whether the update crosses a page boundary (in xlog
record size; after alignment it is 0 or 4/8 bytes, depending on
MAXALIGN and whether the updates are cross-page updates).

It changes the block IDs used to fit in 6 bits, using the upper 2 bits
of the block_id field to store how much data is contained in the
record (0, <=UINT8_MAX, or <=UINT16_MAX bytes).
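As a rough illustration of this scheme (the macro and function names below are hypothetical, not the patch's actual identifiers), packing the length field's size class into the two spare bits of the block ID byte could look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the real patch's identifiers may differ. */
#define XLR_BLOCKID_SZCLASS_SHIFT 6
#define XLR_BLOCKID_MASK          0x3F	/* lower 6 bits: the block ID itself */

/*
 * Size class of the block data length field, stored in the upper 2 bits:
 *   0 -> no data registered (length field omitted entirely)
 *   1 -> length fits in uint8  (1-byte length field follows)
 *   2 -> length fits in uint16 (2-byte length field follows)
 */
static inline uint8_t
encode_block_id(uint8_t block_id, uint32_t data_len)
{
	uint8_t		szclass;

	assert(block_id <= XLR_BLOCKID_MASK);	/* ID must fit in 6 bits */
	assert(data_len <= UINT16_MAX);

	if (data_len == 0)
		szclass = 0;
	else if (data_len <= UINT8_MAX)
		szclass = 1;
	else
		szclass = 2;

	return (uint8_t) ((szclass << XLR_BLOCKID_SZCLASS_SHIFT) | block_id);
}

/* On-disk size of the length field implied by an encoded ID byte. */
static inline int
length_field_size(uint8_t encoded)
{
	return encoded >> XLR_BLOCKID_SZCLASS_SHIFT;	/* 0, 1 or 2 bytes */
}
```

This shows where the "up to 2 bytes" saving comes from: a block with no data drops both length bytes, and a block with under 256 bytes of data drops one.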

This is part 1 of a series of patches trying to decrease the size of
WAL; see also [0], [1] and [2] for more info on what's still to go.
I'm working on a separate, much more involved patch for the XLogRecord
header itself, similar to the patch in [1], which I expect to send
sometime soon as well.
Unless someone thinks the patches should be discussed as one series,
I'm planning on posting that in another thread, as I don't see any
meaningful dependencies between the patches, and the XLR header patch
will be quite a bit larger than this one.

Kind regards,

Matthias van de Meent
Neon, Inc.

[0] https://wiki.postgresql.org/wiki/Updating_the_WAL_infrastructure
[1]
https://www.postgresql.org/message-id/flat/CAEze2Wjd3jY_UhhOGdGGnC6NO%3D%2BNmtNOmd%3DJaYv-v-nwBAiXXA%40mail.gmail.com#17a51d83923f4390d8f407d0d6c5da07
[2]
https://www.postgresql.org/message-id/flat/CAEze2Whf%3DfwAj7rosf6aDM9t%2B7MU1w-bJn28HFWYGkz%2Bics-hg%40mail.gmail.com

PS. Benchmark results on my system (5950x with other light tasks
running) don't show an obviously negative effect in a 10-minute run
with these arbitrary pgbench settings on a fresh cluster with default
configuration:

./pg_install/bin/pgbench postgres -j 2 -c 6 -T 600 -M prepared
[...]
master: tps = 375
patched: tps = 381

Attachments

Re: XLog size reductions: smaller XLRec block header for PG17

From: Heikki Linnakangas
Date:
On 18/05/2023 17:59, Matthias van de Meent wrote:
> Attached is a patch that reduces this overhead by up to 2 bytes by
> encoding how large the block data length field is into the block ID,
> and thus optionally reducing the block data's length field to 0 bytes.
> Examples: cross-page update records will now be 2 bytes shorter,
> because the record never registers any data for the new block of the
> update; pgbench transactions are now either 6 or 8 bytes smaller
> depending on whether the update crosses a page boundary (in xlog
> record size; after alignment it is 0 or 4/8 bytes, depending on
> MAXALIGN and whether the updates are cross-page updates).
> 
> It changes the block IDs used to fit in 6 bits, using the upper 2 bits
> of the block_id field to store how much data is contained in the
> record (0, <=UINT8_MAX, or <=UINT16_MAX bytes).

Perhaps we should introduce a few generic inline functions to do varint 
encoding. That could be useful in many places, while this scheme is very 
tailored for XLogRecordBlockHeader.

We could replace XLogRecordDataHeaderShort and XLogRecordDataHeaderLong 
with this too. With just one XLogRecordDataHeader, with a 
variable-length length field.
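For illustration only, generic inline helpers of the kind suggested here are often written LEB128-style, with a continuation bit per byte. This is a sketch of that general shape, not the encoding proposed for PostgreSQL:

```c
#include <assert.h>
#include <stdint.h>

/*
 * LEB128-style unsigned varint: 7 value bits per byte, high bit set on
 * every byte except the last.  Returns the number of bytes written.
 */
static inline int
varint_encode_uint64(uint8_t *buf, uint64_t val)
{
	int			n = 0;

	while (val >= 0x80)
	{
		buf[n++] = (uint8_t) (val | 0x80);	/* low 7 bits + continuation */
		val >>= 7;
	}
	buf[n++] = (uint8_t) val;	/* final byte, high bit clear */
	return n;
}

/* Decode; assumes well-formed input.  Returns the number of bytes read. */
static inline int
varint_decode_uint64(const uint8_t *buf, uint64_t *val)
{
	uint64_t	result = 0;
	int			shift = 0;
	int			n = 0;

	while (buf[n] & 0x80)
	{
		result |= (uint64_t) (buf[n++] & 0x7F) << shift;
		shift += 7;
	}
	result |= (uint64_t) buf[n++] << shift;
	*val = result;
	return n;
}
```

With this shape, values below 128 cost one byte and values below 16384 cost two, which is why such helpers pay off for length fields whose common values are small.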

> This is part 1 of a series of patches trying to decrease the size of
> WAL; see also [0], [1] and [2] for more info on what's still to go.
> I'm working on a separate, much more involved patch for the XLogRecord
> header itself, similar to the patch in [1], which I expect to send
> sometime soon as well.
> Unless someone thinks the patches should be discussed as one series,
> I'm planning on posting that in another thread, as I don't see any
> meaningful dependencies between the patches, and the XLR header patch
> will be quite a bit larger than this one.
> 
> Kind regards,
> 
> Matthias van de Meent
> Neon, Inc.
> 
> [0] https://wiki.postgresql.org/wiki/Updating_the_WAL_infrastructure

Good ideas here. Eliminating the two padding bytes from XLogRecord in 
particular seems like a pure win.

> PS. Benchmark results on my system (5950x with other light tasks
> running) don't show an obviously negative effect in a 10-minute run
> with these arbitrary pgbench settings on a fresh cluster with default
> configuration:
> 
> ./pg_install/bin/pgbench postgres -j 2 -c 6 -T 600 -M prepared
> [...]
> master: tps = 375
> patched: tps = 381

That was probably not CPU limited, so that any overhead in generating 
the WAL would not show up. Try PGOPTIONS="-csynchronous_commit=off" and 
pgbench -N option. And make sure the scale is large enough that there is 
no lock contention. Also would be good to measure the overhead in 
replaying the WAL.

How much space saving does this yield?

- Heikki




Re: XLog size reductions: smaller XLRec block header for PG17

From: Matthias van de Meent
Date:
On Thu, 18 May 2023 at 18:22, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 18/05/2023 17:59, Matthias van de Meent wrote:
> Perhaps we should introduce a few generic inline functions to do varint
> encoding. That could be useful in many places, while this scheme is very
> tailored for XLogRecordBlockHeader.

I'm not sure about the reusability of such code, as not all varint
encodings are made equal:

Here, I chose to store the size of the field in leftover bits of
another field, thus storing the value and its size separately.
But in other cases, such as UTF-8's code point encoding, each byte has
a continuation bit indicating whether the value has more bytes to go.
In yet other cases, such as sqlite's integer encoding, the value is
stored in a single byte, unless that byte contains a sentinel value
that indicates how many further bytes the value continues into.
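A minimal sketch of that third, sentinel-byte style (simplified for illustration; SQLite's actual sentinel values and widths differ):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified sentinel-byte varint (not SQLite's exact format): values
 * 0..247 fit in a single byte; 248 and 249 are sentinels meaning "a
 * 1-byte (resp. 2-byte) little-endian value follows".
 */
static inline int
sentinel_encode(uint8_t *buf, uint32_t val)
{
	assert(val <= UINT16_MAX);	/* sketch only supports 16-bit values */

	if (val <= 247)
	{
		buf[0] = (uint8_t) val;
		return 1;
	}
	if (val <= UINT8_MAX)
	{
		buf[0] = 248;
		buf[1] = (uint8_t) val;
		return 2;
	}
	buf[0] = 249;
	buf[1] = (uint8_t) (val & 0xFF);
	buf[2] = (uint8_t) (val >> 8);
	return 3;
}

/* Decode; returns the number of bytes consumed. */
static inline int
sentinel_decode(const uint8_t *buf, uint32_t *val)
{
	if (buf[0] <= 247)
	{
		*val = buf[0];
		return 1;
	}
	if (buf[0] == 248)
	{
		*val = buf[1];
		return 2;
	}
	*val = buf[1] | ((uint32_t) buf[2] << 8);
	return 3;
}
```

Each style trades differently between common-case size, worst-case size, and decode branching, which is the point being made: there is no single encoding that dominates the others.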

What I'm trying to say is that there is no perfect encoding that is
better than all others, and I picked what I thought worked best in
this specific case. I think it is reasonable to expect that
varint-encoding of e.g. blockNo or RelFileNode into the WAL record
could want to choose a different method than the method I've chosen
for the block data length.

> We could replace XLogRecordDataHeaderShort and XLogRecordDataHeaderLong
> with this too. With just one XLogRecordDataHeader, with a
> variable-length length field.

Yes, that could be done too. But that's not part of the patch right
now, and I haven't planned to implement it for this patch.

> > [0] https://wiki.postgresql.org/wiki/Updating_the_WAL_infrastructure
>
> Good ideas here. Eliminating the two padding bytes from XLogRecord in
> particular seems like a pure win.

It requires code churn and probably increases complexity, but apart
from that I think 'pure win' is accurate, yes.

> > PS. Benchmark results on my system (5950x with other light tasks
> > running) don't show an obviously negative effect in a 10-minute run
> > with these arbitrary pgbench settings on a fresh cluster with default
> > configuration:
> >
> > ./pg_install/bin/pgbench postgres -j 2 -c 6 -T 600 -M prepared
> > [...]
> > master: tps = 375
> > patched: tps = 381
>
> That was probably not CPU limited, so that any overhead in generating
> the WAL would not show up. Try PGOPTIONS="-csynchronous_commit=off" and
> pgbench -N option. And make sure the scale is large enough that there is
> no lock contention. Also would be good to measure the overhead in
> replaying the WAL.

With assertions disabled and the following configuration:

synchronous_commit = off
fsync = off
full_page_writes = off
checkpoint_timeout = 1d
autovacuum = off

I get:
master: tps = 3500.815859
patched: tps = 3535.188054

With autovacuum enabled the results were similar, within 1% of the above.

> How much space saving does this yield?

No meaningful savings in the pgbench workload, mostly due to xlog
record length MAXALIGNs currently not being favorable in the pgbench
workload. But, record sizes have dropped by 1 or 2 bytes in several
cases, as can be seen at the bottom of this mail.

Kind regards,

Matthias van de Meent
Neon, Inc.

The data: Record type, then record length averages (average aligned
length between parens) for both master and patched, and the average
per-record savings with this patch.

| record type   | master avg      | patched avg     | delta | delta   |
|               | (aligned avg)   | (aligned avg)   |       | aligned |
|---------------|-----------------|-----------------|-------|---------|
| BT/DEDUP      | 64.00 (64.00)   | 63.00 (64.00)   |    -1 |       0 |
| BT/INS_LEAF   | 81.41 (81.41)   | 80.41 (81.41)   |    -1 |       0 |
| CLOG/0PG      | 30.00 (32.00)   | 30.00 (32.00)   |     0 |       0 |
| HEAP/DEL      | 54.00 (56.00)   | 52.00 (56.00)   |    -2 |       0 |
| HEAP/HOT_UPD  | 72.02 (72.19)   | 71.02 (72.19)   |     0 |       0 |
| HEAP/INS      | 79.00 (80.00)   | 78.00 (80.00)   |    -1 |       0 |
| HEAP/INS+INIT | 79.00 (80.00)   | 78.00 (80.00)   |    -1 |       0 |
| HEAP/LOCK     | 54.00 (56.00)   | 52.00 (56.00) * |    -2 |       0 |
| HEAP2/MUL_INS | 85.00 (88.00)   | 84.00 (88.00) * |    -1 |       0 |
| HEAP2/PRUNE   | 65.17 (68.19)   | 64.17 (68.19)   |    -1 |       0 |
| STDBY/R_XACTS | 52.76 (56.00)   | 52.21 (56.00)   |  -0.5 |       0 |
| TX/COMMIT     | 34.00 (40.00)   | 34.00 (40.00)   |     0 |       0 |
| XLOG/CHCKPT_O | 114.00 (120.00) | 114.00 (120.00) |     0 |       0 |



Re: XLog size reductions: smaller XLRec block header for PG17

From: Aleksander Alekseev
Date:
Hi,

I noticed that the patch needs review and decided to take a look.

> No meaningful savings in the pgbench workload, mostly due to xlog
> record length MAXALIGNs currently not being favorable in the pgbench
> workload. But, record sizes have dropped by 1 or 2 bytes in several
> cases, as can be seen at the bottom of this mail.

This may not sound like a lot, but it is still valuable IMO if we
consider the reduction as a percentage of overall disk throughput,
network traffic, etc., rather than in absolute values per record. Even
if 1-2 bytes are not a bottleneck visible in benchmarks (or the
performance improvement is not that impressive), it's some amount of
money paid in the cloud. Considering that the patch is not that
complicated, I see no reason not to apply the optimization as long as
it doesn't cause regressions.

I also agree with Matthias' arguments above regarding the lack of
one-size-fits-all variable encoding and the overall desire to keep the
focus. E.g. the code can be refactored if and when we discover that
different subsystems ended up using the same encoding.

All in all the patch looks good to me, but I have a couple of nitpicks:

* The comment for XLogSizeClass seems to be somewhat truncated, as if
Ctrl+S was not pressed before creating the patch. I also suggest
double-checking the grammar.
* `Size written = -1;` in XLogWriteLength() can lead to compiler
warnings some day considering the fact that Size / size_t are
unsigned. Also this assignment doesn't seem to serve any particular
purpose. So I suggest removing it.
* I don't see much value in using the WRITE_OP macro in
XLogWriteLength(). The code is read more often than it's written and I
wouldn't call this code particularly readable (although it's shorter).
* XLogReadLength() - ditto
* `if (read < 0)` in DecodeXLogRecord() is noop since `read` is unsigned

-- 
Best regards,
Aleksander Alekseev



Re: XLog size reductions: smaller XLRec block header for PG17

From: Matthias van de Meent
Date:
On Tue, 5 Sept 2023 at 15:04, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi,
>
> I noticed that the patch needs review and decided to take a look.

Thanks for reviewing!

> All in all the patch looks good to me, but I have a couple of nitpicks:
>
> * The comment for XLogSizeClass seems to be somewhat truncated as if
> Ctr+S was not pressed before creating the patch. I also suggest
> double-checking the grammar.

I've updated the various comments with improved wording.

> * `Size written = -1;` in XLogWriteLength() can lead to compiler
> warnings some day considering the fact that Size / size_t are
> unsigned. Also this assignment doesn't seem to serve any particular
> purpose. So I suggest removing it.

Fixed, it now uses `int` instead, as does XLogReadLength().

> * I don't see much value in using the WRITE_OP macro in
> XLogWriteLength(). The code is read more often than it's written and I
> wouldn't call this code particularly readable (although it's shorter).
> * XLogReadLength() - ditto

I use READ_OP and WRITE_OP mostly to make sure that each variant is
handled with the same code structure. Manually expanding the macro
would allow the handling of each variant to diverge, and that would
allow for more coding errors. I think it's extra important to make
sure the code isn't wrong because this concerns WAL (de)serialization,
and one copy is (in my opinion) easier to check for errors than 3
copies.

I've had my share of issues in copy-edited code, so I'd rather keep
the template around as long as I don't need to modify the underlying
code.

> * `if (read < 0)` in DecodeXLogRecord() is noop since `read` is unsigned

Yes, thanks for noticing. I've been working with Rust recently, where
the unsigned size type is `usize` and the signed one is `isize`. The
issue has been fixed in the attached patch by using `int` instead.

Kind regards,

Matthias van de Meent

Attachments

Re: XLog size reductions: smaller XLRec block header for PG17

From: Andres Freund
Date:
Hi,

On 2023-05-18 19:22:26 +0300, Heikki Linnakangas wrote:
> On 18/05/2023 17:59, Matthias van de Meent wrote:
> > Attached is a patch that reduces this overhead by up to 2 bytes by
> > encoding how large the block data length field is into the block ID,
> > and thus optionally reducing the block data's length field to 0 bytes.
> > Examples: cross-page update records will now be 2 bytes shorter,
> > because the record never registers any data for the new block of the
> > update; pgbench transactions are now either 6 or 8 bytes smaller
> > depending on whether the update crosses a page boundary (in xlog
> > record size; after alignment it is 0 or 4/8 bytes, depending on
> > MAXALIGN and whether the updates are cross-page updates).
> > 
> > It changes the block IDs used to fit in 6 bits, using the upper 2 bits
> > of the block_id field to store how much data is contained in the
> > record (0, <=UINT8_MAX, or <=UINT16_MAX bytes).
> 
> Perhaps we should introduce a few generic inline functions to do varint
> encoding. That could be useful in many places, while this scheme is very
> tailored for XLogRecordBlockHeader.

Yes - I proposed that and wrote an implementation of reasonably efficient
varint encoding. Here's my prototype:
https://postgr.es/m/20221004234952.anrguppx5owewb6n%40awork3.anarazel.de

I think it's a bad tradeoff to write lots of custom varint encodings, just to
eke out a bit more space savings. The increase in code complexity IMO makes it
a bad tradeoff.

Greetings,

Andres Freund



Re: XLog size reductions: smaller XLRec block header for PG17

From: Michael Paquier
Date:
On Mon, Sep 18, 2023 at 04:03:38PM -0700, Andres Freund wrote:
> I think it's a bad tradeoff to write lots of custom varint encodings, just to
> eke out a bit more space savings. The increase in code complexity IMO makes it
> a bad tradeoff.

+1.
--
Michael

Attachments

Re: XLog size reductions: smaller XLRec block header for PG17

From: Matthias van de Meent
Date:
On Tue, 19 Sept 2023 at 01:03, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-05-18 19:22:26 +0300, Heikki Linnakangas wrote:
> > On 18/05/2023 17:59, Matthias van de Meent wrote:
> > > It changes the block IDs used to fit in 6 bits, using the upper 2 bits
> > > of the block_id field to store how much data is contained in the
> > > record (0, <=UINT8_MAX, or <=UINT16_MAX bytes).
> >
> > Perhaps we should introduce a few generic inline functions to do varint
> > encoding. That could be useful in many places, while this scheme is very
> > tailored for XLogRecordBlockHeader.

This scheme is reused later for the XLogRecord xl_tot_len field over
at [0], so FWIW it is already seeing reuse. Sure, it's tailored to this WAL
use case, but IMO we're getting good value from it. We don't use
protobuf or JSON for WAL, we use our own serialization format. Having
some specialized encoding/decoding in that format for certain fields
is IMO quite acceptable.

> Yes - I proposed that and wrote an implementation of reasonably efficient
> varint encoding. Here's my prototype:
> https://postgr.es/m/20221004234952.anrguppx5owewb6n%40awork3.anarazel.de

As I mentioned on that thread, that prototype has a significant
probability of doing nothing to improve WAL size, or even increasing
the WAL size for installations which consume a lot of OIDs.

> I think it's a bad tradeoff to write lots of custom varint encodings, just to
> eke out a bit more space savings.

This is only a single "custom" varint encoding though, if you can even
call it that. It makes a field's size depend on flags set in another
byte, which is not that much different from the existing use of
XLR_BLOCK_ID_DATA_[LONG, SHORT].
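For reference, the existing mechanism already makes a field's width depend on an ID byte: `XLR_BLOCK_ID_DATA_SHORT` is followed by a 1-byte length and `XLR_BLOCK_ID_DATA_LONG` by a 4-byte one (constants as in xlogrecord.h; the function below is a simplified sketch of what record assembly does, not the actual code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Constants as defined in PostgreSQL's xlogrecord.h. */
#define XLR_BLOCK_ID_DATA_SHORT 255	/* followed by a uint8 length */
#define XLR_BLOCK_ID_DATA_LONG  254	/* followed by a uint32 length */

/*
 * Simplified sketch: emit a main-data header where the ID byte selects
 * which of two fixed length-field widths follows.  Returns bytes written.
 */
static int
emit_data_header(uint8_t *buf, uint32_t len)
{
	if (len <= UINT8_MAX)
	{
		buf[0] = XLR_BLOCK_ID_DATA_SHORT;
		buf[1] = (uint8_t) len;
		return 2;
	}
	buf[0] = XLR_BLOCK_ID_DATA_LONG;
	/* unaligned store via memcpy, as the real assembly code does */
	memcpy(&buf[1], &len, sizeof(uint32_t));
	return 1 + (int) sizeof(uint32_t);
}
```

The patch's scheme is the same idea applied per block, with the size class folded into spare bits rather than using dedicated sentinel IDs.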

> The increase in code complexity IMO makes it a bad tradeoff.

Pardon me for asking, but what would you consider to be a good
tradeoff then? I think the code relating to the WAL storage format is
about as simple as you can get it within the feature set it provides
and the size of the resulting records. While I think there is still
much to gain w.r.t. WAL record size, I don't think we can get much of
those improvements without adding at least some amount of complexity,
something I think to be true for most components in PostgreSQL.

So, except for redesigning significant parts of the public WAL APIs,
are we just going to ignore any potential improvements because they
"increase code complexity"?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] https://commitfest.postgresql.org/43/4386/



Re: XLog size reductions: smaller XLRec block header for PG17

From: Aleksander Alekseev
Date:
Hi,

> This scheme is reused later for the XLogRecord xl_tot_len field over
> at [0], so FWIW it is already seeing reuse. Sure, it's tailored to this WAL
> use case, but IMO we're getting good value from it. We don't use
> protobuf or JSON for WAL, we use our own serialization format. Having
> some specialized encoding/decoding in that format for certain fields
> is IMO quite acceptable.
>
> > Yes - I proposed that and wrote an implementation of reasonably efficient
> > varint encoding. Here's my prototype:
> > https://postgr.es/m/20221004234952.anrguppx5owewb6n%40awork3.anarazel.de
>
> As I mentioned on that thread, that prototype has a significant
> probability of doing nothing to improve WAL size, or even increasing
> the WAL size for installations which consume a lot of OIDs.
>
> > I think it's a bad tradeoff to write lots of custom varint encodings, just to
> > eke out a bit more space savings.
>
> This is only a single "custom" varint encoding though, if you can even
> call it that. It makes a field's size depend on flags set in another
> byte, which is not that much different from the existing use of
> XLR_BLOCK_ID_DATA_[LONG, SHORT].
>
> > The increase in code complexity IMO makes it a bad tradeoff.
>
> Pardon me for asking, but what would you consider to be a good
> tradeoff then? I think the code relating to the WAL storage format is
> about as simple as you can get it within the feature set it provides
> and the size of the resulting records. While I think there is still
> much to gain w.r.t. WAL record size, I don't think we can get much of
> those improvements without adding at least some amount of complexity,
> something I think to be true for most components in PostgreSQL.
>
> So, except for redesigning significant parts of the public WAL APIs,
> are we just going to ignore any potential improvements because they
> "increase code complexity"?

Here are my two cents.

I definitely see the value in having reusable varint encoding. This
would probably be a good option for extendable TOAST pointers,
compression dictionaries, and perhaps other patches.

In this particular case, however, Matthias has good arguments that
this is not the right tool for the task, IMO. We don't use
rbtrees for everything that needs a map from x to y. Hash tables have
other compromises. Sometimes one container is a better fit, sometimes
the other, sometimes none and we implement a fine tuned container for
the given case. Same here.

-- 
Best regards,
Aleksander Alekseev



Re: XLog size reductions: smaller XLRec block header for PG17

From: vignesh C
Date:
On Tue, 26 Sept 2023 at 02:09, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Tue, 19 Sept 2023 at 01:03, Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2023-05-18 19:22:26 +0300, Heikki Linnakangas wrote:
> > > On 18/05/2023 17:59, Matthias van de Meent wrote:
> > > > It changes the block IDs used to fit in 6 bits, using the upper 2 bits
> > > > of the block_id field to store how much data is contained in the
> > > > record (0, <=UINT8_MAX, or <=UINT16_MAX bytes).
> > >
> > > Perhaps we should introduce a few generic inline functions to do varint
> > > encoding. That could be useful in many places, while this scheme is very
> > > tailored for XLogRecordBlockHeader.
>
> This scheme is reused later for the XLogRecord xl_tot_len field over
> at [0], so FWIW it is already seeing reuse. Sure, it's tailored to this WAL
> use case, but IMO we're getting good value from it. We don't use
> protobuf or JSON for WAL, we use our own serialization format. Having
> some specialized encoding/decoding in that format for certain fields
> is IMO quite acceptable.
>
> > Yes - I proposed that and wrote an implementation of reasonably efficient
> > varint encoding. Here's my prototype:
> > https://postgr.es/m/20221004234952.anrguppx5owewb6n%40awork3.anarazel.de
>
> As I mentioned on that thread, that prototype has a significant
> probability of doing nothing to improve WAL size, or even increasing
> the WAL size for installations which consume a lot of OIDs.
>
> > I think it's a bad tradeoff to write lots of custom varint encodings, just to
> > eke out a bit more space savings.
>
> This is only a single "custom" varint encoding though, if you can even
> call it that. It makes a field's size depend on flags set in another
> byte, which is not that much different from the existing use of
> XLR_BLOCK_ID_DATA_[LONG, SHORT].
>
> > The increase in code complexity IMO makes it a bad tradeoff.
>
> Pardon me for asking, but what would you consider to be a good
> tradeoff then? I think the code relating to the WAL storage format is
> about as simple as you can get it within the feature set it provides
> and the size of the resulting records. While I think there is still
> much to gain w.r.t. WAL record size, I don't think we can get much of
> those improvements without adding at least some amount of complexity,
> something I think to be true for most components in PostgreSQL.
>
> So, except for redesigning significant parts of the public WAL APIs,
> are we just going to ignore any potential improvements because they
> "increase code complexity"?

I see that there has been no activity in this thread for nearly 4
months, so I'm planning to close this in the current commitfest unless
someone is planning to take it forward.

Regards,
Vignesh



Re: XLog size reductions: smaller XLRec block header for PG17

From: Aleksander Alekseev
Date:
Hi,

> I see that there has been no activity in this thread for nearly 4
> months, so I'm planning to close this in the current commitfest unless
> someone is planning to take it forward.

I don't think that closing CF entries purely due to inactivity is a
good practice (nor something we did before) as long as there is
code, it applies, etc. There are a lot of patches and few people
working on them. Inactivity in a given thread doesn't necessarily
indicate lack of interest, more likely lack of resources.

-- 
Best regards,
Aleksander Alekseev



Re: XLog size reductions: smaller XLRec block header for PG17

From: vignesh C
Date:
On Mon, 22 Jan 2024 at 16:08, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi,
>
> > I see that there has been no activity in this thread for nearly 4
> > months, so I'm planning to close this in the current commitfest unless
> > someone is planning to take it forward.
>
> I don't think that closing CF entries purely due to inactivity is a
> good practice (nor something we did before) as long as there is
> code, it applies, etc. There are a lot of patches and few people
> working on them. Inactivity in a given thread doesn't necessarily
> indicate lack of interest, more likely lack of resources.

There are a lot of patches like this, and there is no clear way to
find out whether someone wants to keep working on a patch or has lost
interest in it. That is why I sent out a mail, so that the
author/reviewer can reply and move it to the next state, such as Ready
for Committer. If the author/reviewer is not planning to work on it in
this commitfest but plans to in the next one, we can move it to the
next commitfest. I don't see a better way to identify whether a patch
still has interest.

Regards,
Vignesh



Re: XLog size reductions: smaller XLRec block header for PG17

From: Robert Haas
Date:
On Mon, Jan 22, 2024 at 5:38 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
> I don't think that closing CF entries purely due to inactivity is a
> good practice (nor something we did before) as long as there is
> code, it applies, etc. There are a lot of patches and few people
> working on them. Inactivity in a given thread doesn't necessarily
> indicate lack of interest, more likely lack of resources.

If the CF entry doesn't serve the purpose of allowing someone to find
patches in need of review, it's worse than useless. Patches that
aren't getting reviewed should stay in the CF until they do, or until
we make a collective decision that we don't care about or want that
particular patch enough to bother. But patches that are not ready for
review need to get out until such time as they are. Otherwise they're
just non-actionable clutter. And unfortunately we have so much of that
in the CF app now that finding things that really need review is time
consuming and difficult. That REALLY needs to be cleaned up.

In the case of this particular patch, I think the problem is that
there's no consensus on the design. There's not a ton of debate on
this thread, but thread [1] linked in the original post contains a lot
of vigorous debate about what the right thing to do is here and I
don't believe we reached any meeting of the minds. In light of that
lack of agreement, I'm honestly a bit confused why Matthias even found
it worthwhile to submit this on a new thread. I think we all agree
with him that there's room for improvement here, but we don't agree on
what particular form that improvement should take, and as long as that
agreement is lacking, I find it hard to imagine anything getting
committed. The task right now is to get agreement on something, and
leaving the CF entry open longer isn't going make the people who have
already expressed opinions start agreeing with each other more than
they do already.

It looks like I never replied to
https://www.postgresql.org/message-id/20221019192130.ebjbycpw6bzjry4v%40awork3.anarazel.de
but, FWIW, I agree with Andres that applying the same technique to
multiple fields that are stored together (DB OID, TS OID, rel #, block
#) is unlikely in practice to produce many cases that regress. But the
question for this thread is really more about whether we're OK with
using ad-hoc bit swizzling to reduce the size of xlog records or
whether we want to insist on the use of a uniform varint encoding.
Heikki and Andres both seem to favor the latter. IIRC, I was initially
more optimistic about ad-hoc bit swizzling being a potentially
acceptable technique, but I'm not convinced enough about it to argue
against two very smart committers both of whom know more about
micro-optimizing performance than I do, and nobody else seems to be
making this argument on this thread either, so I just don't really see
how this patch is ever going to go anywhere in its current form.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: XLog size reductions: smaller XLRec block header for PG17

From: Heikki Linnakangas
Date:
On 22/01/2024 19:23, Robert Haas wrote:
> In the case of this particular patch, I think the problem is that
> there's no consensus on the design. There's not a ton of debate on
> this thread, but thread [1] linked in the original post contains a lot
> of vigorous debate about what the right thing to do is here and I
> don't believe we reached any meeting of the minds.

Yeah, so it seems.

> It looks like I never replied to
> https://www.postgresql.org/message-id/20221019192130.ebjbycpw6bzjry4v%40awork3.anarazel.de
> but, FWIW, I agree with Andres that applying the same technique to
> multiple fields that are stored together (DB OID, TS OID, rel #, block
> #) is unlikely in practice to produce many cases that regress. But the
> question for this thread is really more about whether we're OK with
> using ad-hoc bit swizzling to reduce the size of xlog records or
> whether we want to insist on the use of a uniform varint encoding.
> Heikki and Andres both seem to favor the latter. IIRC, I was initially
> more optimistic about ad-hoc bit swizzling being a potentially
> acceptable technique, but I'm not convinced enough about it to argue
> against two very smart committers both of whom know more about
> micro-optimizing performance than I do, and nobody else seems to be
> making this argument on this thread either, so I just don't really see
> how this patch is ever going to go anywhere in its current form.

I don't have a clear idea of how to proceed with this either. Some 
thoughts I have:

Using varint encoding makes sense for length fields. The common values 
are small, and if a length of anything is large, then the size of the 
length field itself is insignificant compared to the actual data.

I don't like using varint encoding for OIDs. They might be small in
common cases, but it feels wrong to rely on that. They're just arbitrary 
numbers. We could pick them randomly, it's just an implementation detail 
that we use a counter to choose the next one. I really dislike the idea 
that someone would do a pg_dump + restore, just to get smaller OIDs and 
smaller WAL as a result.

It does make sense to have a fast-path (small-path?) for 0 OIDs though.

To shrink OIDs fields, you could refer to earlier WAL records. A special 
value for "same relation as in previous record", or something like that. 
Now we're just re-inventing LZ-style compression though. Might as well 
use LZ4 or Snappy or something to compress the whole WAL stream. It's a 
bit tricky to get the crash-safety right, but shouldn't be impossible.

Has anyone seriously considered implementing wholesale compression of WAL?

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: XLog size reductions: smaller XLRec block header for PG17

From: Robert Haas
Date:
On Fri, Feb 2, 2024 at 8:52 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> To shrink OIDs fields, you could refer to earlier WAL records. A special
> value for "same relation as in previous record", or something like that.
> Now we're just re-inventing LZ-style compression though. Might as well
> use LZ4 or Snappy or something to compress the whole WAL stream. It's a
> bit tricky to get the crash-safety right, but shouldn't be impossible.
>
> Has anyone seriously considered implementing wholesale compression of WAL?

I thought about the idea of referring to earlier WAL and/or undo
records when working on zheap. It seems tricky, because what if replay
starts after those WAL records and you can't refer back to them? It's
OK if you can make sure that you never depend on anything prior to the
latest checkpoint, but the natural way to make that work is to add
more looping like what we already do for FPIs, and then you have to
worry about whether that extra logic is going to be more expensive
than what you save. FPIs are so expensive that we can afford to go to
a lot of trouble to avoid them and still come out ahead, but storing
the full OID instead of an OID reference isn't nearly in the same
category.

I also thought about trying to refer to earlier items on the same
page, thinking that the locality would make things easier. But it
doesn't, because we don't know which page will ultimately contain the
WAL record until quite late, so we can't reason about what's on the
same page when constructing it.

Wholesale compression of WAL might run into some of the same issues,
e.g. if you don't want to compress each record individually, that
means you can't compress until you know the insert position. And even
then, if you want the previous data to be present in the compressor as
context, you almost need all the WAL compression to be done by a
single process. But if the LSNs are relative to the compressed stream,
you have to wait for that compression to finish before you can
determine the LSN of the next record, which seems super-painful, and
if they're relative to the uncompressed stream, then mapping them onto
fixed-size files gets tricky.

My hunch is that we can squeeze more out of the existing architecture
with a lot less work than it would take to do major rearchitecture
like compressing everything. I don't know how we can agree on a way of
doing that because everybody's got slightly different ideas about the
right way to do this. But if agreeing on how to evolve the system
we've got seems harder then rewriting it, we need to stop worrying
about WAL overhead and learn how to work together better.

--
Robert Haas
EDB: http://www.enterprisedb.com