> I'm rather disinclined to change the on-disk format because of this
> specific test, that feels a bit like the tail wagging the dog to me,
> especially as I do hope that some day we'll figure out a way to use a
> better compression algorithm than pglz.
I'm unimpressed by that argument too, for a number of reasons:
1. The real problem here is that jsonb is emitting quite a bit of fundamentally-nonrepetitive data, even when the user-visible input is very repetitive. That's a compression-unfriendly transformation by anyone's measure. Assuming that some future replacement for pg_lzcompress() will nonetheless be able to compress the data strikes me as mostly wishful thinking. Besides, we'd more than likely have a similar early-exit rule in any substitute implementation, so that we'd still be at risk even if it usually worked.
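To make the early-exit concern in point 1 concrete, here is a toy sketch. It is not pglz: zlib stands in for the compressor, and the 1024-byte probe window, the helper names, and the sample data are all assumptions made up for illustration. The rule modeled is simply "if the leading bytes do not shrink, give up and store the value uncompressed".

```python
import hashlib
import zlib

PROBE_BYTES = 1024  # hypothetical probe window, chosen for illustration


def pseudo_random_bytes(n: int) -> bytes:
    """Deterministic, non-repetitive filler standing in for header data."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:n])


def compress_with_early_exit(data: bytes, probe: int = PROBE_BYTES):
    """Return compressed bytes, or None when the leading probe window
    does not shrink (the toy early-exit rule; zlib is a stand-in)."""
    head = data[:probe]
    if len(zlib.compress(head)) >= len(head):
        return None  # give up: the value would be stored uncompressed
    return zlib.compress(data)


header = pseudo_random_bytes(2048)   # non-repetitive part
body = b'{"key": "value"}' * 1000    # very repetitive part

# Same bytes overall, different order.
header_first = compress_with_early_exit(header + body)
body_first = compress_with_early_exit(body + header)
```

In this toy setup the header-first ordering trips the early exit and nothing is compressed, while the body-first ordering passes the probe and the whole value is compressed. Swapping in a different compressor would not change the outcome as long as a similar probe rule is kept.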
Would one answer be to move the jsonb "header" data from the beginning of the field to the end? That would let pglz see the repetitive data it wants to see early on and go to work whenever the input allows.
You would add an offset at the top of the field to show where the header starts, but outside of that it would be functionally the same? Or pretty close?
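A minimal sketch of the layout being suggested, purely for illustration: the 4-byte little-endian offset, the function names, and the sample bytes are hypothetical and are not the actual jsonb on-disk format. The field is laid out as offset, then payload, then header, and the reader uses the offset to find the header.

```python
import struct

OFFSET_FMT = "<I"                          # hypothetical 4-byte offset
OFFSET_SIZE = struct.calcsize(OFFSET_FMT)


def pack_field(header: bytes, payload: bytes) -> bytes:
    """Lay the field out as [offset][payload][header]."""
    header_start = OFFSET_SIZE + len(payload)
    return struct.pack(OFFSET_FMT, header_start) + payload + header


def unpack_field(buf: bytes):
    """Read the leading offset, then split the payload from the header."""
    (header_start,) = struct.unpack_from(OFFSET_FMT, buf, 0)
    if not OFFSET_SIZE <= header_start <= len(buf):
        raise ValueError("offset points outside the field")
    return buf[header_start:], buf[OFFSET_SIZE:header_start]


# Round trip: the reader recovers the same two parts.
field = pack_field(b"\x01\x02\x03", b"aaaa" * 8)
header, payload = unpack_field(field)
```

The only extra cost in this sketch is the fixed-size offset at the front; everything after it is the same data in a different order, so a compressor that probes the start of the field sees the payload almost immediately.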