On 08/26/2014 07:51 AM, Tom Lane wrote:
> My feeling about it at this point is that the apparent speed gain from
> using offsets is illusory: in practically all real-world cases where there
> are enough keys or array elements for it to matter, costs associated with
> compression (or rather failure to compress) will dominate any savings we
> get from offset-assisted lookups. I agree that the evidence for this
> opinion is pretty thin ... but the evidence against it is nonexistent.
Well, I have shown one test case where lengths are a net penalty. However, for that to be the case, *all* of the following conditions have to be true:
* lots of top-level keys
* short values
* rows which are on the borderline for TOAST
* table which fits in RAM
... so that's a "special case" and if it's sub-optimal, no biggie. Also, it's not like it's an order of magnitude slower.
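For anyone who wants to reproduce that kind of dataset, here's a rough sketch (table name, key names, and row counts are all made up for illustration) that produces rows with lots of short top-level keys sitting near the TOAST threshold:

CREATE TABLE borderline (id int, properties jsonb);

-- ~200 short keys per row puts each jsonb datum in the neighborhood of
-- the 2KB threshold where TOAST starts to kick in
INSERT INTO borderline
SELECT i,
       (SELECT json_object_agg('key_' || k, 'val_' || (i % 100))
        FROM generate_series(1, 200) AS k)::jsonb
FROM generate_series(1, 500000) AS i;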
Anyway, I called for feedback on my blog, and have gotten some:
It would be really interesting to see your results with column STORAGE EXTERNAL for that benchmark. I think it is important to separate out the slowdown due to decompression now being needed vs. that inherent in the new format; we can always switch off compression on a per-column basis using STORAGE EXTERNAL.
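(For reference, that's a one-line change; "compressed" and "properties" here are just the table and column names from my benchmark:

ALTER TABLE compressed ALTER COLUMN properties SET STORAGE EXTERNAL;

Note that SET STORAGE only affects values stored after the change, so the data has to be reloaded for the benchmark to actually run against uncompressed datums.)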
My JSON data has smallish objects with a small number of keys; it barely compresses at all with the patch and shows similar results to Arthur's data. Across ~500K rows I get:
encoded=# select count(properties->>'submitted_by') from compressed;
 count
--------
 431948
(1 row)
Time: 250.512 ms
encoded=# select count(properties->>'submitted_by') from uncompressed;
 count
--------
 431948
(1 row)
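One quick way to sanity-check whether compression is actually kicking in on a column like that (same assumed table and column names as above) is to compare the stored datum size against the text representation:

SELECT avg(pg_column_size(properties)) AS stored_bytes,
       avg(octet_length(properties::text)) AS text_bytes
FROM compressed;

The two numbers aren't directly comparable (the jsonb binary format isn't the same size as the text form), but if stored_bytes comes out dramatically smaller, compression is clearly doing something.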