Re: jsonb format is pessimal for toast compression
From | Tom Lane |
---|---|
Subject | Re: jsonb format is pessimal for toast compression |
Date | |
Msg-id | 14953.1407977550@sss.pgh.pa.us |
In response to | Re: jsonb format is pessimal for toast compression (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: jsonb format is pessimal for toast compression (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Bruce Momjian <bruce@momjian.us> writes:
> Seems we have two issues:
> 1) the header makes testing for compression likely to fail
> 2) use of pointers rather than offsets reduces compression potential

> I understand we are focusing on #1, but how much does compression reduce
> the storage size with and without #2? Seems we need to know that answer
> before deciding if it is worth reducing the ability to do fast lookups
> with #2.

That's a fair question. I did a very very simple hack to replace the item
offsets with item lengths -- turns out that that mostly requires removing
some code that changes lengths to offsets ;-). I then loaded up Larry's
example of a noncompressible JSON value, and compared pg_column_size()
which is just about the right thing here since it reports datum size after
compression. Remembering that the textual representation is 12353 bytes:

json:                     382 bytes
jsonb, using offsets:   12593 bytes
jsonb, using lengths:     406 bytes

So this confirms my suspicion that the choice of offsets not lengths
is what's killing compressibility. If it used lengths, jsonb would be
very nearly as compressible as the original text.

Hack attached in case anyone wants to collect more thorough statistics.
We'd not actually want to do it like this because of the large expense
of recomputing the offsets on-demand all the time. (It does pass the
regression tests, for what that's worth.)
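[Editorial illustration, not part of the original message.] One way to see why offsets might compress worse than lengths: when many items have the same size, a lengths array repeats one value, while the corresponding end offsets increase steadily and never repeat. The sketch below uses plain arrays rather than the real JEntry layout, and the helper names `lengths_to_end_offsets` and `count_changes` are hypothetical. Counting how often a value differs from its predecessor is only a crude proxy for what an LZ-style compressor can exploit.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Convert per-item lengths into cumulative end offsets, which is roughly
 * what an offset-based layout stores. (Hypothetical helper, for
 * illustration only.)
 */
void
lengths_to_end_offsets(const uint32_t *lens, uint32_t *ends, size_t n)
{
    uint32_t total = 0;

    for (size_t i = 0; i < n; i++)
    {
        total += lens[i];
        ends[i] = total;
    }
}

/*
 * Count the entries that differ from the entry just before them.
 * A low count means long runs of identical values, which a
 * dictionary-based compressor can encode cheaply.
 */
size_t
count_changes(const uint32_t *v, size_t n)
{
    size_t changes = 0;

    for (size_t i = 1; i < n; i++)
    {
        if (v[i] != v[i - 1])
            changes++;
    }
    return changes;
}
```

For eight items of length 5, `count_changes` on the lengths is 0, while on the end offsets (5, 10, ..., 40) it is 7: every offset is a new value.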
			regards, tom lane

diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..2297504 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1378,1385 ****
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));
  
-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }
--- 1378,1383 ----
*************** convertJsonbObject(StringInfo buffer, JE
*** 1430,1437 ****
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));
  
-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
  
--- 1428,1433 ----
*************** convertJsonbObject(StringInfo buffer, JE
*** 1445,1451 ****
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));
  
-         meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }
--- 1441,1446 ----
*************** uniqueifyJsonbObject(JsonbValue *object)
*** 1592,1594 ****
--- 1587,1600 ----
          object->val.object.nPairs = res + 1 - object->val.object.pairs;
      }
  }
+ 
+ uint32
+ jsonb_get_offset(const JEntry *ja, int index)
+ {
+     uint32      off = 0;
+     int         i;
+ 
+     for (i = 0; i < index; i++)
+         off += JBE_LEN(ja, i);
+     return off;
+ }
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 5f2594b..c9b18e1 100644
*** a/src/include/utils/jsonb.h
--- b/src/include/utils/jsonb.h
*************** typedef uint32 JEntry;
*** 153,162 ****
   * Macros for getting the offset and length of an element. Note multiple
   * evaluations and access to prior array element.
   */
! #define JBE_ENDPOS(je_) ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)  ((i) == 0 ? 0 : JBE_ENDPOS((ja)[i - 1]))
! #define JBE_LEN(ja, i)  ((i) == 0 ? JBE_ENDPOS((ja)[i]) \
!                          : JBE_ENDPOS((ja)[i]) - JBE_ENDPOS((ja)[i - 1]))
  
  /*
   * A jsonb array or object node, within a Jsonb Datum.
--- 153,163 ----
   * Macros for getting the offset and length of an element. Note multiple
   * evaluations and access to prior array element.
   */
! #define JBE_LENFLD(je_) ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)  jsonb_get_offset(ja, i)
! #define JBE_LEN(ja, i)  JBE_LENFLD((ja)[i])
! 
! extern uint32 jsonb_get_offset(const JEntry *ja, int index);
  
  /*
   * A jsonb array or object node, within a Jsonb Datum.
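[Editorial illustration, not part of the original message.] The "large expense of recomputing the offsets on-demand" can be made concrete with a small sketch. It mirrors the loop shape of the patch's `jsonb_get_offset()`, but works on a plain array of lengths instead of JEntry words and adds a step counter. The names `offset_from_lengths` and `steps_to_visit_all` are hypothetical and do not exist in PostgreSQL.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for the patch's jsonb_get_offset(): the offset of item
 * `index` is the sum of the lengths of all earlier items. *steps is
 * incremented once per addition so the cost can be observed.
 */
uint32_t
offset_from_lengths(const uint32_t *lens, size_t index, size_t *steps)
{
    uint32_t off = 0;

    for (size_t i = 0; i < index; i++)
    {
        off += lens[i];
        (*steps)++;
    }
    return off;
}

/*
 * Look up the offset of every item once, as a naive traversal would,
 * and return the total number of additions performed.
 */
size_t
steps_to_visit_all(const uint32_t *lens, size_t n)
{
    size_t steps = 0;

    for (size_t i = 0; i < n; i++)
        (void) offset_from_lengths(lens, i, &steps);
    return steps;
}
```

Under these assumptions a single lookup costs `index` additions, and visiting all n items costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 additions, whereas a stored-offset layout answers each lookup with a single array read.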