Re: jsonb format is pessimal for toast compression
From | Heikki Linnakangas |
---|---|
Subject | Re: jsonb format is pessimal for toast compression |
Date | |
Msg-id | 53EC8194.4020804@vmware.com |
In reply to | Re: jsonb format is pessimal for toast compression (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: jsonb format is pessimal for toast compression |
List | pgsql-hackers |
On 08/14/2014 04:01 AM, Tom Lane wrote:
> I wrote:
>> That's a fair question. I did a very very simple hack to replace the item
>> offsets with item lengths -- turns out that that mostly requires removing
>> some code that changes lengths to offsets ;-). I then loaded up Larry's
>> example of a noncompressible JSON value, and compared pg_column_size()
>> which is just about the right thing here since it reports datum size after
>> compression. Remembering that the textual representation is 12353 bytes:
>
>> json:                    382 bytes
>> jsonb, using offsets:    12593 bytes
>> jsonb, using lengths:    406 bytes
>
> Oh, one more result: if I leave the representation alone, but change
> the compression parameters to set first_success_by to INT_MAX, this
> value takes up 1397 bytes. So that's better, but still more than a
> 3X penalty compared to using lengths. (Admittedly, this test value
> probably is an outlier compared to normal practice, since it's a hundred
> or so repetitions of the same two strings.)

For comparison, here's a patch that implements the scheme that Alexander
Korotkov suggested, where we store an offset every 8th element, and a
length in the others. It compresses Larry's example to 525 bytes.
Increasing the "stride" from 8 to 16 entries, it compresses to 461 bytes.

A nice thing about this patch is that it's on-disk compatible with the
current format, hence initdb is not required.

(The current comments claim that the first element in an array always
has the JENTRY_ISFIRST flag set; that is wrong, there is no such flag.
I removed the flag in commit d9daff0e, but apparently failed to update
the comment and the accompanying JBE_ISFIRST macro. Sorry about that,
will fix. This patch uses the bit that used to be JENTRY_ISFIRST to mark
entries that store a length instead of an end offset.)

- Heikki
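
To make the trade-off in the stride scheme concrete, here is a rough Python sketch of the idea. It is not the attached patch and not PostgreSQL code. The names `encode`, `end_offset` and `compressed_size` are made up for illustration, the choice of which entries hold an offset (index divisible by the stride) is an assumption, and zlib stands in for pglz, so the sizes it prints are not comparable to the pg_column_size() figures above. The input mimics the test value described in the mail: a hundred repetitions of the same two strings.

```python
import struct
import zlib

STRIDE = 8  # assumed: entries whose index is a multiple of STRIDE hold an end offset


def encode(lengths, stride=STRIDE):
    """Return (is_length, value) pairs: an end offset every stride-th entry, a length otherwise."""
    entries, end = [], 0
    for i, n in enumerate(lengths):
        end += n
        entries.append((False, end) if i % stride == 0 else (True, n))
    return entries


def end_offset(entries, i):
    """Recover the end offset of element i by summing lengths back to the nearest stored offset."""
    total = 0
    while entries[i][0]:          # entry stores a length: accumulate and step back
        total += entries[i][1]
        i -= 1
    return total + entries[i][1]  # entry stores an end offset: stop here


def compressed_size(values):
    """Size after zlib compression of the values packed as 32-bit little-endian integers."""
    return len(zlib.compress(struct.pack("<%dI" % len(values), *values)))


# Hypothetical input: 100 repetitions of two strings of lengths 5 and 11.
lengths = [5, 11] * 100
entries = encode(lengths)

all_offsets = [sum(lengths[: i + 1]) for i in range(len(lengths))]
size_offsets = compressed_size(all_offsets)
size_lengths = compressed_size(lengths)
size_stride = compressed_size([value for _, value in entries])
print(size_offsets, size_stride, size_lengths)
```

A lookup in this sketch walks back over at most STRIDE - 1 entries, so random access stays bounded, while most entries hold small repeating lengths that a compressor can match instead of an ever-increasing run of distinct offsets.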