Re: jsonb format is pessimal for toast compression
From | Andrew Dunstan |
---|---|
Subject | Re: jsonb format is pessimal for toast compression |
Date | |
Msg-id | 53E4EE5F.5090904@dunslane.net |
In reply to | Re: jsonb format is pessimal for toast compression (Tom Lane <tgl@sss.pgh.pa.us>) |
Replies | Re: jsonb format is pessimal for toast compression |
List | pgsql-hackers |
On 08/08/2014 11:18 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 08/07/2014 11:17 PM, Tom Lane wrote:
>>> I looked into the issue reported in bug #11109. The problem appears to be
>>> that jsonb's on-disk format is designed in such a way that the leading
>>> portion of any JSON array or object will be fairly incompressible, because
>>> it consists mostly of a strictly-increasing series of integer offsets.
>
>> Back when this structure was first presented at pgCon 2013, I wondered
>> if we shouldn't extract the strings into a dictionary, because of key
>> repetition, and convinced myself that this shouldn't be necessary
>> because in significant cases TOAST would take care of it.
> That's not really the issue here, I think. The problem is that a
> relatively minor aspect of the representation, namely the choice to store
> a series of offsets rather than a series of lengths, produces
> nonrepetitive data even when the original input is repetitive.

It would certainly be worth validating that changing this would fix the
problem.

I don't know how invasive that would be - I suspect (without looking
very closely) not terribly much.

> 2. Are we going to ship 9.4 without fixing this? I definitely don't see
> replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
> jsonb is still within the bounds of reason.
>
> Considering all the hype that's built up around jsonb, shipping a design
> with a fundamental performance handicap doesn't seem like a good plan
> to me. We could perhaps band-aid around it by using different compression
> parameters for jsonb, although that would require some painful API changes
> since toast_compress_datum() doesn't know what datatype it's operating on.

Yeah, it would be a bit painful, but after all finding out this sort of
thing is why we have betas.

cheers

andrew
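The offsets-versus-lengths point can be checked with a small sketch. It assumes a header made of 4-byte little-endian integers describing 1000 elements of identical size (for example, an array of repeated strings), and it uses Python's zlib as a stand-in for pg_lzcompress, which is a different algorithm with different give-up rules. It does not reproduce jsonb's actual JEntry layout; the names `end_offsets`, `start_from_offsets`, and `start_from_lengths` are hypothetical helpers for illustration only.

```python
import itertools
import struct
import zlib

# Assumed illustration: N elements, each serialized to ELEM_LEN bytes.
N = 1000
ELEM_LEN = 20


def pack_u32(values):
    """Pack a list of non-negative ints as 4-byte little-endian words."""
    return struct.pack(f"<{len(values)}I", *values)


def end_offsets(lengths):
    """Offset layout: each entry is the element's end position (strictly increasing)."""
    return list(itertools.accumulate(lengths))


def start_from_offsets(offsets, i):
    # O(1) random access: element i starts where element i-1 ends.
    return 0 if i == 0 else offsets[i - 1]


def start_from_lengths(lengths, i):
    # O(i) access: must sum every preceding length.
    return sum(lengths[:i])


lengths = [ELEM_LEN] * N
offsets = end_offsets(lengths)

off_bytes = pack_u32(offsets)  # 20, 40, 60, ... never repeats
len_bytes = pack_u32(lengths)  # 20, 20, 20, ... highly repetitive

off_z = len(zlib.compress(off_bytes))
len_z = len(zlib.compress(len_bytes))

print(f"raw header size:     {len(off_bytes)} bytes (same for both layouts)")
print(f"compressed offsets:  {off_z} bytes")
print(f"compressed lengths:  {len_z} bytes")
```

Both headers carry the same information and have the same raw size, but only the length layout repeats when the input repeats, so it compresses far better. The cost shows up in the two lookup helpers: with stored offsets an element's start position is a single read, while with stored lengths it requires summing all earlier entries, which is presumably why offsets were chosen in the first place.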