Re: jsonb format is pessimal for toast compression
From | Heikki Linnakangas |
---|---|
Subject | Re: jsonb format is pessimal for toast compression |
Date | |
Msg-id | 541C242E.3030004@vmware.com |
In reply to | Re: jsonb format is pessimal for toast compression (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Replies | Re: jsonb format is pessimal for toast compression; Re: jsonb format is pessimal for toast compression; Re: jsonb format is pessimal for toast compression |
List | pgsql-hackers |
On 09/18/2014 09:27 PM, Heikki Linnakangas wrote:
> On 09/18/2014 07:53 PM, Josh Berkus wrote:
>> On 09/16/2014 08:45 PM, Tom Lane wrote:
>>> We're somewhat comparing apples and oranges here, in that I pushed my
>>> approach to something that I think is of committable quality (and which,
>>> not incidentally, fixes some existing bugs that we'd need to fix in any
>>> case); while Heikki's patch was just proof-of-concept. It would be worth
>>> pushing Heikki's patch to committable quality so that we had a more
>>> complete understanding of just what the complexity difference really is.
>>
>> Is anyone actually working on this?
>>
>> If not, I'm voting for the all-lengths patch so that we can get 9.4 out
>> the door.
>
> I'll try to write a more polished patch tomorrow. We'll then see what it
> looks like, and can decide if we want it.

Ok, here are two patches. One is a refined version of my earlier patch, and the other implements the separate offsets array approach. They are both based on Tom's jsonb-lengths-merged.patch, so they include all the whitespace fixes etc. he mentioned.

There is no big difference in terms of code complexity between the patches. IMHO the separate offsets array is easier to understand, but it makes for more complicated accessor macros to find the beginning of the variable-length data.

Unlike Tom's patch, these patches don't cache any offsets when doing a binary search. That doesn't seem worth it when the access time is O(1) anyway.

Both of these patches have a #define JB_OFFSET_STRIDE for the "stride size". In the separate offsets array patch, the offsets array has one element per JB_OFFSET_STRIDE children. In the other patch, every JB_OFFSET_STRIDE'th child stores its end offset, while the others store their length. A smaller value makes random access faster, at the cost of compressibility / on-disk size. I haven't done any measurements to find the optimal value; the values in the patches are arbitrary.
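To make the two stride-based layouts concrete, here is a minimal C sketch; it is not code from either patch. It assumes a flat list of children with known byte lengths. The names `mixed_encode`, `mixed_child_start`, `mixed_child_length`, `sep_encode`, `sep_child_start` and the flag bits `HAS_OFF`/`VAL_MASK` are hypothetical; only `JB_OFFSET_STRIDE` comes from the message, and the value 4 used here is as arbitrary as the ones in the patches. In the mixed variant, every JB_OFFSET_STRIDE'th child stores its end offset and the rest store lengths; in the separate-array variant, all children store lengths and a side array records the start offset of every JB_OFFSET_STRIDE'th child.

```c
#include <assert.h>
#include <stdint.h>

/* Stride size: arbitrary in this sketch. */
#define JB_OFFSET_STRIDE 4

/* Hypothetical flag bit marking an entry that holds an end offset. */
#define HAS_OFF  0x80000000u
#define VAL_MASK 0x7FFFFFFFu

/* Variant 1 (mixed): child i stores its END offset when
 * i % JB_OFFSET_STRIDE == 0, otherwise its length. */
void mixed_encode(const uint32_t *lens, int n, uint32_t *entries)
{
    uint32_t end = 0;
    for (int i = 0; i < n; i++) {
        assert(lens[i] <= VAL_MASK);
        end += lens[i];
        assert(end <= VAL_MASK);
        entries[i] = (i % JB_OFFSET_STRIDE == 0) ? (end | HAS_OFF) : lens[i];
    }
}

/* Start of child i: sum lengths backwards until reaching an entry that
 * stores an end offset. Child 0 always stores one, so the walk is bounded
 * by JB_OFFSET_STRIDE - 1 length additions. */
uint32_t mixed_child_start(const uint32_t *entries, int i)
{
    uint32_t off = 0;
    for (int j = i - 1; j >= 0; j--) {
        if (entries[j] & HAS_OFF)
            return off + (entries[j] & VAL_MASK);
        off += entries[j] & VAL_MASK;
    }
    return off; /* only reached for i == 0 */
}

/* Length of child i: stored directly, or derived from end - start. */
uint32_t mixed_child_length(const uint32_t *entries, int i)
{
    if (entries[i] & HAS_OFF)
        return (entries[i] & VAL_MASK) - mixed_child_start(entries, i);
    return entries[i];
}

/* Variant 2 (separate array): all children keep their lengths in lens;
 * offsets[k] is the start of child k * JB_OFFSET_STRIDE. The caller sizes
 * offsets as (n + JB_OFFSET_STRIDE - 1) / JB_OFFSET_STRIDE elements. */
void sep_encode(const uint32_t *lens, int n, uint32_t *offsets)
{
    uint32_t pos = 0;
    for (int i = 0; i < n; i++) {
        if (i % JB_OFFSET_STRIDE == 0)
            offsets[i / JB_OFFSET_STRIDE] = pos;
        pos += lens[i];
    }
}

/* Start of child i: nearest stored offset plus the lengths after it. */
uint32_t sep_child_start(const uint32_t *lens, const uint32_t *offsets, int i)
{
    int first = (i / JB_OFFSET_STRIDE) * JB_OFFSET_STRIDE;
    uint32_t off = offsets[i / JB_OFFSET_STRIDE];
    for (int j = first; j < i; j++)
        off += lens[j];
    return off;
}
```

In both variants, locating a child takes at most JB_OFFSET_STRIDE - 1 additions, so access time stays constant. Lowering JB_OFFSET_STRIDE shortens that walk but stores more absolute offsets; because offsets are all distinct and increasing, they presumably compress worse than the often-repeated lengths, which is the compressibility cost mentioned above.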
I think we should bite the bullet and break compatibility with the 9.4beta2 format, even if we go with "my patch". In a jsonb object, it makes sense to store all the keys first, like Tom did, because of the cache benefits and the future possibility of smart EXTERNAL access. Also, even if we can make the on-disk format compatible, it's weird that you can get different runtime behavior with datums created with a beta version. It seems clearer to just require a pg_dump + restore.

Tom: You mentioned earlier that your patch fixes some existing bugs. What were they? There were a bunch of whitespace and comment fixes that we should apply in any case, but I couldn't see any actual bugs. I think we should apply those fixes separately, to make sure we don't forget about them, and to make it easier to review these patches.

- Heikki
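The keys-first layout argument can be illustrated with a small C sketch (the names `KeysFirstObject` and `kf_lookup` are hypothetical; this is not the real jsonb code). It assumes an object whose `npairs` keys are stored first in sorted order, followed by the values in the same order, so the value for key `i` sits at index `npairs + i`. A binary search then probes only the contiguous key region, which is presumably the cache benefit meant above. Plain `strcmp` ordering is an assumption made for brevity; the actual jsonb key ordering may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object: children[0 .. npairs-1] are the sorted keys,
 * children[npairs .. 2*npairs-1] are the values in matching order. */
typedef struct {
    int npairs;
    const char **children; /* 2 * npairs entries */
} KeysFirstObject;

/* Binary search over the key half only; on a hit, jump straight to the
 * value at index npairs + mid. Returns NULL if the key is absent. */
const char *kf_lookup(const KeysFirstObject *obj, const char *key)
{
    assert(obj != NULL && obj->npairs >= 0);
    int lo = 0, hi = obj->npairs - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(key, obj->children[mid]);
        if (cmp == 0)
            return obj->children[obj->npairs + mid];
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

For example, with `children = {"a", "b", "c", "1", "2", "3"}` and `npairs = 3`, looking up `"b"` touches only the first three slots before returning `"2"`.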