[PATCH] Compression dictionaries for JSONB
От | Aleksander Alekseev |
---|---|
Тема | [PATCH] Compression dictionaries for JSONB |
Дата | |
Msg-id | CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: [PATCH] Compression dictionaries for JSONB
Re: [PATCH] Compression dictionaries for JSONB |
Список | pgsql-hackers |
Hi hackers,
This is a follow-up thread to `RFC: compression dictionaries for JSONB` [1]. I would like to share my current progress in order to get early feedback. The patch is currently in a draft state but implements the basic functionality. I did my best to account for all the great feedback I previously got from Alvaro and Matthias.
Usage example:
```
CREATE TYPE mydict AS DICTIONARY OF jsonb ('aaa', 'bbb');
SELECT '{"aaa":"bbb"}' :: mydict;
mydict
----------------
{"aaa": "bbb"}
SELECT ('{"aaa":"bbb"}' :: mydict) -> 'aaa';
?column?
----------
"bbb"
```
Here `mydict` works as a transparent replacement for `jsonb`. However, its internal representation differs. The provided dictionary entries ('aaa', 'bbb') are stored in the new catalog table:
```
SELECT * FROM pg_dict;
oid | dicttypid | dictentry
-------+-----------+-----------
39476 | 39475 | aaa
39477 | 39475 | bbb
Current limitations (todo):
- ALTER TYPE is not implemented
- Tests and documentation are missing
- Autocomplete is missing
Future work (out of scope of this patch):
- Support types other than JSONB: TEXT, XML, etc
- Automatically updated dictionaries, e.g. during VACUUM
- Alternative compression algorithms. Note that this will not require any further changes in the catalog, only the values we write to pg_type and pg_cast will differ.
Open questions:
- Dictionary entries are currently stored as NameData, the same type that is used for enums. Are we OK with the accompanying limitations? Any alternative suggestions?
Your feedback is very much welcomed!
[1]: https://postgr.es/m/CAJ7c6TPx7N-bVw0dZ1ASCDQKZJHhBYkT6w4HV1LzfS%2BUUTUfmA%40mail.gmail.com
--
Best regards,
Aleksander Alekseev
This is a follow-up thread to `RFC: compression dictionaries for JSONB` [1]. I would like to share my current progress in order to get early feedback. The patch is currently in a draft state but implements the basic functionality. I did my best to account for all the great feedback I previously got from Alvaro and Matthias.
Usage example:
```
CREATE TYPE mydict AS DICTIONARY OF jsonb ('aaa', 'bbb');
SELECT '{"aaa":"bbb"}' :: mydict;
mydict
----------------
{"aaa": "bbb"}
SELECT ('{"aaa":"bbb"}' :: mydict) -> 'aaa';
?column?
----------
"bbb"
```
Here `mydict` works as a transparent replacement for `jsonb`. However, its internal representation differs. The provided dictionary entries ('aaa', 'bbb') are stored in the new catalog table:
```
SELECT * FROM pg_dict;
oid | dicttypid | dictentry
-------+-----------+-----------
39476 | 39475 | aaa
39477 | 39475 | bbb
(2 rows)
```
When `mydict` sees 'aaa' in the document, it replaces it with the corresponding code, in this case - 39476. For more details regarding the compression algorithm and choosen compromises please see the comments in the patch.
In pg_type `mydict` has typtype = TYPTYPE_DICT. It works the same way as TYPTYPE_BASE with only difference: corresponding `<type>_in` (pg_type.typinput) and `<another-type>_<type>` (pg_cast.castfunc) procedures receive the dictionary Oid as a `typmod` argument. This way the procedures can distinguish `mydict1` from `mydict2` and use the proper compression dictionary.
The approach with alternative `typmod` role is arguably a bit hacky, but it was the less invasive way to implement the feature I've found. I'm open to alternative suggestions.
Current limitations (todo):
- ALTER TYPE is not implemented
- Tests and documentation are missing
- Autocomplete is missing
Future work (out of scope of this patch):
- Support types other than JSONB: TEXT, XML, etc
- Automatically updated dictionaries, e.g. during VACUUM
- Alternative compression algorithms. Note that this will not require any further changes in the catalog, only the values we write to pg_type and pg_cast will differ.
Open questions:
- Dictionary entries are currently stored as NameData, the same type that is used for enums. Are we OK with the accompanying limitations? Any alternative suggestions?
- All in all, am I moving the right direction?
[1]: https://postgr.es/m/CAJ7c6TPx7N-bVw0dZ1ASCDQKZJHhBYkT6w4HV1LzfS%2BUUTUfmA%40mail.gmail.com
--
Best regards,
Aleksander Alekseev
Вложения
В списке pgsql-hackers по дате отправления: