Re: Proposal: custom compression methods
From | Tomas Vondra |
---|---|
Subject | Re: Proposal: custom compression methods |
Date | |
Msg-id | 56715670.1000304@2ndquadrant.com |
In reply to | Re: Proposal: custom compression methods (Simon Riggs <simon@2ndQuadrant.com>) |
List | pgsql-hackers |
Hi,

On 12/14/2015 12:51 PM, Simon Riggs wrote:
> On 13 December 2015 at 17:28, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>
>> it would be nice to make compression methods pluggable.
>
> Agreed.
>
> My thinking is that this should be combined with work to make use of
> the compressed data, which is why Alvaro, Tomas, David have been
> working on Col Store API for about 18 months and work on that
> continues with more submissions for 9.6 due.

I'm not sure it makes sense to combine those two uses of compression, because there are various differences - some subtle, some less subtle. It's a bit difficult to discuss this without any column store background, but I'll try anyway.

The compression methods discussed in this thread, used to compress a single varlena value, are "general-purpose" in the sense that they operate on an opaque stream of bytes, without any additional context (e.g. about the structure of the data being compressed). So essentially the methods have an API like this:

    int compress(char *src, int srclen, char *dst, int dstlen);
    int decompress(char *src, int srclen, char *dst, int dstlen);

And possibly some auxiliary methods like "estimate compressed length" and such.

OTOH the compression methods we're messing with while working on the column store are quite different - they operate on columns (i.e. "arrays of Datums"). Also, column stores prefer "light-weight" compression methods like RLE or DICT (dictionary compression) because those methods allow execution on compressed data when done properly. That, for example, requires additional info about the data type in the column, so that the RLE groups match the data type length.

So the API of those methods looks quite different, compared to the general-purpose methods. Not only will the compression/decompression methods have additional parameters with info about the data type, but there'll also be methods used for iterating over values in the compressed data etc.
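
To make the contrast with the byte-stream API concrete, here is a minimal sketch of what a type-aware light-weight method might look like. It is purely illustrative: the names `rle_compress`, `RleIter`, `rle_iter_init`, `rle_iter_next` and `rle_count_equal`, as well as the run layout, are hypothetical and do not come from the column store patches or from any PostgreSQL API. The sketch assumes fixed-width values and treats two values as equal when their bytes are identical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical type-aware RLE (not a PostgreSQL API).
 * Compressed layout: a sequence of runs, each [uint32_t count][width bytes].
 */

/*
 * Compress nvalues fixed-width values. Returns the number of bytes written,
 * or 0 if dst is too small (or there was nothing to compress).
 */
size_t
rle_compress(const void *src, size_t nvalues, size_t width,
             void *dst, size_t dstlen)
{
    const unsigned char *in = src;
    unsigned char *out = dst;
    size_t      used = 0;
    size_t      i = 0;

    assert(width > 0);

    while (i < nvalues)
    {
        const unsigned char *val = in + i * width;
        uint32_t    count = 1;

        /* Extend the run while the next value has identical bytes. */
        while (i + count < nvalues && count < UINT32_MAX &&
               memcmp(val, in + (i + count) * width, width) == 0)
            count++;

        if (used + sizeof(count) + width > dstlen)
            return 0;

        memcpy(out + used, &count, sizeof(count));
        memcpy(out + used + sizeof(count), val, width);
        used += sizeof(count) + width;
        i += count;
    }
    return used;
}

/* Iterator over runs in the compressed data. */
typedef struct RleIter
{
    const unsigned char *pos;
    const unsigned char *end;
    size_t      width;
} RleIter;

void
rle_iter_init(RleIter *it, const void *data, size_t len, size_t width)
{
    it->pos = data;
    it->end = (const unsigned char *) data + len;
    it->width = width;
}

/* Fetch the next run; returns 1 and sets count/value, or 0 at the end. */
int
rle_iter_next(RleIter *it, uint32_t *count, const void **value)
{
    if ((size_t) (it->end - it->pos) < sizeof(*count) + it->width)
        return 0;
    memcpy(count, it->pos, sizeof(*count));
    *value = it->pos + sizeof(*count);
    it->pos += sizeof(*count) + it->width;
    return 1;
}

/* Count values equal to key by walking the runs, without expanding them. */
size_t
rle_count_equal(const void *data, size_t len, size_t width, const void *key)
{
    RleIter     it;
    uint32_t    count;
    const void *value;
    size_t      total = 0;

    rle_iter_init(&it, data, len, width);
    while (rle_iter_next(&it, &count, &value))
        if (memcmp(value, key, width) == 0)
            total += count;
    return total;
}
```

Note how the `width` parameter and the iterator have no counterpart in the `compress`/`decompress` pair above: the general-purpose API never needs to know where one value ends and the next begins, while something like `rle_count_equal` can only work on the compressed form because the runs are aligned to the data type length.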

Of course, it'd be nice to have the ability to add/remove even those light-weight methods, but I'm not sure it makes sense to squash them into the same catalog. I can imagine a catalog suitable for both APIs (essentially having two groups of columns, one for each type of compression algorithm), but I can't really imagine a compression method providing both interfaces at the same time.

In any case, I don't think this is the main challenge the patch needs to solve at this point.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services