Re: [HACKERS] Custom compression methods
От | Dilip Kumar |
---|---|
Тема | Re: [HACKERS] Custom compression methods |
Дата | |
Msg-id | CAFiTN-sBH65e4B5kQB5MT8BQOPajkrKvSdKA-07+ra3f2x4vCA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Custom compression methods (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: [HACKERS] Custom compression methods
|
Список | pgsql-hackers |
On Thu, Mar 18, 2021 at 1:32 AM Robert Haas <robertmhaas@gmail.com> wrote: > > ).On Mon, Mar 15, 2021 at 6:58 PM Andres Freund <andres@anarazel.de> wrote: > > - Adding all these indirect function calls via toast_compression[] just > > for all of two builtin methods isn't fun either. > > Yeah, it feels like this has too many layers of indirection now. Like, > toast_decompress_datum() first gets TOAST_COMPRESS_METHOD(attr). Then > it calls CompressionIdToMethod to convert one constant (like > TOAST_PGLZ_COMPRESSION_ID) to another constant with a slightly > different name (like TOAST_PGLZ_COMPRESSION). Then it calls > GetCompressionRoutines() to get hold of the function pointers. Then it > does an indirect functional call. That seemed like a pretty reasonable > idea when we were trying to support arbitrary compression AMs without > overly privileging the stuff that was built into core, but if we're > just doing stuff that's built into core, then we could just switch > (TOAST_COMPRESS_METHOD(attr)) and call the correct function. In fact, > we could even move the stuff from toast_compression.c into detoast.c, > which would allow the compiler to optimize better (e.g. by inlining, > if it wants). > > The same applies to toast_decompress_datum_slice(). Changed this, but I have still kept the functions in toast_compression.c. I think keeping compression related functionality in a separate file looks much cleaner. Please have a look and let me know that whether you still feel we should move it ti detoast.c. If the reason is that we can inline, then I feel we are already paying cost of compression/decompression and compare to that in lining a function will not make much difference. > There's a similar issue in toast_get_compression_method() and the only > caller, pg_column_compression(). Here the multiple mapping layers and > the indirect function call are split across those two functions rather > than all in the same one, but here again one could presumably find a > place to just switch on TOAST_COMPRESS_METHOD(attr) or > VARATT_EXTERNAL_GET_COMPRESSION(attr) and return "pglz" or "lz4" > directly. I have simplified that, only one level of function call from pg_column_compression, I have kept a toast_get_compression_id function because in later patch 0005, we will be using that for getting the compression id from the compressed data. > In toast_compress_datum(), I think we could have a switch that invokes > the appropriate compressor based on cmethod and sets a variable to the > value to be passed as the final argument of > TOAST_COMPRESS_SET_SIZE_AND_METHOD(). Done > Likewise, I suppose CompressionNameToMethod could at least be > simplified to use constant strings rather than stuff like > toast_compression[TOAST_PGLZ_COMPRESSION_ID].cmname. Done Other changes: - As suggested by Andres, remove compression method comparision from eualTupleDesc, because it is not required now. - I found one problem in existing patch, the problem was in detoast_attr_slice, if externally stored data is compressed then we compute max possible compressed size to fetch based on the slice length, for that we were using pglz_maximum_compressed_size, which is not correct for lz4. For lz4, I think we need to fetch the complete compressed data. We might think that for lz4 we might compute lie Min(LZ4_compressBound(slicelength, total_compressed_size); But IMHO, we can not do that and the reason is same that why we should not use PGLZ_MAX_OUTPUT for pglz (explained in the comment atop pglz_maximum_compressed_size). -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления: