LZ compressing data type
From | wieck@debis.com (Jan Wieck) |
---|---|
Subject | LZ compressing data type |
Date | |
Msg-id | m11oDim-0003kGC@orion.SAPserv.Hamburg.dsh.de |
Replies | Re: [HACKERS] LZ compressing data type |
List | pgsql-hackers |
Hi,

I just committed some changes that require an initdb. They add the simple LZ compressor we discussed, placed in /utils/adt/pg_lzcompress.c, and a new lztext data type based on it. You'll find a fairly detailed description of the compression algorithm in the comments at the top of pg_lzcompress.c.

Not very surprisingly to me, it turns out that the compressor does a very good job on rule action strings. I used the 48 rules that can be found in pg_rewrite after the regression test. The original string sizes range from 820 to 4615 bytes, and the compression rates range from 35% to 76%, with an average of 60%. The 4615-byte rule action was compressed to an octet_length of 1126.

For the lztext type, there are conversion functions to and from text, and the length() and octet_length() functions are available. length() returns the same value that length() would return on the equivalent text, while octet_length() returns the compressed size, excluding VARHDRSZ.

The type does not support MULTIBYTE or CYR_ENCODE yet. It shouldn't be too hard to add, and after that we might add an lzbpchar type too. The latter is really interesting, because an empty char(200) (thus containing 200 spaces) could result in an octet_length of 12 instead of 204; that's a compression rate of 94.1%! It actually wouldn't, because by default the compressor only starts if the input is at least 256 bytes, but there is a mechanism by which an lzbpchar type could force this behaviour.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
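The post defers the details of the algorithm to the comments in pg_lzcompress.c, so what follows is only a hedged illustration, not that file's format or API. It is a minimal C sketch of a generic LZ77-style compressor showing the two behaviours described above: inputs shorter than a minimum size (256 bytes, the default stated in the post) are left uncompressed unless the caller forces compression, and the "compression rate" is computed as 1 - compressed/original. The on-disk layout (control bytes with 8 flag bits, 2-byte tags holding a 4-bit length and a 12-bit offset), the `force` flag, and every `toy_*` name are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy parameters; NOT the pg_lzcompress.c format. */
#define TOY_MIN_INPUT  256   /* default: don't compress shorter inputs */
#define TOY_MIN_MATCH  3     /* shortest back-reference worth a tag    */
#define TOY_MAX_MATCH  18    /* 4-bit length field encodes 3..18       */
#define TOY_MAX_OFFSET 4095  /* 12-bit offset field encodes 1..4095    */

/* Worst case (all literals): n bytes plus one control byte per 8 items. */
#define TOY_BOUND(n) ((n) + (n) / 8 + 1)

/*
 * Compress src[0..n) into dst, which must hold TOY_BOUND(n) bytes.
 * Returns 0 (meaning "store uncompressed") when n < TOY_MIN_INPUT and
 * force is 0. Each control byte carries 8 flags, LSB first:
 * 0 = one literal byte follows, 1 = a 2-byte tag follows holding
 * (length - 3) in the high 4 bits and a 12-bit backward offset.
 */
size_t toy_lz_compress(const unsigned char *src, size_t n,
                       unsigned char *dst, int force)
{
    size_t in = 0, out = 0, ctrl_pos = 0;
    int bit = 8;                        /* 8 = need a new control byte */

    if (n < TOY_MIN_INPUT && !force)
        return 0;

    while (in < n) {
        if (bit == 8) {
            ctrl_pos = out++;
            dst[ctrl_pos] = 0;
            bit = 0;
        }
        /* Brute-force longest-match search over the sliding window. */
        size_t best_len = 0, best_off = 0;
        size_t start = in > TOY_MAX_OFFSET ? in - TOY_MAX_OFFSET : 0;
        for (size_t cand = start; cand < in; cand++) {
            size_t len = 0;
            while (len < TOY_MAX_MATCH && in + len < n &&
                   src[cand + len] == src[in + len])
                len++;                  /* match may overlap position in */
            if (len > best_len) {
                best_len = len;
                best_off = in - cand;
            }
        }
        if (best_len >= TOY_MIN_MATCH) {
            dst[ctrl_pos] |= (unsigned char)(1u << bit);
            dst[out++] = (unsigned char)(((best_len - TOY_MIN_MATCH) << 4) |
                                         (best_off >> 8));
            dst[out++] = (unsigned char)(best_off & 0xFF);
            in += best_len;
        } else {
            dst[out++] = src[in++];     /* literal byte */
        }
        bit++;
    }
    return out;
}

/* Inverse of toy_lz_compress; returns the number of bytes written. */
size_t toy_lz_decompress(const unsigned char *src, size_t n,
                         unsigned char *dst, size_t cap)
{
    size_t in = 0, out = 0;

    while (in < n) {
        unsigned char ctrl = src[in++];
        for (int bit = 0; bit < 8 && in < n; bit++) {
            if (ctrl & (1u << bit)) {
                assert(in + 1 < n);     /* tag must be complete */
                size_t len = (size_t)(src[in] >> 4) + TOY_MIN_MATCH;
                size_t off = ((size_t)(src[in] & 0x0F) << 8) | src[in + 1];
                in += 2;
                assert(off >= 1 && off <= out && out + len <= cap);
                for (size_t i = 0; i < len; i++, out++)
                    dst[out] = dst[out - off];  /* byte-wise copy allows overlap */
            } else {
                assert(out < cap);
                dst[out++] = src[in++];
            }
        }
    }
    return out;
}

/* "Compression rate" as used in the post: 1 - compressed / original. */
double toy_rate_pct(size_t original, size_t compressed)
{
    return 100.0 * (1.0 - (double)compressed / (double)original);
}
```

With `force` set, a buffer of 200 spaces compresses to 26 bytes in this toy rather than the 12 mentioned in the post, because the 4-bit length field caps each back-reference at 18 bytes, so the run needs eleven tags. Without `force`, the same buffer is left uncompressed because it is below the 256-byte threshold. The window search is brute force for clarity; a practical compressor would normally locate match candidates through a hash table. `toy_rate_pct(4615, 1126)` gives about 75.6, matching the 76% upper bound quoted for the rule actions.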