Compression in LO and other fields
От | Cary O'Brien |
---|---|
Тема | Compression in LO and other fields |
Дата | |
Msg-id | 199911132223.RAA05987@saltmine.radix.net обсуждение исходный текст |
Список | pgsql-hackers |
Some observations about compression (sorry if this is obvious): Compression won't work unless there is redundancy in the information being compressed. In a normal [1] database, there may be a little redundancy in each row. The real redundancy will be when you group rows together. I.E. in it is quite likely that many rows will have identical or similar values for some columns. It may be worth considering compressing each storage block[2], in the hopes that each block will contain several rows, and that the rows will have some similar (redundant) information. Then compression will work. Just compressing one column of one row may work in some situations, but in general, it won't. Once a while ago we implemented a storage system where we grouped records (about 1k in size) into blocks (about 16k in size) and compressed each block individually. To get at a record you had to decompress the block it was in, and skip to the desired offset. Even though the system had to throw away a lot [3] of data for each retrieval, the system turned out to be be both economical of disk[4] storage and fast at retrieval. For this app we found that compression rates increased as we got up to about 16 records, and then flattend off. I.E. compressing 16 records used less space than compressing 4 groups of 4, but compressing 256 records together used as much space as compressing 16 groups of 16. The point? I wish I understood the PostgreSQL backend storage algorithms better, but it seems that the combination of a Tuneable block size [5] and an option to compress individual storage blocks [6] might be worth looking at. -- cary [1] Whatever the heck that is. [2] My lack of understanding of PostgreSQL storage manager internals is showing. [3] Half a block on average. [4] Optical disk in this particular app. [5] This is an option now, right? But is it compile time, or run-time? and is it postmaster-wide, per database? I *try*to read the pgsql-hackers-digest but there is just too much going on! [6] Can this work with the existing data file structure?
В списке pgsql-hackers по дате отправления: