Re: Reducing data type space usage
From | Heikki Linnakangas |
---|---|
Subject | Re: Reducing data type space usage |
Date | |
Msg-id | 450C784C.8040001@enterprisedb.com |
In reply to | Re: Reducing data type space usage (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>> The user would have to decide that he'll never need a value over 127 bytes
>> long ever in order to get the benefit.
>
> Weren't you the one that's been going on at great length about how
> wastefully we store CHAR(1) ? Sure, this has a somewhat restricted
> use case, but it's about as efficient as we could possibly get within
> that use case.

I like the idea of having variable length headers much more than a new short character type. It solves a more general problem, and it compresses VARCHAR(>255) and TEXT fields nicely when the actual data in the field is small.

I'd like to propose one more encoding scheme, based on Tom's earlier proposals. The use cases I care about are:

* support uncompressed data up to 1G, like we do now
* 1 byte length word for short data.
* store typical CHAR(1) values in just 1 byte.

Tom wrote:
> * 0xxxxxxx uncompressed 4-byte length word as stated above
> * 10xxxxxx 1-byte length word, up to 62 bytes of data
> * 110xxxxx 2-byte length word, uncompressed inline data
> * 1110xxxx 2-byte length word, compressed inline data
> * 1111xxxx 1-byte length word, out-of-line TOAST pointer

My proposal is:

    00xxxxxx  uncompressed, aligned 4-byte length word
    010xxxxx  1-byte length word, uncompressed inline data (up to 32 bytes)
    011xxxxx  2-byte length word, uncompressed inline data (up to 8k)
    1xxxxxxx  1 byte data in range 0x20-0x7E
    1000xxxx  2-byte length word, compressed inline data (up to 4k)
    11111111  TOAST pointer

The decoding algorithm is similar to Tom's proposal, and relies on using 0x00 for padding.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
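To make the bit patterns in the proposal above easier to check, here is a rough sketch of how the first byte might be classified. It is only an illustration, not code from the thread or from PostgreSQL. The names `classify_header` and `decode_single_char` are made up for this example. It also assumes one reading of the table: that a `1xxxxxxx` byte carries the character in its low seven bits, so `1000xxxx` and `11111111` are free for other uses because their low seven bits fall outside 0x20-0x7E. Padding with 0x00 and byte order are left out.

```python
def classify_header(first_byte: int) -> str:
    """Classify a first byte under the proposed encoding (illustrative only)."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must fit in one byte")
    if first_byte == 0xFF:                 # 11111111
        return "toast_pointer"
    if first_byte & 0xC0 == 0x00:          # 00xxxxxx
        return "len4_uncompressed"
    if first_byte & 0xE0 == 0x40:          # 010xxxxx
        return "len1_uncompressed"
    if first_byte & 0xE0 == 0x60:          # 011xxxxx
        return "len2_uncompressed"
    if first_byte & 0xF0 == 0x80:          # 1000xxxx
        return "len2_compressed"
    # High bit is set here. Assumption: the low 7 bits hold the character.
    if 0x20 <= (first_byte & 0x7F) <= 0x7E:
        return "single_char"
    # Patterns the proposal does not assign (0x90-0x9F under this reading).
    return "unassigned"


def decode_single_char(first_byte: int) -> str:
    """Recover the character from a single-byte datum (assumed layout)."""
    if classify_header(first_byte) != "single_char":
        raise ValueError("not a single-character datum")
    return chr(first_byte & 0x7F)


if __name__ == "__main__":
    for b in (0x12, 0x45, 0x6A, 0x85, 0xC1, 0xFF):
        print(f"0x{b:02X} -> {classify_header(b)}")
```

Under this reading the printable range 0x20-0x7E maps to first bytes 0xA0-0xFE, which is why the compressed and TOAST patterns do not collide with one-byte character data.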