RFD: ALTER COLUMN .. SET STORAGE COMPRESSED;
From | Dawid Kuroczko |
---|---|
Subject | RFD: ALTER COLUMN .. SET STORAGE COMPRESSED; |
Date | |
Msg-id | 758d5e7f0806100445j13e4768fy4342554890c14a6c@mail.gmail.com |
Responses | Re: RFD: ALTER COLUMN .. SET STORAGE COMPRESSED; |
List | pgsql-hackers |
Hello!

Currently the TOASTing code does its magic when the whole tuple is larger than TOAST_TUPLE_TARGET, which happens to be around 2KB. There are times, though, when one is willing to trade using (fast) CPU to reduce the amount of (slow) I/O -- most notably in data-warehousing types of workload: rarely used large columns which are likely to compress well but are not large enough to trigger inline compression.

As we already have four modes for ALTER COLUMN .. SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }, I would like to add "COMPRESSED", which would force column compression (unless the column is smaller than some minimum, I guess somewhere between 16 and 32 bytes).

First of all, would such a feature be desirable? [1]

...as for the implementation idea, so far I see it more or less like this:

* src/backend/access/common/heaptuple.c: for tuples with COMPRESSED attributes, set the infomask bit HEAP_HASEXTERNAL, so that the tuple will trigger TOAST regardless of its size.
* src/backend/access/heap/tuptoaster.c:
  - add a bool "need_compress = false;" around line 425;
  - while scanning the attributes (lines 472-575), mark the ones which should be COMPRESSED;
  - if (need_compress), compress every marked column;
  - perhaps refactor the inline compression code (lines 639-659) into a static function shared with the need_compress part above.

Does this sound reasonable?

PS: as a side note, I wonder if perhaps we could try compression earlier, at 1KB or even at 0.5KB, but leave TOASTing at the 2KB limit?

[1]: Actually, some time ago I wrote a system which stores tons of real[0:59] data (an hour's worth of every-minute readings). Such a column takes approximately 246 bytes. For fun and experiment I transformed the data into real[0:23][0:59], storing a whole day's data. To my surprise, such a column stores in between 64 (!) and 5968 bytes. Also, 66% of the values took less than 254 bytes (and 55% less than 128 bytes)... And as the data is much larger than RAM and is read randomly, having it shrunk by more than 25% is tempting.
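To make the two-pass idea concrete, here is a rough sketch of the proposed tuptoaster flow in Python pseudocode. This is not PostgreSQL code: zlib stands in for pglz, the 32-byte minimum is one end of the range guessed above, and the function and variable names are made up for illustration.

```python
import zlib

# Assumed lower bound from the proposal (somewhere between 16 and 32 bytes)
MIN_COMPRESS_LEN = 32

def toast_compress_marked(attrs, storage):
    """Two-pass sketch: first scan attributes and mark the COMPRESSED ones,
    then compress each marked value, keeping the compressed copy only if it
    is actually smaller. `attrs` maps column name -> bytes value; `storage`
    maps column name -> storage mode."""
    # Pass 1: while scanning the attributes, mark the ones to compress
    need_compress = False
    marked = []
    for name, value in attrs.items():
        if storage.get(name) == "COMPRESSED" and len(value) >= MIN_COMPRESS_LEN:
            marked.append(name)
            need_compress = True
    # Pass 2: if (need_compress), compress every marked column using the
    # same helper the inline-compression path would share after refactoring
    if need_compress:
        for name in marked:
            compressed = zlib.compress(attrs[name])
            if len(compressed) < len(attrs[name]):  # keep only if it saves space
                attrs[name] = compressed
    return attrs
```

Run against a repetitive 256-byte value marked COMPRESSED, the value shrinks, while values below the minimum length (or in other storage modes) pass through untouched -- mirroring how a forced-compression column would behave regardless of the 2KB tuple threshold.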
Hence the idea of SET STORAGE COMPRESSED. I know such a schema is flawed by design, but I guess there are other types of data which would also benefit from such an option.

--
Solving [site load issues] with [more database replication] is a lot like solving your own personal problems with heroin - at first it sorta works, but after a while things just get out of hand.