Re: Compression and on-disk sorting

From Andrew Piskorski
Subject Re: Compression and on-disk sorting
Date
Msg-id 20060517085230.GA53017@tehun.pair.com
In reply to Re: Compression and on-disk sorting  (Greg Stark <gsstark@mit.edu>)
Responses Re: Compression and on-disk sorting  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
On Tue, May 16, 2006 at 11:48:21PM -0400, Greg Stark wrote:

> There are some very fast decompression algorithms:
> 
> http://www.oberhumer.com/opensource/lzo/

Sure, and for some tasks in PostgreSQL perhaps it would be useful.
But at least as of July 2005, a Sandor Heman, one of the MonetDB guys,
had looked at zlib, bzlib2, lzrw, and lzo, and claimed that:
 "... in general, it is very unlikely that we could achieve any
 bandwidth gains with these algorithms. LZRW and LZO might increase
 bandwidth on relatively slow disk systems, with bandwidths up to
 100MB/s, but this would induce high processing overheads, which
 interferes with query execution. On a fast disk system, such as our
 350MB/s 12 disk RAID, all the generic algorithms will fail to achieve
 any speedup."
 
 http://www.google.com/search?q=MonetDB+LZO+Heman&btnG=Search
 http://homepages.cwi.nl/~heman/downloads/msthesis.pdf
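
A back-of-the-envelope way to see why the disk speed matters so much
here. This is only a sketch under my own simplifying assumptions, not
Heman's model: I/O and decompression are assumed to overlap perfectly,
and decompression speed is measured in uncompressed MB/s. The 100 and
350 MB/s disk figures come from the quote above; the 2.0 compression
ratio and the 300 MB/s decompression rate are made-up illustrative
inputs, not measurements of any real algorithm.

```python
def effective_bandwidth(disk_mb_s, ratio, decompress_mb_s):
    """Uncompressed MB/s delivered to the executor.

    disk_mb_s       -- raw sequential read bandwidth of the disk system
    ratio           -- compression ratio (uncompressed size / compressed size)
    decompress_mb_s -- decompressor output rate, in uncompressed MB/s

    Assumes reading and decompressing are fully pipelined, so the slower
    of the two stages sets the overall rate.
    """
    return min(disk_mb_s * ratio, decompress_mb_s)


def speedup(disk_mb_s, ratio, decompress_mb_s):
    """Effective bandwidth relative to reading uncompressed data."""
    return effective_bandwidth(disk_mb_s, ratio, decompress_mb_s) / disk_mb_s


# Hypothetical inputs: ratio 2.0, decompressor emitting 300 MB/s.
for disk in (100, 350):
    print(disk, "MB/s disk -> speedup", round(speedup(disk, 2.0, 300), 2))
```

With those hypothetical inputs the slow disk comes out ahead and the
fast RAID comes out behind, which is the shape of the claim in the
quote. The sketch also ignores the CPU time taken away from query
execution, so it is optimistic if anything.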

> I think most of the mileage from "lookup tables" would be better implemented
> at a higher level by giving tools to data modellers that let them achieve
> denser data representations. Things like convenient enum data types, 1-bit
> boolean data types, short integer data types, etc.

Things like enums and 1-bit booleans certainly could be useful, but
they cannot take advantage of duplicate values across multiple rows at
all, even if 1000 rows have the exact same value in their "date"
column and are all in the same disk block, right?

Thus I suspect that the exact opposite is true: a good table
compression scheme would render special denser data types largely
redundant and obsolete.
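
To make the duplicate-values point concrete, here is a minimal sketch
comparing a fixed-width per-row encoding with a simple run-length
encoding of one column within a block. Run-length encoding is just the
simplest stand-in I could pick; it is not a claim about how Oracle or
anyone else implements table compression. The function names and the
4-byte date / 2-byte run-count layout are hypothetical.

```python
import struct
from datetime import date

MAX_RUN = 0xFFFF  # largest run length that fits in the 2-byte counter


def encode_plain(values):
    """Fixed-width encoding: 4 bytes per row, however repetitive the data."""
    return b"".join(struct.pack("<I", v.toordinal()) for v in values)


def encode_rle(values):
    """Run-length encoding: one (4-byte date, 2-byte count) pair per run."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v and runs[-1][1] < MAX_RUN:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return b"".join(struct.pack("<IH", v.toordinal(), n) for v, n in runs)


def decode_rle(buf):
    """Inverse of encode_rle: expand each run back into individual rows."""
    out = []
    for off in range(0, len(buf), 6):
        ordinal, n = struct.unpack_from("<IH", buf, off)
        out.extend([date.fromordinal(ordinal)] * n)
    return out


# 1000 rows in one block, all with the same "date" value.
block = [date(2006, 5, 17)] * 1000
print(len(encode_plain(block)), "bytes plain")
print(len(encode_rle(block)), "bytes run-length encoded")
```

A narrower per-row type can only shrink the 4 bytes per row by a
constant factor, while the cross-row scheme pays for the repeated
value once per run.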

Good table compression might be a lot harder to do, of course.
Certainly Oracle's implementation of it had some bugs which made it
difficult to use reliably in practice (in certain circumstances
updates could fail, or if not fail perhaps have pathological
performance), bugs which are supposed to be fixed in 10.2.0.2, which
was only released within the last few months.

-- 
Andrew Piskorski <atp@piskorski.com>
http://www.piskorski.com/

