Re: GIN improvements part 1: additional information
От | Heikki Linnakangas |
---|---|
Тема | Re: GIN improvements part 1: additional information |
Дата | |
Msg-id | 52B4694A.3060201@vmware.com обсуждение исходный текст |
Ответ на | Re: GIN improvements part 1: additional information (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Список | pgsql-hackers |
On 12/19/2013 10:44 AM, Heikki Linnakangas wrote: > On 12/19/2013 08:37 AM, Oleg Bartunov wrote: >> Guys, >> >> before digging deep into the art of comp/decomp world I'd like to know >> if you familiar with results of >> http://wwwconference.org/www2008/papers/pdf/p387-zhangA.pdf paper and >> some newer research ? > > Yeah, I saw that paper. > >> Do we agree in what we really want ? Basically, >> there are three main features: size, compression, decompression speed >> - we should take two :) > > According to that Zhang et al paper you linked, the Vbyte method > actually performs the worst on all of those measures. The other > algorithms are quite similar in terms of size (PForDelta being the most > efficient), while PForDelta is significantly faster to compress/decompress. > > Just by looking at those numbers, PForDelta looks like a clear winner. > However, it operates on much bigger batches than the other algorithms; I > haven't looked at it in detail, but Zhang et al used 128 integer > batches, and they say that 32 integers is the minimum batch size. If we > want to use it for the inline posting lists stored in entry tuples, that > would be quite wasteful if there are only a few item pointers on the tuple. > > Also, in the tests I've run, the compression/decompression speed is not > a significant factor in total performance, with either varbyte encoding > or Simple9-like encoding I hacked together. One disadvantage of Simple9 and other encodings that operate in batches is that removing a value from the middle can increase the number of bytes required for the remaining values. That's a problem during vacuum; it's possible that after vacuuming away one item pointer, the remaining items no longer fit on the page. - Heikki
В списке pgsql-hackers по дате отправления: