Re: GIN improvements part 1: additional information
От | Alexander Korotkov |
---|---|
Тема | Re: GIN improvements part 1: additional information |
Дата | |
Msg-id | CAPpHfdtRAxaq8mShtpd4mh5R0=5hP900mBJNU5TnyaeM44EEyA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: GIN improvements part 1: additional information (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Ответы |
Re: GIN improvements part 1: additional information
|
Список | pgsql-hackers |
On Tue, Dec 10, 2013 at 12:26 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
------
With best regards,
Alexander Korotkov.
On 12/09/2013 11:34 AM, Alexander Korotkov wrote:Even if we use varbyte encoding, I wonder if it would be better to treat
block + offset number as a single 48-bit integer, rather than encode them
separately. That would allow the delta of two items on the same page to be
stored as a single byte, rather than two bytes. Naturally it would be a
loss on other values, but would be nice to see some kind of an analysis on
that. I suspect it might make the code simpler, too.
Yeah, I had that idea, but I thought it's not a better option. Will try to
do some analysis.
The more I think about that, the more convinced I am that it's a good idea. I don't think it will ever compress worse than the current approach of treating block and offset numbers separately, and, although I haven't actually tested it, I doubt it's any slower. About the same amount of arithmetic is required in both versions.
Attached is a version that does that. Plus some other minor cleanup.
(we should still investigate using a completely different algorithm, though)
Yes, when I though about that, I didn't realize that we can reserve less than 16 bits for offset number.
I rerun my benchmark and got following results:
event | period
-----------------------+-----------------
index_build | 00:01:46.39056
index_build_recovery | 00:00:05
index_update | 00:06:01.557708
index_update_recovery | 00:01:23
search_new | 00:24:05.600366
search_updated | 00:25:29.520642
(6 rows)
label | blocks_mark
----------------+-------------
search_new | 847509920
search_updated | 883789826
(2 rows)
label | size
---------------+-----------
new | 364560384
after_updates | 642736128
(2 rows)
Speed is same while index size is less. In previous format it was:
label | size
---------------+-----------
new | 419299328
after_updates | 715915264
(2 rows)
Good optimization, thanks. I'll try another datasets but I expect similar results.
However, patch didn't apply to head. Corrected version is attached.
With best regards,
Alexander Korotkov.
Вложения
В списке pgsql-hackers по дате отправления: