[HACKERS] [POC] A better way to expand hash indexes.
От | Mithun Cy |
---|---|
Тема | [HACKERS] [POC] A better way to expand hash indexes. |
Дата | |
Msg-id | CAD__OuhG6F1gQLCgMQNnMNgoCvOLQZz9zKYJQNYvYmmJoM42gA@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: [HACKERS] [POC] A better way to expand hash indexes.
|
Список | pgsql-hackers |
Hi all, As of now, we expand the hash index by doubling the number of bucket blocks. But unfortunately, those blocks will not be used immediately. So I thought if we can differ bucket block allocation by some factor, hash index size can grow much efficiently. I have written a POC patch which does following things - Say at overflow point 'i' we need to add new "x = 2^(i-1)" buckets as per the old code, I think we can do this addition of buckets in a controlled way. Instead of adding all of 'x' bucket blocks at a time, the patch will add x/4 blocks at a time. And, once those blocks are consumed, it adds next installment of x/4 blocks. And, this will continue until all of 'x' blocks of the overflow point 'i' is allocated. My test result shows index size grows in a more efficient way with above patch. Note: This patch is just a POC. It can have bugs and I have to change comments in the code, and README which are related to new changes. Test: create table t1(t int); create index i1 on t1 using hash(t); And then, Incrementally add rows as below. insert into t1 select generate_series(1, 10000000); records base index new patch in million -- size(MB) -- index size(MB) 10 384 384 20 768 768 30 1417 1161 40 1531 1531 50 2556 1788 60 2785 2273 70 2963 2709 80 3060 3061 90 5111 3575 100 5111 3575 To implement such an incremental addition of bucket blocks I have to increase the size of array hashm_spares in meta page by four times. Also, mapping methods which map a total number of buckets to a split-point position of hashm_spares array, need to be changed. These changes create backward compatibility issues. Implementation Details in brief: ======================= Each element of hashm_spares (we call overflow point) is expanded into 4 slots {0, 1, 2, 3}. If 'x' (a power2 number) is the total number of buckets to be added before the overflow point we add only a quarter of it (x/4) to each slot, once we have consumed previously added blocks we add next quarter of buckets and so on. As in old code new hashm_spares[i] stores total overflow pages allocated between those bucket allocation. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
В списке pgsql-hackers по дате отправления: