Re: Sparse bit set data structure
От | Heikki Linnakangas |
---|---|
Тема | Re: Sparse bit set data structure |
Дата | |
Msg-id | dbf83e18-a3f8-145e-1d28-d7e07cb391a2@iki.fi обсуждение исходный текст |
Ответ на | Re: Sparse bit set data structure (Andrey Borodin <x4mmm@yandex-team.ru>) |
Список | pgsql-hackers |
On 14/03/2019 07:15, Andrey Borodin wrote: >> 14 марта 2019 г., в 0:18, Heikki Linnakangas <hlinnaka@iki.fi> написал(а): >> <0001-Add-SparseBitset-to-hold-a-large-set-of-64-bit-ints-.patch><0002-Andrey-Borodin-s-test_blockset-tool-adapted-for-Spar.patch> > > That is very interesting idea. Basically, B-tree and radix tree is a tradeoff between space and time. > > In general, lookup into radix tree will touch less CPU cache lines. > In this terms Bitmapset is on most performant and memory-wasteful side: lookup into Bitmapset is always 1 cache line. > Performance of radix tree can be good in case of skewed distribution, while B-tree will be OK on uniform. I think thatdistribution of GiST inner pages is uniform, distribution of empty leafs is... I have no idea, let's consider uniformtoo. Yeah. In this implementation, the leaf nodes are packed into bitmaps when possible, so it should perform quite well on skewed distributions, too. > I'd review this data structure ASAP. I just need to understand: do we aim to v12 or v13? (I did not solve concurrency issuesin GiST VACUUM yet, but will fix them at weekend) I'm aiming v12 with this. It's a fairly large patch, but it's very isolated. I think the most pressing issue is getting the rest of the GiST vacuum patch fixed. If you get that fixed over the weekend, I'll take another look at it on Monday. > Also, maybe we should consider using RoaringBitmaps? [0] > As a side not I would add that while balanced trees are widely used for operations on external memory, there are more performantversions for main memory. Like AVL-tree and RB-tree. Hmm. Yeah, this is quite similar to Roaring Bitmaps. Roaring bitmaps also have a top-level, at which you binary search, and "leaf" nodes which can be bitmaps or arrays. In a roaring bitmap, the key space is divided into fixed-size chunks, like in a radix tree, but different from a B-tree. Even if we used AVL-trees or RB-trees or something else for the top layers of the tree, I think at the bottom level, we'd still need to use sorted arrays or bitmaps, to get the density we want. So I think the implementation at the leaf level would look pretty much the same, in any case. And the upper levels don't take very much space, regardless of the implementation. So I don't think it matters much. - Heikki
В списке pgsql-hackers по дате отправления: