Re: Bitmap index thoughts
| From | Heikki Linnakangas |
|---|---|
| Subject | Re: Bitmap index thoughts |
| Date | |
| Msg-id | 45924FA8.2010707@enterprisedb.com |
| In reply to | Re: Bitmap index thoughts (Gavin Sherry <swm@linuxworld.com.au>) |
| Responses | Re: Bitmap index thoughts |
| List | pgsql-hackers |
Gavin Sherry wrote:
> On Tue, 26 Dec 2006, Heikki Linnakangas wrote:
>> for typical bitmap index use cases and most of the needed pages should
>> stay in memory, but could we simplify this? Why do we need the auxiliary
>> heap, couldn't we just store the blk+offset of the LOV item directly in
>> the b-tree index item?
>
> The problem is, the b-tree code is very much tied to the heap. I don't
> want to modify the b-tree code to make bitmap indexes work (better).
> What's really tempting is to just manage our own balanced tree within the
> bitmap index file(s) itself. It would start from the metapage and simply
> spill to other 'special' index pages when necessary. The problem is, we do
> not have b-tree code generic enough that it would allow us to do this
> trivially -- consider concurrency and WAL in particular, which we
> currently get for free. I guess this is why I've been ignoring this issue
> :-).

Maybe we could reuse the code in ginbtree.c. Looks like Teodor & Oleg had
the same problem :).

Modifying the nbtree code doesn't seem that difficult either. AFAICS, the
only places where the heap is accessed from within the nbtree code are in
index building and uniqueness checks.

>> And instead of having separate LOV pages that store a number of LOV
>> items, how about storing each LOV item on a page of its own, and using
>> the rest of the page to store the last chunk of the bitmap. That would
>> eliminate one page access, but more importantly, maybe we could then get
>> rid of all the bm_last_* attributes in BMLOVItemData that complicate the
>> patch quite a bit, while preserving the performance.
>
> That's an interesting approach. We would still need a concept of
> last_word, at the very least, and probably last_comp_word for convenience.

Why?

> PS: Another version of the patch shall be forthcoming shortly. I've been
> working on compressing the data in memory during CREATE INDEX instead of
> just managing arrays of TIDs in memory as we did previously. The array of
> TIDs works great for well clustered data but it stinks for poorly
> clustered data as we approach maintenance_work_mem and have to swap a lot.

Ok, sounds good.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
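The first idea discussed above (carrying the LOV item's blk+offset directly in the b-tree index tuple instead of routing through the auxiliary heap) would amount to something like the following minimal C sketch. ItemPointerData is PostgreSQL's standard block+offset pair; the struct and field names are invented for illustration, not taken from the patch:

```c
/*
 * Hypothetical payload for the internal b-tree's index tuples: point
 * straight at the LOV item instead of at an auxiliary heap tuple.
 * BMBtreePayload and lov_item are invented names, not from the patch.
 */
#include "postgres.h"
#include "storage/itemptr.h"

typedef struct BMBtreePayload
{
    ItemPointerData lov_item;   /* blk + offset of the LOV item */
} BMBtreePayload;
```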
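The one-LOV-item-per-page suggestion could look roughly like the sketch below, with the tail of the bitmap stored in-line in the remainder of the page so that separate bm_last_* bookkeeping becomes unnecessary. Apart from BMLOVItemData and bm_last_* as named in the thread, every identifier here is an assumption:

```c
/*
 * Hypothetical layout for a page holding a single LOV item: fixed
 * header fields, then the in-line tail of the compressed bitmap
 * filling the rest of the page.  Not the structure from the patch.
 */
#include "postgres.h"
#include "storage/block.h"

typedef uint32 bmword;          /* one word of the compressed bitmap */

typedef struct BMLOVPageData
{
    BlockNumber bm_head_page;   /* first full bitmap page, or InvalidBlockNumber */
    BlockNumber bm_tail_page;   /* last full bitmap page */
    uint32      bm_ninline;     /* number of valid words in bm_inline[] */
    bmword      bm_inline[1];   /* in-line bitmap tail (variable length) */
} BMLOVPageData;
```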
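The compression-during-CREATE-INDEX work mentioned in the PS could, in spirit, look like the sketch below: an ascending stream of bit positions (TIDs mapped to a linear heap position) is packed on the fly into a word-aligned run-length encoding, where a run of all-zero words collapses into one "fill" word. The encoding and every name here are illustrative assumptions, not the patch's actual format:

```c
/*
 * Sketch: on-the-fly word-aligned run-length compression of an
 * ascending stream of bit positions.  A run of all-zero words is
 * stored as a single "fill" word (high bit set, run length in the
 * low 31 bits, assumed to fit); other words are stored literally.
 * Error handling for realloc is omitted.
 */
#include <stdint.h>
#include <stdlib.h>

#define PAYLOAD_BITS 31u                    /* payload bits per literal word */
#define FILL_FLAG    (UINT32_C(1) << 31)    /* marks a fill word */

typedef struct BuildBitmap
{
    uint32_t   *words;          /* compressed output so far */
    size_t      nwords;         /* words used */
    size_t      awords;         /* words allocated */
    uint32_t    pending;        /* literal word being assembled */
    uint64_t    next_bit;       /* next bit position expected */
} BuildBitmap;

static void
emit_word(BuildBitmap *bm, uint32_t w)
{
    if (bm->nwords == bm->awords)
    {
        bm->awords = bm->awords ? bm->awords * 2 : 64;
        bm->words = realloc(bm->words, bm->awords * sizeof(uint32_t));
    }
    bm->words[bm->nwords++] = w;
}

/*
 * Add one set bit; bit numbers must arrive in ascending order.  When
 * the build finishes, the caller still has to flush the last partial
 * word with emit_word(bm, bm->pending).
 */
static void
bitmap_add(BuildBitmap *bm, uint64_t bit)
{
    uint64_t    word_no = bit / PAYLOAD_BITS;
    uint64_t    cur_no = bm->next_bit / PAYLOAD_BITS;

    if (word_no > cur_no)
    {
        /* Flush the literal word we were assembling... */
        emit_word(bm, bm->pending);
        bm->pending = 0;
        /* ...and collapse the all-zero gap into one fill word. */
        if (word_no > cur_no + 1)
            emit_word(bm, FILL_FLAG | (uint32_t) (word_no - cur_no - 1));
    }
    bm->pending |= UINT32_C(1) << (bit % PAYLOAD_BITS);
    bm->next_bit = bit + 1;
}
```

Keeping each key's bitmap compressed while it is being built is what addresses the complaint in the PS: poorly clustered input no longer accumulates large plain TID arrays that push the build toward the maintenance_work_mem limit.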