Re: why is gist index taking so much space on the disc
От | Martijn van Oosterhout |
---|---|
Тема | Re: why is gist index taking so much space on the disc |
Дата | |
Msg-id | 20051121194527.GG16764@svana.org обсуждение исходный текст |
Ответ на | Re: why is gist index taking so much space on the disc (Grzegorz Jaskiewicz <gj@pointblue.com.pl>) |
Ответы |
Re: why is gist index taking so much space on the disc
|
Список | pgsql-hackers |
On Mon, Nov 21, 2005 at 08:14:44PM +0100, Grzegorz Jaskiewicz wrote: > >You mean you sometimes put the same elements in the two halves? You > >shouldn't do that. The whole point is that the search will descend any > >node that matches consistant, but any single key should only appear > >once in each index. > > > >picksplit should *split* the set, not return two sets about the same > >size as you started... > > Nope, I mean that 'masks' created to match either 'half' sometimes > match elements in the other one. > This shouldn't be a big deal, just one level to go down on query to > much more specific result set. > I have fixed that with, somewhat hack. It's not a hack, that's how it's supposed to work. An entry should only appear once in the index, but it could appear in multiple places. Like you say, some entries can go into either half. B-Trees are the rather special case that you can always split a set of values into two non-overlapping sets. With geometric types (like your bitmasks) you can't avoid overlap sometimes so you have to follow multiple branches to check if an element is there or not. Your pseudo code is good and will work fine. However, ideally you want to divide the overlap in such a way that later splits work better. Maybe by trying to decide which mask is "closer". The better the splitting the more efficient your tree will become. Ofcourse, perfect splitting may be expensive but then it depends on how many inserts vs how many selects. If you do a lot of searches it may be worth the time. BTW, I glad you're making progress and hopefully you might be able to publish some code. PostgreSQL could do with some example GiST indexes on bitmaps. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
В списке pgsql-hackers по дате отправления: