Re: why is gist index taking so much space on the disc
От | Grzegorz Jaskiewicz |
---|---|
Тема | Re: why is gist index taking so much space on the disc |
Дата | |
Msg-id | FCB2D3B8-9CCD-404E-8728-45CDE4DBE93F@pointblue.com.pl обсуждение исходный текст |
Ответ на | why is gist index taking so much space on the disc (Grzegorz Jaskiewicz <gj@pointblue.com.pl>) |
Ответы |
Re: why is gist index taking so much space on the disc
|
Список | pgsql-hackers |
On 2005-11-21, at 19:32, Martijn van Oosterhout wrote: > On Mon, Nov 21, 2005 at 04:58:25PM +0100, Grzegorz Jaskiewicz wrote: >> my conquers with Gist index for custom type are nearly finished. It >> is working as it is now, but there are few problems here and there. >> One of em, being amount of disc space index it self takes. The type >> stucture it self takes 160bytes. Adding 100.000 rows into table - >> CREATE TABLE blah (a serial, b customType); > > Let's see, 160bytes means you'll get aboud 50 keys per page. So you > would expect 2000 leaf page, 40 level 1 pages. This should be less > than > 20-30MB > yep; > You mean you sometimes put the same elements in the two halves? You > shouldn't do that. The whole point is that the search will descend any > node that matches consistant, but any single key should only appear > once in each index. > > picksplit should *split* the set, not return two sets about the same > size as you started... Nope, I mean that 'masks' created to match either 'half' sometimes match elements in the other one. This shouldn't be a big deal, just one level to go down on query to much more specific result set. I have fixed that with, somewhat hack. Here's some pseudo code sort_data(input); find_split_point(input); mask1 = generate_two_masks(input[0]); mask2 = generate_two_masks(input[1]); foreach(input) {bool a = matches1(input);bool b = matches2(input);if ( a && b ) { if ( left_index == 0 ) { left[left_index++]= input; } else { if ( right_index == 0 ) { right[right_index++] = input; continue; } /* this part is new code, and helped a lot, now gist index takes much less space and is much faster, because of lower I/O consumption*/ if ( loop_index % 2 ) { right[right_index++] = input; } else { left[left_index++] = input; } }}else { if ( a) left[left_index++] =input; if ( b) right[right_index++] = input;} } mask1 = generate(left ); mask2 = generate(right); return (left, right, blah, blih, others); Ok, so the part with i%2 helped a lot, it distributes elements matching both masks evenly. Thanks guys. I will play with k-means, and see if they will work better with no hacks. Either way, I have to have some code that will handle "matches both" case. Thanks again. -- GJ "If we knew what we were doing, it wouldn't be called Research, would it?" - AE
В списке pgsql-hackers по дате отправления: