Re: OK, does anyone have any better ideas?
От | mlw |
---|---|
Тема | Re: OK, does anyone have any better ideas? |
Дата | |
Msg-id | 3A323112.FCF35B3B@mohawksoft.com обсуждение исходный текст |
Ответ на | Re: OK, does anyone have any better ideas? (Oleg Bartunov <oleg@sai.msu.su>) |
Ответы |
Re: OK, does anyone have any better ideas?
|
Список | pgsql-hackers |
Oleg Bartunov wrote: > > We need multi-key B-tree like index for such problem. > Our full text search engine is based on arrays and we need to find quickly > is some number exists in array - some kind of index over int array. > We're currently testing GiST approach and seems will have some conclusions > soon. I think multi-key B-tree like index would be better in my > opinion, but this requires to much work and knowledge of postgres's internals. > Yesterday I read about UBTree, seems like it's good for index and query > sets. Currently postgres has no set specific methods. The way I do my search indexing is with bitmap objects and a word dictionary. One creates a searchable dictionary of words by scanning the selected records. So, in one query that results in 2 million records, the total aggregate number of words is about 60,000 depending on how you parse. For each word, you create a "bitmap object" (in one of a few forms) where bit '0' represents the first record, bit '1' represents the second, and so on, until you have 2 million bits. Set the correct bit in the bitmap for each document that contains that word. In the end you will have the equivalent 60,000 bitmaps or 2 million bits. During search time, one creates an empty bitmap of 2 million bits as a work space. One parses the search term, and performs boolean operation on the workspace from the bitmap retrieved for each word. When you are done parsing, you have a bitmap with a bit set for each document that fits the search criteria. You then enumerate the bits by bit position, and you now have a list of document numbers. If only I could get the list of document numbers back into postgres....... It would be great. -- http://www.mohawksoft.com
В списке pgsql-hackers по дате отправления: