Re: scoring differences between bitmasks
От | Ben |
---|---|
Тема | Re: scoring differences between bitmasks |
Дата | |
Msg-id | F9149CFE-05C3-40D1-A6D8-58592A796B6E@silentmedia.com обсуждение исходный текст |
Ответ на | scoring differences between bitmasks (Ben <bench@silentmedia.com>) |
Список | pgsql-general |
Just the number of bits, not which ones. Basically, the hamming distance. On Oct 2, 2005, at 11:44 AM, Todd A. Cook wrote: > Hi, > > It may be that I don't understand your problem. :) > > Are you searching the table for the closest vector? If so, is > "closeness" defined only as the number of bits that are different? > Or, do you need to know which bits as well? > > -- todd > > > Ben wrote: > >> Hrm, I don't understand. Can you give me an example with some >> reasonably sized vectors? >> On Oct 2, 2005, at 10:59 AM, Todd A. Cook wrote: >> >>> Hi, >>> >>> Try breaking the vector into 4 bigint columns and building a >>> multi- column >>> index, with index columns going from the most evenly distributed >>> to the >>> least. Depending on the distribution of your data, you may only >>> need 2 >>> or 3 columns in the index. If you can cluster the table in that >>> order, >>> it should be really fast. (This structure is a tabular form of >>> a linked >>> trie.) >>> >>> -- todd >>> >>> >>> Ben wrote: >>> >>> >>>> Yes, that's the straightforward way to do it. But given that >>>> my vectors are 256 bits in length, and that I'm going to >>>> eventually have about 4 million of them to search through, I >>>> was hoping greater minds than mine had figured out how to do >>>> it faster, or how compute some kind of indexing....... somehow. >>>> > >
В списке pgsql-general по дате отправления: