Re: scoring differences between bitmasks

From	Ben
Subject	Re: scoring differences between bitmasks
Date	2 October 2005, 17:57:08
Msg-id	F9149CFE-05C3-40D1-A6D8-58592A796B6E@silentmedia.com
In reply to	scoring differences between bitmasks (Ben <bench@silentmedia.com>)
List	pgsql-general

Just the number of bits, not which ones. Basically, the Hamming
distance.
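
For readers following the thread, here is a minimal sketch of what "the number of bits that are different" means in practice, assuming each bitmask is held as a plain Python integer. The names `hamming_distance` and `closest` are illustrative and do not come from the thread; the linear scan is only the straightforward approach, not an indexed one.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ."""
    # XOR leaves a 1 exactly where the two masks disagree.
    return bin(a ^ b).count("1")


def closest(query: int, vectors):
    """Linear scan: return (distance, vector) for the nearest vector.

    Ties are broken by the smaller integer value, because tuples
    compare element by element.
    """
    return min((hamming_distance(query, v), v) for v in vectors)


if __name__ == "__main__":
    masks = [0b0000, 0b1110, 0b0011]
    print(closest(0b1111, masks))
```

The scan touches every stored vector once, which is the cost the original question was hoping to avoid at around 4 million rows.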

On Oct 2, 2005, at 11:44 AM, Todd A. Cook wrote:

> Hi,
>
> It may be that I don't understand your problem. :)
>
> Are you searching the table for the closest vector?  If so, is
> "closeness" defined only as the number of bits that are different?
> Or, do you need to know which bits as well?
>
> -- todd
>
>
> Ben wrote:
>
>> Hrm, I don't understand. Can you give me an example with some
>> reasonably sized vectors?
>>
>> On Oct 2, 2005, at 10:59 AM, Todd A. Cook wrote:
>>
>>> Hi,
>>>
>>> Try breaking the vector into 4 bigint columns and building a
>>> multi-column index, with index columns going from the most evenly
>>> distributed to the least.  Depending on the distribution of your
>>> data, you may only need 2 or 3 columns in the index.  If you can
>>> cluster the table in that order, it should be really fast.  (This
>>> structure is a tabular form of a linked trie.)
>>>
>>> -- todd
>>>
>>>
>>> Ben wrote:
>>>
>>>
>>>> Yes, that's the straightforward way to do it. But given that my
>>>> vectors are 256 bits in length, and that I'm going to eventually
>>>> have about 4 million of them to search through, I was hoping
>>>> greater minds than mine had figured out how to do it faster, or
>>>> how to compute some kind of indexing... somehow.
>>>>
>
>
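
The suggestion quoted above, breaking a 256-bit vector into 4 bigint columns, can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: the vector is taken to be an unsigned Python integer, the words are ordered most significant first, and each word is mapped into the signed 64-bit range because a PostgreSQL bigint is signed. The helper names are hypothetical. The sketch covers only the column split and a per-word distance; it does not model the multi-column index or the clustering.

```python
MASK64 = (1 << 64) - 1


def to_signed64(word: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    return word - (1 << 64) if word >= (1 << 63) else word


def split_256(vector: int):
    """Split a 256-bit unsigned integer into four signed 64-bit words.

    Words are returned most significant first (an assumed ordering).
    """
    if not 0 <= vector < (1 << 256):
        raise ValueError("vector must fit in 256 bits")
    return tuple(
        to_signed64((vector >> shift) & MASK64) for shift in (192, 128, 64, 0)
    )


def join_256(words) -> int:
    """Inverse of split_256: rebuild the unsigned 256-bit integer."""
    value = 0
    for word in words:
        value = (value << 64) | (word & MASK64)
    return value


def hamming_distance_words(a_words, b_words) -> int:
    """Hamming distance computed word by word over the four columns."""
    # Masking to 64 bits makes the popcount correct for negative words.
    return sum(
        bin((x ^ y) & MASK64).count("1") for x, y in zip(a_words, b_words)
    )
```

Because the per-word distances simply add up, the total distance over the four columns equals the distance over the original 256-bit vector. How much a multi-column index helps with a nearest-vector search depends, as the quoted message says, on the distribution of the data.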
