Обсуждение: Btree index extension question

Поиск
Список
Период
Сортировка

Btree index extension question

От
Dmitry Tkach
Дата:
Hi, everybody!

I was wonderring if there is somebody out there who could help me with
understand how index extensions work...
Let me state the problem first.

I have many (15) boolean attributes and I need to be able to search the
database for entries with any combination of those attributes for being
true. For example - find all the entries, where a1=a2=a3=true or find
all the entries where a1=a2=a4=true etc...
Because there are so many of them (and the database is HUGE), putting
every attribute into a separate column and creating a separate index on
every possible combination, is really out of the question.
So, I was thinking about creating a single int2 column, with each bit
representing an attribute - so that, the first query I quoted above
would look like "select * from table where attributes & 7 = 7", and the
other query would be
"select * from table where attributes & 11 = 11' etc...

This looked so beautiful to me, but now I am stuck trying to index that
table  [:-(]

I started off, hoping to get away with btrees.

I defined an operator >>=(int2,int2) as 'select $1&$2=$2;'
It looks nice so far, but then the question is - what do I do with the
other operations? By analogy with 'normal' comparison operators, I would do:

 >> (I know the name is taken  [:-)]  as 'select not $2 >>= $1'
=<<                                  as 'select $2 >>= $1'
<<                                   as 'select not $1 >>= $2'
... and leave '=' intact.

But then I realized, that these set of operators, does not really define
a complete order - for example, if I compare, say, 5 and 3:
5 & 3 = 1, 3 & 5 = 1, so I get BOTH 5 << 3 and 5 >> 3 being true at the
same time  [:-(]

So my question is, first of all, is that a problem? Does btree require a
complete order defined? Will it work with partial order?
Secondly, if it is a problem, perhaps, I am missing something here,
assuming that there is no way to define a set of operations to do what I
want and provide a completely ordered set (or do I need it to define a
perfect complete order - what exactly is required for btree to work? Any
ideas?)

And finally, if there is just no way I could get away with btrees, can I
make an rtree to work for me? Could somebody explain to me (or point me
to a doc somewhere) the meaning of the strategies (and requirements -
like transitivity etc...) I need for an rtree, and also what support
functions (like comparison func in case of a btree) do I need?

Thank you very much for your attention.
Any input will be greatly appreciated.

Dima


Re: Btree index extension question

От
Дата:
Hi,

Don't know enough about postgresql to be sure about this, but here goes:

If postgresql does bitwise operations, then you can use that instead of
defining new operators. Just construct a number for all the columns that
need to be true and do a bitwise 'and' with the stored value. (eg. (7 &
stored_val) = 7)

If postgresql uses an index to supply functions with their parameters,
then make a function that'll do the comparison for you and use it in your
query.  Or make the index (on all the columns) and make a function that
takes all the columns as the parameters to compare against (and ofcourse
the values that you want to check against). That way you always use the
columns of the index in the correct order.

Somebody please check this, as I may have been hit with a stupid-stick.

On Fri, 15 Mar 2002, Dmitry Tkach wrote:

> Hi, everybody!
>
> I was wonderring if there is somebody out there who could help me with
> understand how index extensions work...
> Let me state the problem first.
>
> I have many (15) boolean attributes and I need to be able to search the
> database for entries with any combination of those attributes for being
> true. For example - find all the entries, where a1=a2=a3=true or find
> all the entries where a1=a2=a4=true etc...
> Because there are so many of them (and the database is HUGE), putting
> every attribute into a separate column and creating a separate index on
> every possible combination, is really out of the question.
> So, I was thinking about creating a single int2 column, with each bit
> representing an attribute - so that, the first query I quoted above
> would look like "select * from table where attributes & 7 = 7", and the
> other query would be
> "select * from table where attributes & 11 = 11' etc...
>
> This looked so beautiful to me, but now I am stuck trying to index that
> table  [:-(]
>
> I started off, hoping to get away with btrees.
>
> I defined an operator >>=(int2,int2) as 'select $1&$2=$2;'
> It looks nice so far, but then the question is - what do I do with the
> other operations? By analogy with 'normal' comparison operators, I would do:
>
>  >> (I know the name is taken  [:-)]  as 'select not $2 >>= $1'
> =<<                                  as 'select $2 >>= $1'
> <<                                   as 'select not $1 >>= $2'
> .. and leave '=' intact.
>
> But then I realized, that these set of operators, does not really define
> a complete order - for example, if I compare, say, 5 and 3:
> 5 & 3 = 1, 3 & 5 = 1, so I get BOTH 5 << 3 and 5 >> 3 being true at the
> same time  [:-(]
>
> So my question is, first of all, is that a problem? Does btree require a
> complete order defined? Will it work with partial order?
> Secondly, if it is a problem, perhaps, I am missing something here,
> assuming that there is no way to define a set of operations to do what I
> want and provide a completely ordered set (or do I need it to define a
> perfect complete order - what exactly is required for btree to work? Any
> ideas?)
>
> And finally, if there is just no way I could get away with btrees, can I
> make an rtree to work for me? Could somebody explain to me (or point me
> to a doc somewhere) the meaning of the strategies (and requirements -
> like transitivity etc...) I need for an rtree, and also what support
> functions (like comparison func in case of a btree) do I need?
>
> Thank you very much for your attention.
> Any input will be greatly appreciated.
>
> Dima
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>




Re: Btree index extension question

От
Dmitry Tkach
Дата:
fcanedo@hotpop.com wrote:

>
>If postgresql does bitwise operations, then you can use that instead of
>defining new operators. Just construct a number for all the columns that
>need to be true and do a bitwise 'and' with the stored value. (eg. (7 &
>stored_val) = 7)
>

Yeah... The thing is that I want to be able to the index. And to use the
index, I need BOOLEAN
operators (this seems to be the LEAST of my problems,but anyway) - so, I
have to define 'wrappers' around the standard bitwise operations - e.g.
a <<= b   ---> a & b = a;

>
>
>If postgresql uses an index to supply functions with their parameters,
>then make a function that'll do the comparison for you and use it in your
>query.
>

Well ... that's the point - can't do that :-(
You can create functional indexes in postgres (and anywhere else AFAIK),
but the function must take a SINGLE parameter.
In other words, the only way to do what I need would be to create 15
functions, like:
check_bit_1 (x) return x & 1 = 1;
check_bit_2 (x) return x & 2 = 2;
etc...
And then create 15 different indexes (one for each func).

But even that would not be of much help, because I need to search by a
COMBINATION of
parameters, and need a COMPOUND index to do that, not a separate index
for each attr...

> Or make the index (on all the columns) and make a function that
>takes all the columns as the parameters to compare against (and ofcourse
>the values that you want to check against). That way you always use the
>columns of the index in the correct order.
>

I am not sure I understand this suggestion... If I make the index on all
the columns, I would need to specify all the (leftmost) values in the
search criteria to be able to use it, right?
For example, suppose, I have an index on (a,b,c) - then
select * from foo where a=bar and b=bar will work, but
select * from foo where b=bar and c=bar will not...

That's exactly my problem - I need to be able to search by any
combination of the values - (a),(b),(c),(ab),(ac),(bc),(abc)
... only I have 15 of them - too many combinations to consider buidling
indexes for any of them :-(



Re: Btree index extension question

От
Дата:

On Fri, 15 Mar 2002, Dmitry Tkach wrote:

> fcanedo@hotpop.com wrote:
>
> >
> >If postgresql does bitwise operations, then you can use that instead of
> >defining new operators. Just construct a number for all the columns that
> >need to be true and do a bitwise 'and' with the stored value. (eg. (7 &
> >stored_val) = 7)
> >
>
> Yeah... The thing is that I want to be able to the index. And to use the
> index, I need BOOLEAN
> operators (this seems to be the LEAST of my problems,but anyway) - so, I
> have to define 'wrappers' around the standard bitwise operations - e.g.
> a <<= b   ---> a & b = a;


Ok, let's see whether we understand each other:

1. Make a column that contains the bitstring of your 15 boolean
columns. Let's call it bitstring.
2. bitstring is calculated on each insert or update by a trigger.
3. Make an index on bitstring.
4. Make a query to find the records that have a1, a2 and a3 set to true,
like so:

SELECT * FROM table where (bitstring & 7) = 7;

Will this not give you the correct answer and use the index on bitstring?




Re: Btree index extension question

От
Dmitry Tkach
Дата:
>
>
>
>SELECT * FROM table where (bitstring & 7) = 7;
>
>Will this not give you the correct answer and use the index on bitstring?
>
Yes, and know... (Yes, it will give th ecorrect answer, and NO it will
not use the index).
The thing is, that it will only use a (btree) index for one of the
'comparison' operators
(<=, <, =, >, >=), that compare the value in the table with the query
parameter - so,
bitstring = 7, or bitstring < 7 etc... will work, but
'do_something_to_bitstring = 7' will not (I mean, it will, but it won't
use the index).

So, the idea is to redefine those comparison operators for the index, so
that I could use one of them fopr my purpose - for example (as I
mentioned in the original message), I was trying to define a '<<='
("less or equals") operator as a <<= b iff a & b = a.
My problem is that (as far as I understand) the way I am doing it will
not work (my set of operators doesn't define a strict ordering, so that
not any pair of values is comparable) :-(

And what I am looking for is either a suggestion of how this strict
ordering can be defined, or a description of what exactly the btree
needs to work (I suspect, that completely strict ordering is not really
necesssary, and that I could get away with some kind of a partial order,
if I new exactly what the btree stuff is looking for)...

Dima





Re: Btree index extension question

От
Дата:

On Fri, 15 Mar 2002, Dmitry Tkach wrote:

...
> Yes, and know... (Yes, it will give th ecorrect answer, and NO it will
> not use the index).
> The thing is, that it will only use a (btree) index for one of the
> 'comparison' operators
> (<=, <, =, >, >=), that compare the value in the table with the query
> parameter - so,
> bitstring = 7, or bitstring < 7 etc... will work, but
> 'do_something_to_bitstring = 7' will not (I mean, it will, but it won't
> use the index).


Okay, I'm recovering from my encounter with the stupid stick.

I understand this:
1. You want to use a btree index because presumably it's faster than a
normal index.
2. A btree index is a binary tree index that uses the order of values to
find an answer quickly.
3. In your case for instance: a value of 10 should produce a resultset
with bitstrings of 10, 11, 14, 26, ...

So yeah, since 12 and 13 are > 10, normal equality operators won't work.

If a solution were to be found, I think it should base the ordering on
the bits from least significant bit to most significant. So, for 1st bit
go left in the tree for a 0 and right for a 1 and same for next bit.
But no, this won't work since you don't care about the other boolean
columns, solong as the ones your looking for are true! To clarify, if you
don't care about the first column/bit, values from both the left and the
right parts of the binary tree can be valid.

Wild guess: base the ordering on the number of bits and then try to
narrow it down in the select with an equal operator.

SELECT * FROM TABLE
WHERE bitstring > 7
AND bitstring = 7;

Wouldn't know how to get this to work though! :(


Oracle has a special index type for just this case. I think they call
them bitmap indexes.





Re: Btree index extension question

От
Dmitry Tkach
Дата:
fcanedo@hotpop.com wrote:

>
>
>I understand this:
>1. You want to use a btree index because presumably it's faster than a
>normal index.
>
Ahhh.. What do you mean by 'normal index'?
I have two reasons to use btree:
- It is simpler to implement;
- It can be used in combination with other columns (these 15 boolean
flags is not the only stuff I need to search by, so, I am going to have
to run queries like '... where flags & 33 = 33 and foo=bar'' - to do
that, I would need an index on (flags 'bit_ops', foo), where bit_ops is
my special set of operations I am looking to define). If I am not using
btree for the flags, I would then have to define the whole indexing
strategy for all my columns, and I am really hoping to be able to avoid
having to do that.

>
>2. A btree index is a binary tree index that uses the order of values to
>find an answer quickly.
>
Yep.

>
>3. In your case for instance: a value of 10 should produce a resultset
>with bitstrings of 10, 11, 14, 26, ...
>
Yep.

>
>Wouldn't know how to get this to work though! :(
>

Yeah... me neither :-(
I am about to give up on this (hate to do that!)...

I was trying to make use of a GiST index now, but having some problems
with that too :-(
Could you please look at the next message I am about to post in a few
minutes (about debugging C functions)? Perhaps, you would have some
helpful ideas there?

Thanks!


Dima