several questions about R-tree index
От | TJ O'Donnell |
---|---|
Тема | several questions about R-tree index |
Дата | |
Msg-id | 3366.209.223.166.104.1114559525.squirrel@gnova.com обсуждение исходный текст |
Ответы |
Re: several questions about R-tree index
|
Список | pgsql-sql |
According to the manual at: http://www.postgresql.org/docs/7.4/static/functions-geometry.html " The PostgreSQL query planner will consider using an R-tree index whenever an indexed column is involved in a comparison using one of these operators: <<, &<, &>, >>, @, ~=, && (Refer to Section 9.9 about the meaning of these operators.)" Shouldn't the ~ (contains) operator be included also? Isn't ~ the commutator of @ ? I am considering using R-tree's for other-than-geometric purposes. Here is the story: I have string data (column smiles) which represents chemical structures. Comparing strings for equality works perfectly using an ordinary B-tree index. But locating chemical substructures requires a more elaborate (read time-consuming) match function. I search for substructures like this: Select count(smiles) from structures where matches(smiles, subsmiles); where subsmiles is the search pattern desired. This is not a fast search - about 15 seconds for 0.25 million rows. To speed up this search I have a function (called fingerprint) which produces a bit string representation of some common and uncommon substructures. I have populated my table with column fp = fingerprint(smiles) So, this is about 5 times faster: Select count(smiles) from structures where (fp & fingerprint(subsmiles)) = fingerprint(subsmiles) & matches(smiles, subsmiles); The sequence scan using the first where-clause takes about 3 seconds and the final matches on the resulting subset is insignificant when the subset is small. But I think I might be able to do better (faster) using R-trees. Bitstrings can be thought of as "containing" when one bitstring has all the same bits set as another, even if it has other bits set too - this is the gist of the first where-clause above. Can I expect to be able to use R-tree's to do this? Will I simply have to define a new datatype and three R-tree functions (union, intersection and size). Will the use of the ~ (contains) operator cause the planner to consider using an index (this is my first question, way above)? I hope someone has had the patience and interest to read this far. Thanks, TJ
В списке pgsql-sql по дате отправления: