Обсуждение: Lexicographic index ?

Поиск
Список
Период
Сортировка

Lexicographic index ?

От
arnaud.mlist1@free.fr
Дата:
Hello, I've been searching since last week how I could implement what looks
basically like a dictionnary with postgres :

I have a big table (million of rows) which contains a varchar column with words
like this for example :

Table Twords:

words
-----------------
guitar
saxophon
flute

And what I want to built a query that tells me if a word given in argument can
be found at least partially in the database. Something like

select * from twords where words||'%' like 'saxophones';

works but uses a sequential scan on the table...

Is it possible to do what I want with postgres or not ? How ? :-)
Thanks for your help !

Re: Lexicographic index ?

От
"Marin Dimitrov"
Дата:
----- Original Message -----
From: <arnaud.mlist1@free.fr>


>
> And what I want to built a query that tells me if a word given in argument
can
> be found at least partially in the database. Something like
>
> select * from twords where words||'%' like 'saxophones';
>
> works but uses a sequential scan on the table...
>


try http://techdocs.postgresql.org/techdocs/fulltextindexing.php

hth,

    Marin

----
"...what you brought from your past, is of no use in your present. When
you must choose a new path, do not bring old experiences with you.
Those who strike out afresh, but who attempt to retain a little of the
old life, end up torn apart by their own memories. "



Re: Lexicographic index ?

От
arnaud.mlist1@free.fr
Дата:
> > select * from twords where words||'%' like 'saxophones';
> >
> > works but uses a sequential scan on the table...
> >

> try http://techdocs.postgresql.org/techdocs/fulltextindexing.php

Hmmm... I don't think this has any chance to work : what I need is that a part
of the work I search (beginning from its start) can be found in the table (such
as 'saxophones' would match 'saxophone' or 'sax' in the table, but
not 'saxophonesandsomethingbehind', and not that the word can be found in a
bigger word in the table (it's already easy to find 'saxophones'
from 'saxophone' using like and 'saxophone%' : it uses my index correctly.

Or didn't I understand something ?
Arnaud

Re: Lexicographic index ?

От
Jeff Eckermann
Дата:
An alternative, which may work well for you: create an
index on a substring (say, the first five or six
characters) of each word, and match against that.  To
index on a substring, you will need to create a
function, something like:

CREATE FUNCTION word_substring(text) RETURNS text AS'
SELECT substr($1, 1, 5);
'LANGUAGE SQL
WITH (iscachable);

CREATE INDEX word_substring_index ON word_table
(word_substring(word));

--- Marin Dimitrov <marin.dimitrov@sirma.bg> wrote:
>
> ----- Original Message -----
> From: <arnaud.mlist1@free.fr>
>
>
> >
> > And what I want to built a query that tells me if
> a word given in argument
> can
> > be found at least partially in the database.
> Something like
> >
> > select * from twords where words||'%' like
> 'saxophones';
> >
> > works but uses a sequential scan on the table...
> >
>
>
> try
>
http://techdocs.postgresql.org/techdocs/fulltextindexing.php
>
> hth,
>
>     Marin
>
> ----
> "...what you brought from your past, is of no use in
> your present. When
> you must choose a new path, do not bring old
> experiences with you.
> Those who strike out afresh, but who attempt to
> retain a little of the
> old life, end up torn apart by their own memories. "
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: you can get off all lists at once with the
> unregister command
>     (send "unregister YourEmailAddressHere" to
majordomo@postgresql.org)


__________________________________________________
Do You Yahoo!?
LAUNCH - Your Yahoo! Music Experience
http://launch.yahoo.com

Re: Lexicographic index ?

От
"Peter Gibbs"
Дата:
----- Original Message -----
From: <arnaud.mlist1@free.fr>
To: <pgsql-general@postgresql.org>

> select * from twords where words||'%' like 'saxophones';
>
> works but uses a sequential scan on the table...


The only method I have been able to find that will use the index is to
provide both upper and lower limits on the key.

For example:

select * from twords
  where words <= 'saxophones'
    and words >= 's'
    and position(words in 'saxophones') = 1;

This uses the index in my test, whereas it doesn't if you leave out the
second condition, even if you add an 'order by' clause.
Using position is slightly faster on my system than using likes (but I am
only using the standard /usr/dict/words for testing, so I only have 45402
rows.

--
Peter Gibbs
EmKel Systems



Re: GiST, R-TREE, Lexicographic index ?

От
arnaud.mlist1@free.fr
Дата:
> The only method I have been able to find that will use the index is to
> provide both upper and lower limits on the key.
>
> select * from twords
>   where words <= 'saxophones'
>     and words >= 's'
>     and position(words in 'saxophones') = 1;

That's an interesting progress, but it still doesn't satisfy me (still too slow
in most cases). I've been looking to R-TREEs and GiST indexes but I can't
figure how I could use them yet...

In fact, thet kind of index I would need is exactly what a GiST provides since
it references things between two limits. This allows to find fastly if boxes
overlap and things like this.
In my case, I would have to define an operator that would return true if left
operand is exactly the beginning of the right one and false otherwise, and then
find a way to use a GiST or R-TREE with that relation...
Maybe I'm far away from the solution but I don't understand yet the
implementations of GiSTs and R-TREE and how to use them for my special problem.

Do I have any chance to solve my problem using that kind of indexes ?

Thanks for your replies though !
Arnaud