Re: Searching for substring with tsearch(1/2)
От | Teodor Sigaev |
---|---|
Тема | Re: Searching for substring with tsearch(1/2) |
Дата | |
Msg-id | 3FD6E54E.7080109@sigaev.ru обсуждение исходный текст |
Ответ на | Re: Searching for substring with tsearch(1/2) (Hannu Krosing <hannu@tm.ee>) |
Ответы |
Re: Searching for substring with tsearch(1/2)
|
Список | pgsql-hackers |
>>Tsearch was never minded as prefix search, and index structure doesn't support >>any kind of prefix or suffix. But you can write extension to tsearch, which will >>search by prefix. But such solution wiil not use index, only sequence scan. > > > How efficient would tsearch be for really big expressions (where 'hu%' > would be expanded (using a btree word index on one column word table) to > tsearch equivalent of ( "human" or "humanity" or "humming" or "huge" or > ..1000 words here...) before passing the expression to tsearch? GiST index of tsearch doen't support prefix search, so it will works only by seqscan, as we know :) disk is much more slow than processor, speed will be limited by disk. >>Prefix searches easy realized with inverted index, but it require a lot of >>programing. >>The simplest way is: >>create table invidx ( >> lexeme text not null primary key, >> ids[] int >>); >> >>where ids[] - array with identificators of documents which contains this word. > > > How hard (or sensible ;) would be creating such an index using GiST ? > As proved by tsearch GiST can cope well with many-to-many indexes. Sorry, I don't understand. Do you mean that GiST supports one heap tuple in several index tuple? If yes then no :). GiST doesn't support this feature. I don't think that GiST may help in this situation. > create table invidx ( > lexeme text not null, > textdate date not null, > ids[] int, > primary key (lexeme, textdate) > ); > > which would partition the invidx table on textdate (or some other > suitable datum) > > >>2 If word is frequent then query with 'IN (select * from func()) may works slow... > if it is often too slow then creating a temp table and doing a plain > join may be faster. Table structure as indidx decrease this problem. -- Teodor Sigaev E-mail: teodor@sigaev.ru
В списке pgsql-hackers по дате отправления: