Re: [GENERAL] Text Databases
От | The Hermit Hacker |
---|---|
Тема | Re: [GENERAL] Text Databases |
Дата | |
Msg-id | Pine.BSF.4.05.9812030133590.4737-100000@thelab.hub.org обсуждение исходный текст |
Ответ на | Text Databases (Nicholas Humfrey <nhumfrey@poboxes.com>) |
Список | pgsql-general |
On Wed, 2 Dec 1998, Nicholas Humfrey wrote: > Hi, > > Here is a very general question: > > How good is PostgreSQL at hosting nearly pure text based databases ? Is it > possible to search mutilple cols for the same keyword ? How well would it > handle 40,000+ rows with say 15 columns each with about 20 words in each > cell ? See the 'fulltextindex' directory under the contrib directory. Part of the README follows... Marc G. Fournier Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org An attempt at some sort of Full Text Indexing for PostgreSQL. The included software is an attempt to add some sort of Full Text Indexing support to PostgreSQL. I mean by this that we can ask questions like: Give me all rows that have 'still' and 'nash' in the 'artist' field. Ofcourse we can write this as: select * from cds where artist ~* 'stills' and artist ~* 'nash'; But this does not use any indices, and therefore, if your database gets very large, it will not have very high performance (the above query requires at least one sequential scan, it probably takes 2 due to the self-join). The approach used by this add-on is to define a trigger on the table and column you want to do this queries on. On every insert in the table, it takes the value in the specified column, breaks the text in this column up into pieces, and stores all sub-strings into another table, together with a reference to the row in the original table that contained this sub-string (it uses the oid of that row). By now creating an index over the 'fti-table', we can search for substrings that occur in the original table. By making a join between the fti-table and the orig-table, we can get the actual rows we want (this can also be done by using subselects, and maybe there're other ways too).
В списке pgsql-general по дате отправления: