Re: advice on indexing email
From | Maarten Boekhold |
---|---|
Subject | Re: advice on indexing email |
Date | |
Msg-id | 39097E62.9FB27E81@tibcofinance.com |
In reply to | advice on indexing email (Marc Tardif <intmktg@CAM.ORG>) |
Responses | Re: advice on indexing email |
List | pgsql-general |
Hi,

I wrote that fti stuff in contrib...

> My problem is how to create the full word index. The actual code to
> separate the email into separate words isn't a problem, but should I be
> using INSERT, BEGIN/END or COPY? In this last case, I would have to create
> a temporary file holding each word of the email and then use COPY... all
> of which also has its fair share of overhead.

You can do this in one of two ways.

1. The fti stuff in contrib uses triggers, so every time you insert, update or delete a row in the 'fti-ed' table, the full text index is updated as well. If your coding abilities are OK, you can just replace the word breakup code in contrib/fti with your own.

2. If you have to insert large amounts of data, it is probably faster to *not* create the triggers at first. Bulk load all your data, then write a little perl script that reads the data from your table, does the word breakup and inserts those words into the full text index table. Running 'sort' on the output of the perl script will help performance, as the fti data will then already be pre-sorted in the database (you could also use CLUSTER on the fti table after the index has been created). I think I described this somewhat better in the README in contrib/fti.

If you take the second approach, don't forget to create the triggers after the bulk load of the fti table!

Maarten

--
Maarten Boekhold, maarten.boekhold@tibcofinance.com
TIBCO Finance Technology Inc.
"Sevilla" Building
Entrada 308
1096 ED Amsterdam, The Netherlands
tel: +31 20 6601000 (direct: +31 20 6601066)
fax: +31 20 6601005
http://www.tibcofinance.com
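The bulk-load route described in the message (break the text into words, sort the output, then load it into the index table) could be sketched roughly as follows. This is only an illustration in Python rather than the perl script mentioned in the message, and it rests on several assumptions: the function names, the word-splitting rule (lowercased runs of letters and digits, at least two characters) and the two-column `(word, message id)` layout are all hypothetical, and the real layout expected by contrib/fti should be taken from its README. The input is assumed to be `(id, text)` pairs already read from the main table.

```python
import re
from typing import Iterable, List, Set, Tuple

# Hypothetical word rule: runs of ASCII letters and digits, lowercased.
WORD_RE = re.compile(r"[a-z0-9]+")


def break_words(text: str, min_len: int = 2) -> Set[str]:
    """Return the distinct words in text that have at least min_len characters."""
    return {w for w in WORD_RE.findall(text.lower()) if len(w) >= min_len}


def fti_rows(messages: Iterable[Tuple[int, str]]) -> List[Tuple[str, int]]:
    """Build sorted, de-duplicated (word, message_id) pairs for all messages."""
    rows = set()
    for msg_id, body in messages:
        for word in break_words(body):
            rows.add((word, msg_id))
    # Sorting here plays the role of piping the script output through 'sort'.
    return sorted(rows)


def to_copy_text(rows: Iterable[Tuple[str, int]]) -> str:
    """Format rows as tab-separated lines, the default text layout read by COPY."""
    return "".join(f"{word}\t{msg_id}\n" for word, msg_id in rows)


if __name__ == "__main__":
    # Made-up sample data standing in for rows read from the main table.
    sample = [(1, "Advice on indexing email"), (2, "Indexing email with triggers")]
    print(to_copy_text(fti_rows(sample)), end="")
```

The resulting text could be written to a file and loaded with COPY into the index table before the index and the triggers are created. Because the words contain only letters and digits under this rule, no escaping of tabs or backslashes is needed in the output.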