Re: Query advice request
От | Mladen Gogala |
---|---|
Тема | Re: Query advice request |
Дата | |
Msg-id | 4CB0F5BC.5080308@vmsinfo.com обсуждение исходный текст |
Ответ на | Query advice request (Mark Kelly <pgsql@wastedtimes.net>) |
Список | pgsql-novice |
Francisco Leovey wrote: > Well, it is a good enough implementation for us, a very large legal > document database. > We are forced by government policy to only use free software. > Full phrase search can be simulated easily with or's to extract the > documents. > And we have proximity search done by the application. > I opted for Sphinx which is also free software. It has connector for PostgreSQL and works perfectly. It can index 55 million documents without a problem: Sphinx 1.10-id64-beta (r2420) Copyright (c) 2001-2010, Andrew Aksyonoff Copyright (c) 2008-2010, Sphinx Technologies Inc (http://sphinxsearch.com) using config file '/usr/local/etc/sphinx.conf'... indexing index 'mover'... collected 55616085 docs, 101706.4 MB sorted 15298.1 Mhits, 100.0% done total 55616085 docs, 101706364580 bytes total 18132.419 sec, 5609089 bytes/sec, 3067.21 docs/sec total 6705 reads, 647.896 sec, 6460.7 kb/call avg, 96.6 msec/call avg total 23240 writes, 307.031 sec, 4144.8 kb/call avg, 13.2 msec/call avg [Sat Oct 9 05:03:20.204 2010] [32745] using config file '/usr/local/etc/sphinx.conf'... [Sat Oct 9 05:03:20.230 2010] [32745] listening on all interfaces, port=9312 Sphinx 1.10-id64-beta (r2420) This index contains 55 million documents and takes 100GB of space. It's not integrated with Postgres the way Tsearch2 is, but it has PHP and Perl API's and all the capabilities I need. The biggest problem was not with the index but with the database and optimizing the SQL accessing a huge partitioned table. To that end, I even upgraded to Postgres 9.0, which can resolve conditions like 'IS NOT NULL' by using indexes. The 3rd software I mentioned is Lucene, produced by the Apache foundation, with the price equal to the price of Sphinx and Lucene. Here is my table: psql (9.0.1) Type "help" for help. news=# set search_path=moreover; SET news=# select count(*) from moreover_documents; count ---------- 55892429 (1 row) There is a text column, which is indexed, and I need all the usual search options: phrases, proximity, quorum and alike. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
В списке pgsql-novice по дате отправления: