Sphinx indexing problem
From | Mladen Gogala |
---|---|
Subject | Sphinx indexing problem |
Date | |
Msg-id | 4BF9ADF2.7030200@vmsinfo.com |
Responses | Re: Sphinx indexing problem (Joshua Tolley <eggyknap@gmail.com>) |
List | pgsql-novice |
I am trying to create a Sphinx index on a fairly large Postgres table. My problem is that the Postgres API tries to put the entire result set into memory:

[root@medo etc]# ../bin/indexer --all
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/usr/local/etc/sphinx.conf'...
indexing index 'test1'...
ERROR: index 'test1': sql_query: out of memory for query result
(DSN=pgsql://news:***@medo:5432/news).
total 0 docs, 0 bytes
total 712.593 sec, 0 bytes/sec, 0.00 docs/sec
total 0 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 0 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg

The corresponding log entries on the Postgres side are:

STATEMENT: SELECT segment_id,air_date,start_time,end_time,source_type,
           market_name,station_name,program_name, content_text
           FROM news_segments
LOG: unexpected EOF on client connection
LOG: unexpected EOF on client connection
LOG: unexpected EOF on client connection

The Postgres message isn't exactly helpful, but given the circumstances it can't be more helpful: the problem is on the client side. The table I am using is pretty large, with 14.3 million rows:

news=> select count(*) from news_segments;
  count
----------
 14366286
(1 row)

Time: 233759.639 ms

Is there anything I can do to prevent the API from attempting to put the entire query result in memory? I could partition the table, create separate indexes, and merge them, but that adds a large amount of unnecessary maintenance. I also suspect that other queries with large result sets will start to fail.
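One general way to keep a huge result set off the client is to read it through a server-side cursor and drain it in fixed-size chunks. A minimal, hedged sketch: the helper below is not part of Sphinx or libpq, just an illustrative DB-API 2.0 pattern; it only saves memory if the underlying cursor is a server-side one (e.g. a psycopg2 "named" cursor), since an ordinary client-side cursor buffers the whole result on execute anyway. The function name and chunk size are my own choices, not from the post.

```python
# Illustrative helper: drain any DB-API 2.0 cursor in fixed-size chunks
# via fetchmany(), so the client holds at most `size` rows at a time.
# Memory savings require a server-side cursor underneath (assumption).

def iter_chunks(cursor, size=10_000):
    """Yield lists of up to `size` rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield rows
```

With psycopg2, for example, `conn.cursor(name="seg_cur")` would create such a server-side cursor, and each `fetchmany` call pulls only the next batch over the wire.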
I temporarily solved my problem by using the "range query" option offered by Sphinx:

sql_query_range = \
    SELECT min(segment_id), max(segment_id) FROM news_segments
sql_range_step = 10000
sql_query = \
    SELECT segment_id, air_date, start_time, end_time, source_type, \
           market_name, station_name, program_name, segment_text \
    FROM news_segments \
    WHERE segment_id >= $start AND segment_id < $end

segment_id is a numeric field, and the query will be executed many times, which is less than optimal. It does make the thing work, though. Would it be possible to set the maximum memory for query-result caching from the API itself? How can I increase the maximum memory size used by the client API?

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
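The batching behind sql_query_range can be sketched in a few lines: Sphinx first runs the range query to get the minimum and maximum ids, then substitutes successive $start/$end windows of sql_range_step size into sql_query. A small illustrative generator, mirroring the half-open `>= $start AND < $end` predicate used above (the function name is mine, not Sphinx's):

```python
# Sketch of sql_query_range batching: given min/max ids and a step,
# emit the half-open [$start, $end) windows substituted into sql_query.

def range_windows(min_id, max_id, step):
    """Yield (start, end) pairs covering [min_id, max_id] in step-sized windows."""
    start = min_id
    while start <= max_id:
        yield (start, start + step)
        start += step
```

For example, with min_id=1, max_id=14366286 and step=10000 this produces (1, 10001), (10001, 20001), and so on, which is why the query runs well over a thousand times for a 14.3-million-row table.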