Re: Working with huge amount of data. RESULTS!
От | Oleg Bartunov |
---|---|
Тема | Re: Working with huge amount of data. RESULTS! |
Дата | |
Msg-id | Pine.LNX.4.64.0802121809410.23796@sn.sai.msu.ru обсуждение исходный текст |
Ответ на | Re: Working with huge amount of data. RESULTS! (Mario Lopez <mario@lar3d.com>) |
Список | pgsql-general |
On Tue, 12 Feb 2008, Mario Lopez wrote: > Hi!, > > I optimized the LIKE 'keyword%' and LIKE '%keyword' with the following > results: > > # time /Library/PostgreSQL8/bin/psql -U postgres -d testdb -c "select * from > table1 where varchar_reverse(data) like varchar_reverse('%keyword');" > real 0m0.055s > user 0m0.011s > sys 0m0.006s > > > # time /Library/PostgreSQL8/bin/psql -U postgres -d testdb -c "select * from > table1 where data like 'keyword%';" > real 0m0.026s > user 0m0.012s > sys 0m0.006s > > It works flawlesly as you can see by the timings, take in consideration that > "table1" has 100 million records. The only problem is generating the reversed > index which takes like 20 minutes, I guess it has to do with the plperl > function, perhaps a C function for inverting would make it up in less time. > > The problem is still with the LIKE '%keyword%', my problem is that I am not > searching for Words in a dictionary fashion, suppose my "data" is random > garbage, that it has common consecutive bytes. How could I generate a > dictionary from this random garbage to make it easier for indexing? suffix tree (array) would speedup '%keyword%' query, but currently it doesn't supported by GiST extension architecture (we have it in our TODO), so I see several ways (not tested): 1. try contrib/pg_trgm to reduce a number of candidate words 2. generate all possible substrings and use your tested approach > >> On Mon, Feb 11, 2008 at 04:37:24PM +0100, Mario Lopez wrote: >> >>> SELECT * FROM names WHERE name LIKE keyword% >>> Or >>> SELECT * FROM names WHERE name LIKE %keyword% >>> >> >> check this: >> http://www.depesz.com/index.php/2007/07/30/indexable-field-like-something/ >> and this: >> http://www.depesz.com/index.php/2007/09/15/speeding-up-like-xxx/ >> >> depesz >> >> > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
В списке pgsql-general по дате отправления: