Re: gsoc, oprrest function for text search take 2
From | Jan Urbański |
---|---|
Subject | Re: gsoc, oprrest function for text search take 2 |
Date | |
Msg-id | 48A410B7.3020004@students.mimuw.edu.pl |
In reply to | Re: gsoc, oprrest function for text search take 2 ("Heikki Linnakangas" <heikki@enterprisedb.com>) |
Responses | Re: gsoc, oprrest function for text search take 2 |
List | pgsql-hackers |
Heikki Linnakangas wrote:
> Jan Urbański wrote:
>> So right now the idea is to:
>> (1) pre-sort STATISTIC_KIND_MCELEM values
>> (2) build an array of pointers to detoasted values in tssel()
>> (3) use binary search when looking for MCELEMs during tsquery analysis
>
> Sounds like a plan. In (2), it's even better to detoast the values
> lazily. For a typical one-word tsquery, the binary search will only look
> at a small portion of the elements.

Hm, how can I do that? Toast is still a bit black magic to me... Do you mean I should stick to having Datums in TextFreq? And use DatumGetTextP in bsearch() (assuming I'll get rid of qsort())? I wanted to avoid that, so I won't detoast the same value multiple times, but it's true: a binary search won't touch most elements.

> Another thing is, how significant is the time spent in tssel() anyway,
> compared to actually running the query? You ran pgbench on EXPLAIN,
> which is good to see where in tssel() the time is spent, but if the time
> spent in tssel() is say 1% of the total execution time, there's no point
> optimizing it further.
Changed the pgbench script to

    select * from manual where tsvector @@ to_tsquery('foo');

and the parameters to

    pgbench -n -f tssel-bench.sql -t 1000 postgres

and got

    number of clients: 1
    number of transactions per client: 1000
    number of transactions actually processed: 1000/1000
    tps = 12.238282 (including connections establishing)
    tps = 12.238606 (excluding connections establishing)

    samples  %        symbol name
    174731   31.6200  pglz_decompress
    88105    15.9438  tsvectorout
    17280     3.1271  pg_mblen
    13623     2.4653  AllocSetAlloc
    13059     2.3632  hash_search_with_hash_value
    10845     1.9626  pg_utf_mblen
    10335     1.8703  internal_text_pattern_compare
    9196      1.6641  index_getnext
    9102      1.6471  bttext_pattern_cmp
    8075      1.4613  pg_detoast_datum_packed
    7437      1.3458  LWLockAcquire
    7066      1.2787  hash_any
    6811      1.2325  AllocSetFree
    6623      1.1985  pg_qsort
    6439      1.1652  LWLockRelease
    5793      1.0483  DirectFunctionCall2
    5322      0.9631  _bt_compare
    4664      0.8440  tsCompareString
    4636      0.8389  .plt
    4539      0.8214  compare_two_textfreqs

But I think I'll go with pre-sorting anyway, it feels cleaner and neater.

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
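
For readers following the thread, the "detoast lazily during binary search" idea can be sketched in plain C. This is only an illustration under stated assumptions, not the patch itself: `LazyTextFreq`, `lazy_detoast()`, `lookup_mcelem()` and the `detoast_calls` counter are hypothetical names, and the string copy merely stands in for what DatumGetTextP would do in the backend. Each entry caches its detoasted form, so a value is expanded at most once and only if the search actually visits it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a TextFreq entry: the stored (possibly
 * toasted) value plus a cache slot for its detoasted form. */
typedef struct
{
    const char *stored;     /* stand-in for the Datum read from the stats */
    char       *detoasted;  /* NULL until the search first needs it */
    float       frequency;
} LazyTextFreq;

/* Counts simulated detoast operations, for illustration only. */
int detoast_calls = 0;

/* Stand-in for DatumGetTextP(): copies the string on first use and
 * returns the cached copy on every later call. */
const char *
lazy_detoast(LazyTextFreq *entry)
{
    if (entry->detoasted == NULL)
    {
        size_t len = strlen(entry->stored) + 1;

        entry->detoasted = malloc(len);
        if (entry->detoasted == NULL)
            abort();
        memcpy(entry->detoasted, entry->stored, len);
        detoast_calls++;
    }
    return entry->detoasted;
}

/* Binary search over entries that are already sorted in strcmp() order.
 * Only the entries probed by the search get detoasted. */
LazyTextFreq *
lookup_mcelem(LazyTextFreq *entries, size_t n, const char *key)
{
    size_t lo = 0;
    size_t hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        int    cmp = strcmp(key, lazy_detoast(&entries[mid]));

        if (cmp == 0)
            return &entries[mid];
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

With n pre-sorted entries, one lookup touches at most about log2(n) + 1 of them, so most values are never detoasted for a one-word query, and the cache covers the worry about detoasting the same value more than once.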