Re: gsoc, oprrest function for text search take 2
From | Jan Urbański |
---|---|
Subject | Re: gsoc, oprrest function for text search take 2 |
Date | |
Msg-id | 48A40095.2080408@students.mimuw.edu.pl |
In reply to | Re: gsoc, oprrest function for text search take 2 ("Heikki Linnakangas" <heikki@enterprisedb.com>) |
Responses | Re: gsoc, oprrest function for text search take 2; Re: gsoc, oprrest function for text search take 2 |
List | pgsql-hackers |
Heikki Linnakangas wrote:
> Jan Urbański wrote:
>> Not good... Shall I try sorting pg_statistics arrays on text values
>> instead of frequencies?
>
> Yeah, I'd go with that. If you only do it for the new
> STATISTIC_KIND_MCV_ELEMENT statistics, you shouldn't need to change any
> other code.

OK, will do.

>> BTW: I just noticed some text_to_cstring calls; they came from
>> elog(DEBUG1)s that I have in my code. But they couldn't have skewed the
>> results much, could they?
>
> Well, text_to_cstring was consuming 1.1% of the CPU time on its own, and
> presumably some of the AllocSetAlloc overhead is attributable to that as
> well. And perhaps some of the detoasting as well.
>
> Speaking of which, a lot of time seems to be spent on detoasting. I'd
> like to understand that better. Where is the detoasting coming from?

Hmm, maybe bttext_pattern_cmp does some detoasting? It calls
PG_GETARG_TEXT_PP(), which in turn calls pg_detoast_datum_packed(). Oh,
and I also think that compare_lexeme_textfreq() uses DatumGetTextP(), which
detoasts as well. The root of all evil could be keeping a Datum in the
TextFreq array rather than a "text *", which is something you pointed out
earlier and I apparently didn't understand.

So right now the idea is to:
(1) pre-sort the STATISTIC_KIND_MCELEM values
(2) build an array of pointers to detoasted values in tssel()
(3) use binary search when looking for MCELEMs during tsquery analysis

Jan

-- 
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
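The three-step plan in the message (pre-sort the MCELEM values, detoast them once into an array of pointers, then binary-search that array) can be illustrated with a small standalone C sketch. It does not use the PostgreSQL backend API: `LexemeFreq`, `LexemeKey`, `compare_lexemes`, `sort_mcelems` and `lookup_freq` are hypothetical names, plain `const char *` plus a length stands in for an already-detoasted `text *`, and the byte-wise ordering is only assumed to match what `bttext_pattern_cmp` does. The point it shows is that once each value is detoasted a single time, comparisons during sorting and searching never touch TOAST again.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one MCELEM entry. In the backend this slot would
 * hold a pointer to a text value detoasted once, up front (step 2), rather
 * than a Datum that every comparison has to detoast again. */
typedef struct
{
    const char *lexeme;   /* lexeme bytes, not necessarily NUL-terminated */
    size_t      len;      /* length in bytes */
    float       freq;     /* element frequency from the statistics */
} LexemeFreq;

/* Byte-wise order: memcmp over the common prefix, shorter string first on a
 * tie. Assumed to mirror the ordering used by bttext_pattern_cmp. */
static int
compare_lexemes(const char *a, size_t alen, const char *b, size_t blen)
{
    int r = memcmp(a, b, alen < blen ? alen : blen);

    if (r != 0)
        return r;
    return (alen > blen) - (alen < blen);
}

/* qsort callback: order entries by lexeme text, not by frequency. */
static int
cmp_entries(const void *x, const void *y)
{
    const LexemeFreq *a = x;
    const LexemeFreq *b = y;

    return compare_lexemes(a->lexeme, a->len, b->lexeme, b->len);
}

/* Step (1): pre-sort the MCELEM array on text values. */
static void
sort_mcelems(LexemeFreq *entries, size_t n)
{
    qsort(entries, n, sizeof(LexemeFreq), cmp_entries);
}

/* Search key: a lexeme taken from the tsquery being analysed. */
typedef struct
{
    const char *lexeme;
    size_t      len;
} LexemeKey;

/* bsearch callback: the first argument is the key, the second an element. */
static int
cmp_key_entry(const void *k, const void *e)
{
    const LexemeKey  *key = k;
    const LexemeFreq *entry = e;

    return compare_lexemes(key->lexeme, key->len, entry->lexeme, entry->len);
}

/* Step (3): binary search for a lexeme. Returns its frequency, or -1.0f if
 * the lexeme is not among the most common elements. */
static float
lookup_freq(const LexemeFreq *entries, size_t n, const char *lexeme, size_t len)
{
    LexemeKey key = {lexeme, len};
    const LexemeFreq *hit = bsearch(&key, entries, n, sizeof(LexemeFreq),
                                    cmp_key_entry);

    return hit ? hit->freq : -1.0f;
}
```

One design constraint worth noting: the same comparison function has to drive both the sort and the binary search, otherwise `bsearch` can silently miss entries that are present. Under these assumptions the sort is paid once, when the statistics are built, and each tsquery lexeme then costs O(log k) comparisons against the k stored elements instead of a linear scan.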