Re: Very bad FTS performance with the Polish config
From | Tom Lane |
---|---|
Subject | Re: Very bad FTS performance with the Polish config |
Date | |
Msg-id | 15251.1258645873@sss.pgh.pa.us |
In reply to | Re: Very bad FTS performance with the Polish config (Wojciech Knapik <webmaster@wolniartysci.pl>) |
Responses | Re: Very bad FTS performance with the Polish config |
List | pgsql-hackers |
Wojciech Knapik <webmaster@wolniartysci.pl> writes:
> Tom Lane wrote:
>> I tried to duplicate this test, but got no further than here:
>> ERROR: syntax error
>> CONTEXT: line 174 of configuration file "/home/tgl/testversion/share/postgresql/tsearch_data/polish.affix": " L E C > -C,G�EM #zalec (15a)

> Here are the files I used (polish.affix, polish.dict already generated):
> http://wolniartysci.pl/pl.tar.gz

Your files were the same as mine. I eventually figured out the problem was that I was using the C locale, in which some of those letters aren't letters. (I wonder whether the tsearch config file parsers could be made less sensitive to this by avoiding t_isalpha tests.)

In the pl_PL.utf8 locale I could see that the example is indeed much slower. Oleg is right that the fundamental difference is that this Polish configuration is using an ispell dictionary where the simple English configuration is not. But, just for the record, here's what an oprofile profile looks like:

samples | % | image name | symbol name |
---|---|---|---|
7480 | 20.9477 | postgres | RS_execute |
5370 | 15.0386 | postgres | pg_utf_mblen |
4138 | 11.5884 | postgres | pg_mblen |
3756 | 10.5187 | postgres | mb_strchr |
2880 | 8.0654 | postgres | FindWord |
2754 | 7.7126 | postgres | CheckAffix |
1576 | 4.4136 | postgres | NormalizeSubWord |
966 | 2.7053 | postgres | FindAffixes |
896 | 2.5092 | postgres | TParserGet |
742 | 2.0780 | postgres | AllocSetAlloc |
420 | 1.1762 | postgres | AllocSetFree |
396 | 1.1090 | postgres | addHLParsedLex |
384 | 1.0754 | postgres | LexizeExec |

So about 55% of the time is going into affix pattern matching. I wonder whether that couldn't be made faster. A lot of the cycles are spent on coping with variable-length characters --- perhaps the ispell code should convert to wchar representation before doing this?

regards, tom lane
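
The wchar suggestion above can be illustrated with a minimal sketch. This is not the ispell code and uses no PostgreSQL APIs: the names `utf8_len`, `utf8_to_wide` and `wide_ends_with` are hypothetical, and the input is assumed to be valid UTF-8. The idea is to pay for the variable-length decoding once per word, so that later affix comparisons are plain array indexing with no per-character length lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte length of the UTF-8 sequence starting at s.
 * Hypothetical stand-in for a per-character mblen call; assumes valid UTF-8. */
static int utf8_len(const unsigned char *s)
{
    if (*s < 0x80)
        return 1;
    if ((*s & 0xE0) == 0xC0)
        return 2;
    if ((*s & 0xF0) == 0xE0)
        return 3;
    return 4;
}

/* Decode a NUL-terminated, valid UTF-8 string once into fixed-width
 * code points. Returns the number of code points written (at most cap). */
static size_t utf8_to_wide(const char *src, uint32_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *) src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*s != '\0' && n < cap)
    {
        int len = utf8_len(s);
        uint32_t cp;

        /* Keep only the payload bits of the lead byte. */
        if (len == 1)
            cp = s[0];
        else if (len == 2)
            cp = s[0] & 0x1F;
        else if (len == 3)
            cp = s[0] & 0x0F;
        else
            cp = s[0] & 0x07;

        /* Each continuation byte contributes six more bits. */
        for (int i = 1; i < len; i++)
            cp = (cp << 6) | (s[i] & 0x3F);

        dst[n++] = cp;
        s += len;
    }
    return n;
}

/* Suffix test on the wide form: direct indexing from the end of the word,
 * which a variable-length encoding cannot do without rescanning. */
static int wide_ends_with(const uint32_t *word, size_t wlen,
                          const uint32_t *suffix, size_t slen)
{
    if (slen > wlen)
        return 0;
    for (size_t i = 0; i < slen; i++)
    {
        if (word[wlen - slen + i] != suffix[i])
            return 0;
    }
    return 1;
}
```

For instance, decoding "byłem" (six bytes) yields five code points, and `wide_ends_with` can then check the three-code-point suffix "łem" by indexing backwards from the end of the array. Whether this would pay off inside the real affix matcher depends on how often each word is rescanned, which this sketch does not measure.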