Re: Tsearch2 performance on big database
From      | Rick Jansen
Subject   | Re: Tsearch2 performance on big database
Date      |
Msg-id    | 42412E4B.4010407@rockingstone.nl
Reply to  | Re: Tsearch2 performance on big database (Oleg Bartunov <oleg@sai.msu.su>)
Responses | Re: Tsearch2 performance on big database
List      | pgsql-performance
Oleg Bartunov wrote:
> On Tue, 22 Mar 2005, Rick Jansen wrote:
>
> Hmm, default configuration is too eager, you index every lexem using
> simple dictionary! Probably, it's too much. Here is what I have for my
> russian configuration in dictionary database:
>
>  default_russian | lword        | {en_ispell,en_stem}
>  default_russian | lpart_hword  | {en_ispell,en_stem}
>  default_russian | lhword       | {en_ispell,en_stem}
>  default_russian | nlword       | {ru_ispell,ru_stem}
>  default_russian | nlpart_hword | {ru_ispell,ru_stem}
>  default_russian | nlhword      | {ru_ispell,ru_stem}
>
> Notice, I index only russian and english words, no numbers, urls, etc.
> You may just delete unwanted rows in pg_ts_cfgmap for your configuration,
> but I'd recommend just updating them, setting dict_name to NULL.
> For example, to not index integers:
>
> update pg_ts_cfgmap set dict_name=NULL
>  where ts_name='default_russian' and tok_alias='int';
>
> voc=# select token,dict_name,tok_type,tsvector
>        from ts_debug('Do you have +70000 bucks');
>  token  |      dict_name      | tok_type | tsvector
> --------+---------------------+----------+----------
>  Do     | {en_ispell,en_stem} | lword    |
>  you    | {en_ispell,en_stem} | lword    |
>  have   | {en_ispell,en_stem} | lword    |
>  +70000 |                     | int      |
>  bucks  | {en_ispell,en_stem} | lword    | 'buck'
>
> Only 'bucks' gets indexed :)
> Hmm, probably I should add this to the documentation.
>
> What about word statistics (# of unique words, for example)?

I'm now following the guide to add the ispell dictionary, and I've updated
most of the rows, setting dict_name to NULL:

     ts_name     |  tok_alias   | dict_name
-----------------+--------------+-----------
 default         | lword        | {en_stem}
 default         | nlword       | {simple}
 default         | word         | {simple}
 default         | part_hword   | {simple}
 default         | nlpart_hword | {simple}
 default         | lpart_hword  | {en_stem}
 default         | hword        | {simple}
 default         | lhword       | {en_stem}
 default         | nlhword      | {simple}

These rows are left, but I have no idea what a 'hword', an 'nlhword', or any
of the other token types are.

Anyway, how do I find out the number of unique words, or other word
statistics?

Rick
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once
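On the statistics question: tsearch2 provides a stat() function that scans a
tsvector column and returns one row per lexeme with its document and entry
counts, which is one way to get at the number of unique words. A minimal
sketch, assuming the indexed table is called titles and its tsvector column
is idxfti (both names are placeholders, not taken from the message above):

  -- Number of unique lexemes in the index
  -- (stat() returns the columns word, ndoc, nentry).
  SELECT count(*) FROM stat('SELECT idxfti FROM titles');

  -- The ten most frequent lexemes: ndoc is the number of documents
  -- containing the word, nentry the total number of occurrences.
  SELECT word, ndoc, nentry
    FROM stat('SELECT idxfti FROM titles')
   ORDER BY ndoc DESC, nentry DESC, word
   LIMIT 10;

Note that stat() reads every tsvector in the column, so on a big table it
can take a while to run.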