Re: Tsearch2 performance on big database
From      | Rick Jansen
Subject   | Re: Tsearch2 performance on big database
Date      |
Msg-id    | 42412E4B.4010407@rockingstone.nl
Reply to  | Re: Tsearch2 performance on big database (Oleg Bartunov <oleg@sai.msu.su>)
Responses | Re: Tsearch2 performance on big database
List      | pgsql-performance
Oleg Bartunov wrote:
> On Tue, 22 Mar 2005, Rick Jansen wrote:
>
> Hmm, default configuration is too eager, you index every lexem using
> simple dictionary! Probably, it's too much. Here is what I have for my
> russian configuration in dictionary database:
>
>  default_russian | lword        | {en_ispell,en_stem}
>  default_russian | lpart_hword  | {en_ispell,en_stem}
>  default_russian | lhword       | {en_ispell,en_stem}
>  default_russian | nlword       | {ru_ispell,ru_stem}
>  default_russian | nlpart_hword | {ru_ispell,ru_stem}
>  default_russian | nlhword      | {ru_ispell,ru_stem}
>
> Notice, I index only russian and english words, no numbers, urls, etc.
> You may just delete unwanted rows in pg_ts_cfgmap for your configuration,
> but I'd recommend just updating them, setting dict_name to NULL.
> For example, to not index integers:
>
> update pg_ts_cfgmap set dict_name=NULL
>  where ts_name='default_russian' and tok_alias='int';
>
> voc=# select token,dict_name,tok_type,tsvector
>        from ts_debug('Do you have +70000 bucks');
>  token  |      dict_name      | tok_type | tsvector
> --------+---------------------+----------+----------
>  Do     | {en_ispell,en_stem} | lword    |
>  you    | {en_ispell,en_stem} | lword    |
>  have   | {en_ispell,en_stem} | lword    |
>  +70000 |                     | int      |
>  bucks  | {en_ispell,en_stem} | lword    | 'buck'
>
> Only 'bucks' gets indexed :)
> Hmm, probably I should add this to the documentation.
>
> What about word statistics (# of unique words, for example)?

I'm now following the guide to add the ispell dictionary, and I've updated
most of the rows, setting dict_name to NULL:

     ts_name     |  tok_alias   | dict_name
-----------------+--------------+-----------
 default         | lword        | {en_stem}
 default         | nlword       | {simple}
 default         | word         | {simple}
 default         | part_hword   | {simple}
 default         | nlpart_hword | {simple}
 default         | lpart_hword  | {en_stem}
 default         | hword        | {simple}
 default         | lhword       | {en_stem}
 default         | nlhword      | {simple}

These rows are left, but I have no idea what a 'hword', an 'nlhword', or any
of the other token types are.

Anyway, how do I find out the number of unique words, or other word
statistics?

Rick
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once
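On the statistics question: tsearch2 provides a stat() function that scans a
tsvector column and returns one row per lexeme with its document and entry
counts, which is one way to get at the number of unique words. A minimal
sketch, assuming the indexed table is called titles and its tsvector column
is idxfti (both names are placeholders, not taken from the message above):

  -- Number of unique lexemes in the index
  -- (stat() returns the columns word, ndoc, nentry).
  SELECT count(*) FROM stat('SELECT idxfti FROM titles');

  -- The ten most frequent lexemes: ndoc is the number of documents
  -- containing the word, nentry the total number of occurrences.
  SELECT word, ndoc, nentry
    FROM stat('SELECT idxfti FROM titles')
   ORDER BY ndoc DESC, nentry DESC, word
   LIMIT 10;

Note that stat() reads every tsvector in the column, so on a big table it
can take a while to run.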