How to drop all tokens that a snowball dictionary cannot stem?
From | Christoph Gößmann |
---|---|
Subject | How to drop all tokens that a snowball dictionary cannot stem? |
Date | |
Msg-id | 50A531BE-8A5D-40BA-B6AF-4B9B32FB7FF3@goessmann.io |
Responses | Re: How to drop all tokens that a snowball dictionary cannot stem? |
List | pgsql-general |
Hi everybody,

I am trying to get all the lexemes for a text using to_tsvector(), but I want only words that english_stem -- the integrated snowball dictionary -- is able to handle to show up in the final tsvector. Since snowball dictionaries only remove stop words but keep the words that they cannot stem, I don't see an easy option to do this. Do you have any ideas?

I went ahead with creating a new configuration:

-- add new configuration english_led
CREATE TEXT SEARCH CONFIGURATION public.english_led (COPY = pg_catalog.english);

-- drop any words that contain numbers already in the parser
ALTER TEXT SEARCH CONFIGURATION english_led DROP MAPPING FOR numword;

Example:

SELECT * FROM to_tsvector('english_led', 'A test sentence with ui44 \tt somejnk words');

                   to_tsvector
--------------------------------------------------
 'sentenc':3 'somejnk':6 'test':2 'tt':5 'word':7

In this tsvector, I would like 'somejnk' and 'tt' not to be included.

Many thanks,
Christoph
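P.S. The pass-through behaviour can also be checked one word at a time with ts_lexize(), which runs a single token through a single dictionary. A quick sketch (the second result is what I would expect from the stemmer, not something I have verified):

SELECT ts_lexize('english_stem', 'sentence');
-- {sentenc}   (stemmed, matching the tsvector above)

SELECT ts_lexize('english_stem', 'somejnk');
-- presumably {somejnk}, returned unchanged rather than rejected

As far as I can tell, the snowball dictionary accepts every word it is given, so tokens it cannot stem are emitted as-is and nothing later in the dictionary chain gets a chance to drop them.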