Re: [HACKERS] Flexible configuration for full-text search

From Emre Hasegeli
Subject Re: [HACKERS] Flexible configuration for full-text search
Date
Msg-id CAE2gYzyHtn6OF5LnKptRRodWLkOvsepnN9YUgmLRpMTVuw0mzA@mail.gmail.com
In reply to Re: [HACKERS] Flexible configuration for full-text search  (Aleksandr Parfenov <a.parfenov@postgrespro.ru>)
Responses Re: [HACKERS] Flexible configuration for full-text search  (Aleksandr Parfenov <a.parfenov@postgrespro.ru>)
Re: [HACKERS] Flexible configuration for full-text search  (Aleksandr Parfenov <a.parfenov@postgrespro.ru>)
List pgsql-hackers
> I'm mostly happy with mentioned modifications, but I have few questions
> to clarify some points. I will send new patch in week or two.

I am glad you liked it.  Still, I think we should get approval on the
syntax from more senior community members or committers before we put
more effort into the code.

> But configuration:
>
> CASE english_noun WHEN MATCH THEN english_hunspell ELSE simple END
>
> is not (as I understand ELSE can be used only with KEEP).
>
> I think we should decide to allow or disallow usage of different
> dictionaries for match checking (between CASE and WHEN) and a result
> (after THEN). If answer is 'allow', maybe we should allow the
> third example too for consistency in configurations.

I think you are right.  We had better allow this too.  Then the CASE
syntax becomes:

    CASE config
        WHEN [ NO ] MATCH THEN { KEEP | config }
        [ ELSE config ]
    END
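
To make the intended semantics easier to discuss, here is a minimal Python
sketch of one possible reading of this syntax.  It is not the patch's
implementation.  The assumptions are: a dictionary is a function that
returns a list of lexemes or None for an unknown token, MATCH means the
condition returned something other than None, and KEEP reuses the
condition's own output.  The names case_config and KEEP and the toy
dictionaries are hypothetical.

```python
from typing import Callable, List, Optional

# Assumption: a dictionary returns lexemes, or None when it does not
# recognize the token.
Dictionary = Callable[[str], Optional[List[str]]]

KEEP = object()  # hypothetical marker standing in for the KEEP keyword


def case_config(condition, then, otherwise=None, no_match=False):
    """Model of: CASE condition WHEN [NO] MATCH THEN then [ELSE otherwise] END."""
    def run(token: str) -> Optional[List[str]]:
        condition_output = condition(token)
        matched = condition_output is not None
        if matched != no_match:          # the WHEN branch is taken
            if then is KEEP:
                return condition_output  # reuse what the condition produced
            return then(token)
        if otherwise is not None:        # optional ELSE branch
            return otherwise(token)
        return None                      # token is left unhandled
    return run


# Toy dictionaries, for illustration only.
def english_noun(token: str) -> Optional[List[str]]:
    return [token] if token in {"cats"} else None


def english_hunspell(token: str) -> Optional[List[str]]:
    return {"cats": ["cat"]}.get(token)


def simple(token: str) -> Optional[List[str]]:
    return [token.lower()]


# CASE english_noun WHEN MATCH THEN english_hunspell ELSE simple END
config = case_config(english_noun, then=english_hunspell, otherwise=simple)
```

Under these assumptions, a token that english_noun recognizes is passed to
english_hunspell, and everything else falls through to simple.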

> Based on formal definition it is possible to describe this example in
> following manner:
> CASE english_noun WHEN MATCH THEN english_hunspell END
>
> The question is same as in the previous example.

I couldn't understand the question.

> Currently, stopwords increment position, for example:
> SELECT to_tsvector('english','a test message');
> ---------------------
>  'messag':3 'test':2
>
> A stopword 'a' has a position 1 but it is not in the vector.

Does this problem apply only to stopwords and the whole thing we are
inventing?  Shouldn't we preserve the positions through the pipeline?
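
To illustrate what preserving positions means here, a minimal Python sketch
of the counting behaviour from the quoted to_tsvector example.  It is only
a model: the stopword set and toy_stem are hypothetical stand-ins, not
PostgreSQL's english stop list or stemmer.

```python
from typing import Dict, List

STOPWORDS = {"a", "the"}  # toy list, an assumption for this example


def toy_stem(token: str) -> str:
    # Crude stand-in for a stemmer: drops a trailing "e".
    return token[:-1] if token.endswith("e") else token


def build_vector(text: str) -> Dict[str, List[int]]:
    vector: Dict[str, List[int]] = {}
    position = 0
    for token in text.lower().split():
        position += 1  # every token advances the counter, stopwords included
        if token in STOPWORDS:
            continue   # dropped from the vector, but its position is consumed
        vector.setdefault(toy_stem(token), []).append(position)
    return vector


print(build_vector("a test message"))
```

The counter is advanced before the stopword check, so 'test' ends up at
position 2 and 'messag' at position 3, as in the quoted output.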

> If we want to save this behavior, we should somehow pass a stopword to
> tsvector composition function (parsetext in ts_parse.c) for counter
> increment or increment it in another way. Currently, an empty lexemes
> array is passed as a result of LexizeExec.
>
> One of possible way to do so is something like:
> CASE polish_stopword
>     WHEN MATCH THEN KEEP -- stopword counting
>     ELSE polish_isspell
> END

This would mean keeping the stopwords.  What we want is

CASE polish_stopword    -- stopword counting
    WHEN NO MATCH THEN polish_isspell
END

Do you think it is possible?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
