Re: Feature: Add Greek language fulltext search
| From | Panagiotis Mavrogiorgos |
|---|---|
| Subject | Re: Feature: Add Greek language fulltext search |
| Date | |
| Msg-id | CAAVvtwrnGCoiG5csey14=mrn_jTUEO2R2TzUWR2+TuezA3wR3A@mail.gmail.com |
| In response to | Re: Feature: Add Greek language fulltext search (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>) |
| List | pgsql-hackers |
On Thu, Jul 4, 2019 at 1:39 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 2019-03-25 12:04, Panagiotis Mavrogiorgos wrote:
> > Last November snowball added support for Greek language [1]. Following
> > the instructions [2], I wrote a patch that adds fulltext search for
> > Greek in Postgres. The patch is attached.
> I have committed a full sync from the upstream snowball repository,
> which pulled in the new greek stemmer.
> Could you please clarify where you got the stopword list from? The
> README says those need to be downloaded separately, but I wasn't able to
> find the download location. It would be good to document this, for
> example in the commit message. I haven't committed the stopword list yet.
Thank you Peter,
Here is the repo with the stop-words: https://github.com/pmav99/greek_stopwords
The list is based on an earlier publication, with modifications by me. All the relevant info is on GitHub.
Disclaimer 1: The list has not been validated by an expert.
Disclaimer 2: There are other stop-word lists on the internet, but they are less complete and they also include Ancient Greek words. Furthermore, my testing showed that snowball needs to handle accents (tonous) and ς (teliko sigma) in a special way if you want the stemmer to work with capitalized words too.
https://github.com/Xangis/extra-stopwords/blob/master/greek
https://github.com/stopwords-iso/stopwords-el/tree/master/raw
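
To make the accent and final-sigma point concrete, here is a minimal Python sketch of the kind of folding that makes a capitalized and a lowercase Greek word compare equal. It is only an illustration under my own assumptions: the function name `fold_greek` is hypothetical, and this is not how snowball or Postgres actually handles these characters.

```python
import unicodedata


def fold_greek(word: str) -> str:
    """Fold a Greek word to a simple comparison form (illustrative only)."""
    # Decompose accented letters into base letter + combining mark,
    # e.g. "ό" becomes "ο" followed by U+0301.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks. Note this removes the tonos and also
    # the dialytika, which may be too aggressive for real use.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then map final sigma to regular sigma so that both
    # sigma forms end up identical.
    return stripped.lower().replace("ς", "σ")


# A capitalized and a lowercase spelling fold to the same string.
print(fold_greek("ΚΑΛΌΣ") == fold_greek("καλός"))
```

The same folding would have to be applied both to the stop-word list and to the input text; otherwise a capitalized stop word would not match its lowercase entry.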
all the best,
Panagiotis