Re: Fastest Index/Algorithm to find similar sentences
From | Merlin Moncure |
---|---|
Subject | Re: Fastest Index/Algorithm to find similar sentences |
Date | |
Msg-id | CAHyXU0zKSRpFVTd3x9uKNf-nK-Dr96+Ot=7_0TiR47_-q0oTRg@mail.gmail.com |
In reply to | Re: Fastest Index/Algorithm to find similar sentences (Kevin Grittner <kgrittn@ymail.com>) |
List | pgsql-general |
On Fri, Aug 2, 2013 at 10:25 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Janek Sendrowski <janek12@web.de> wrote:
>
>> I also tried pg_trgm module, which works with tri-grams, but it's
>> also very slow with 100.000+ rows.
>
> Hmm. I found the pg_trgm module very fast for name searches with
> millions of rows *as long as I used KNN-GiST techniques*. Were you
> careful to do so? Check out the "Index Support" section of this
> page:
>
> http://www.postgresql.org/docs/current/static/pgtrgm.html
>
> While I have not tested this technique with a column containing
> sentences, I would expect it to work well. As a quick
> confirmation, I imported the text form of War and Peace into a
> table, with one row per *line* (because that was easier than
> parsing sentence boundaries for a quick test). That was over
> 65,000 rows.

+1 to this. pg_trgm is black magic. Search time (when using an index) depends
mostly on the number of trigrams in the search string versus the average
number of trigrams in the database.

merlin
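
To make the trigram-count point a bit more concrete, here is a rough Python sketch of how pg_trgm breaks a string into trigrams and scores similarity. It is not the module's actual code: the function names `trigrams` and `similarity` are made up for illustration, and it assumes plain ASCII text (the real module follows the database locale). The padding rule (two spaces in front of each word, one behind) and the shared-over-union score follow what the pg_trgm documentation describes.

```python
import re


def trigrams(text):
    """Approximate pg_trgm-style trigram set (illustrative, ASCII only).

    Lowercase the text, treat runs of non-alphanumerics as separators,
    pad each word with two spaces in front and one behind, then take
    every 3-character window.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by the size of the union of both sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    # A longer search string yields more distinct trigrams to look up.
    print(len(trigrams("word")))        # 5
    print(len(trigrams("two words")))   # 10
    print(similarity("word", "two words"))
```

A short search string against long rows (or the reverse) shares few trigrams relative to the union, so comparing `len(trigrams(query))` with the typical trigram count per row gives a feel for the ratio mentioned above.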