Re: General guidance: Levenshtein distance versus other similarity algorithms
От | Merlin Moncure |
---|---|
Тема | Re: General guidance: Levenshtein distance versus other similarity algorithms |
Дата | |
Msg-id | CAHyXU0ycbACiVOLQ2Q-nKfr7c4oSs+4Hgbj5aD7g=J2K1yTwXQ@mail.gmail.com обсуждение исходный текст |
Ответ на | General guidance: Levenshtein distance versus other similarity algorithms (Rachel Owsley <Rachel.Owsley@edointeractive.com>) |
Ответы |
Re: General guidance: Levenshtein distance versus other
similarity algorithms
|
Список | pgsql-general |
On Mon, Jul 23, 2012 at 11:55 AM, Rachel Owsley <Rachel.Owsley@edointeractive.com> wrote: > Hi, > > I am hoping you can give me some guidance here. I’m using postgresql 9.1. > > Basically, I’m trying to create a query on a table of businesses that will > return all similar matches to a business name. This is a huge table, and > there is a lot of variation in names. The length of the string can be up to > 255. I’ve used regex, but there are always some variations of the name that > are missed when I do a regex. So I decided to look at distance measures. > > Has anyone compared the fuzzstrmatch package to pgsimilarity? > > Would the levenshtein function in postgresql be the best way to go here? If > so, should I use levenshtein in the contribution package or install the > pgsimilarity package? Has anyone tried both implementations? Another option that works with 9.1 is the pg_trgm module (http://www.postgresql.org/docs/9.1/static/pgtrgm.html). It works very well for 9.1 and has the advantage of having built-in gist and gin operator support. Can't speak on pg_similarity, haven't used it. merlin
В списке pgsql-general по дате отправления: