Re: String Similarity
From | Mark Woodward |
---|---|
Subject | Re: String Similarity |
Date | |
Msg-id | 18825.24.91.171.78.1148124568.squirrel@mail.mohawksoft.com |
In response to | Re: String Similarity (Oleg Bartunov <oleg@sai.msu.su>) |
Responses | Re: String Similarity |
List | pgsql-hackers |
> Get pg_trgm http://www.sai.msu.su/~megera/oddmuse/index.cgi/ReadmeTrgm
> It doesn't depends on language.

That's an interesting approach.

This is what I got:

    apps$ ./stratest "pink floyd dark side of the moon money" "dark side of the moon pink floyd"
    Match: dark side of the moon
    Match: pink floyd
    Similarity: 89

One function finds the substring runs between the two strings, in descending order of length. After the function returns, I have the number of runs, the length of the best run, and the total number of characters matched.

Without going into too lengthy a description: whitespace and punctuation are not reliable. Like this:

    "pinkfloyd" or "pink floyd"
    "darkside" or "dark side"

Humans are VERY good at seeing these things; computers, pardon, suck.

What I was hoping someone had was a function that could find the substring runs in fewer than strlen1*strlen2 operations, and a numerically sane way of representing the similarity or difference.
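For anyone following along, here is a minimal sketch of the "substring runs in descending order of length" idea, written in Python for brevity. It is not the stratest source, which was not posted. The names `longest_common_run`, `common_runs`, `similarity` and the `min_len` cutoff are all hypothetical, and the score formula (twice the matched characters over the combined length) is an assumption, so it will not necessarily reproduce the 89 shown above. It is also the plain quadratic approach, strlen1*strlen2 per run, that the post hopes to improve on.

```python
def longest_common_run(a, b):
    """Return (length, start_in_a, start_in_b) of the longest common run.

    a and b are lists of characters; None marks a position already used.
    Classic dynamic programming, O(len(a) * len(b)) per call.
    """
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def common_runs(s1, s2, min_len=2):
    """Greedily collect common runs, longest first (min_len is an assumption)."""
    a, b = list(s1), list(s2)
    runs = []
    while True:
        n, i, j = longest_common_run(a, b)
        if n < min_len:
            break
        runs.append("".join(a[i:i + n]))
        # Blank out the matched span so it cannot be matched again.
        a[i:i + n] = [None] * n
        b[j:j + n] = [None] * n
    return runs


def similarity(s1, s2, min_len=2):
    """Assumed score: 100 * 2 * matched / (len(s1) + len(s2)), rounded."""
    if not s1 and not s2:
        return 100
    matched = sum(len(r) for r in common_runs(s1, s2, min_len))
    return round(200 * matched / (len(s1) + len(s2)))


if __name__ == "__main__":
    x = "pink floyd dark side of the moon money"
    y = "dark side of the moon pink floyd"
    runs = common_runs(x, y)
    for r in runs:
        print("Match:", r)
    print("Runs:", len(runs), "Best:", len(runs[0]) if runs else 0)
    print("Similarity:", similarity(x, y))
```

On the two strings from the post this finds the same two runs, except that the first one carries a trailing space, since the sketch does no trimming. The score has the same 2*M/T shape that Python's `difflib.SequenceMatcher.ratio()` documents. The "pinkfloyd" versus "pink floyd" case is not handled here; stripping whitespace and punctuation before matching would be one possible workaround, at the cost of losing word boundaries.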