Re: String Similarity
От | Mark Woodward |
---|---|
Тема | Re: String Similarity |
Дата | |
Msg-id | 18626.24.91.171.78.1148130741.squirrel@mail.mohawksoft.com обсуждение исходный текст |
Ответ на | Re: String Similarity ("Mark Woodward" <pgsql@mohawksoft.com>) |
Список | pgsql-hackers |
> What I was hoping someone had was a function that could find the substring > runs in something less than a strlen1*strlen2 number of operations and a > numerically sane way of representing the similarity or difference. Acually, it is more like strlen1*strlen2*N, where N is the number of valid runs. Unless someone has a GREAT algorithm, I think it will always be at least strlen1*strlen2. The amount of processing for N is the question. Is N * (strlen1*strlen2) less than sorting an array of N elements, scanning through those elements and eliminating duplicate character matches? Depending on the max value of N, I could save all the runs, sort by max length, then exclude based on overlapp, but it isn't clear that this is a performance win unless the strings are long, even then, I'm not completely convinced as N still has some strlen ramifications for removing duplicates.
В списке pgsql-hackers по дате отправления: