Re: String Similarity
From | Greg Sabino Mullane |
---|---|
Subject | Re: String Similarity |
Date | |
Msg-id | 6592ec8ffe8907400bc98e9efa60c62c@biglumber.com |
In reply to | String Similarity ("Mark Woodward" <pgsql@mohawksoft.com>) |
Replies | Re: String Similarity, Re: String Similarity |
List | pgsql-hackers |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> I have a side project that needs to "intelligently" know if two strings
> are contextually similar.

The examples you gave seem heavy on word order and whitespace
consideration, before applying any algorithms. Here's a quick
Perl version that does the job:

    CREATE OR REPLACE FUNCTION matchval(text,text) RETURNS INT LANGUAGE plperlu AS
    $$
    use strict;
    use String::Approx 'adist';
    # Normalize each argument: lowercase, split on whitespace, sort the words
    my $uno = join ' ', sort split /\s+/ => lc shift;
    my $dos = join ' ', sort split /\s+/ => lc shift;
    # Pass the shorter string to adist as the pattern, the longer as the input
    return adist(length $uno<length $dos ? ($uno,$dos) : ($dos,$uno));
    $$;

Some sample runs:

    SELECT matchval('pink floyd - dark side of the moon - money',
                    'dark side of the moon - pink floyd - money');
    SELECT matchval('dark floyd of money moon pink side the',
                    'Money - dark side of the moon - Pink Floyd');
    SELECT matchval('dark floyd of money moon pink side the',
                    'monee - drk sidez of da moon - pink floyd');
    SELECT matchval('dark floyd of money moon pink side the',
                    'pink floyd - animals');
    SELECT matchval('dark floyd of money moon pink side the',
                    'walking on the moon - the police');

The above returns 0, 0, 6, 10, and 17; a score of 0 is an exact match.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200605191835
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFEbktUvJuQZxSWSsgRAiCtAJ9nlpqGxlYnimDPp8t5XQsc8y9RywCfZZL6
iU9iPnxHaWOvYCUD7+rK8Do=
=zo3T
-----END PGP SIGNATURE-----
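For readers without PL/Perl or String::Approx at hand, the same idea (normalize word order and whitespace first, then measure an edit distance) can be sketched in plain Python. This is an illustration only, not the function from the message: `normalize` and `substring_distance` are hypothetical helper names, and the distance is a textbook approximate-substring edit distance (Sellers' algorithm) standing in for `adist`, on the assumption that `adist` scores its first argument as a pattern found somewhere inside the second. Scores from this sketch may therefore differ from the ones reported above.

```python
def normalize(s: str) -> str:
    """Lowercase, split on whitespace, sort the tokens, rejoin with single spaces."""
    return " ".join(sorted(s.lower().split()))


def substring_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text.

    Stand-in for String::Approx's adist (an assumption, see the lead-in).
    """
    # prev[j]: best cost of aligning the pattern prefix processed so far
    # with some substring of text that ends at position j. The first row
    # is all zeros because a match may start anywhere in text.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        cur = [i]  # i pattern characters against empty text: i deletions
        for j, tc in enumerate(text, start=1):
            cur.append(min(
                prev[j] + 1,               # drop a pattern character
                cur[j - 1] + 1,            # skip a text character
                prev[j - 1] + (pc != tc),  # match (0) or substitute (1)
            ))
        prev = cur
    # A match may also end anywhere in text, so take the best end position.
    return min(prev)


def matchval(a: str, b: str) -> int:
    """Normalize both strings, then use the shorter one as the pattern."""
    x, y = normalize(a), normalize(b)
    pattern, text = (x, y) if len(x) < len(y) else (y, x)
    return substring_distance(pattern, text)


if __name__ == "__main__":
    print(matchval("pink floyd - dark side of the moon - money",
                   "dark side of the moon - pink floyd - money"))
    print(matchval("dark floyd of money moon pink side the",
                   "Money - dark side of the moon - Pink Floyd"))
```

In this sketch the first two sample pairs both score 0: the first because sorting makes the two strings identical, the second because the stray "-" tokens sort to the front of the longer string and a substring match can skip over them. That is also why the shorter string is used as the pattern.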