RE: Fuzzy matching?
From | Robby Slaughter |
---|---|
Subject | RE: Fuzzy matching? |
Date | |
Msg-id | EPEHLKLEHAHLONFOKNHNGEGCDDAA.webmaster@robbyslaughter.com |
In reply to | Fuzzy matching? ("Josh Berkus" <josh@agliodbs.com>) |
List | pgsql-sql |
Here's an off-the-cuff reply:

It sounds like fuzzy_match(str1,str2,num) is really just a tokenizer-type operation. The number is exactly one less than the potential number of string segments that you are interested in.

For example:

fuzzy_match('Thornton','Tornton',1) = TRUE

because the two segments are 'T' and 'ornton'.

And also:

fuzzy_match('Thornton','Torntin',2) = TRUE

because the three segments are 'T', 'ornt', and 'n'.

So, it seems like you could try to build the tokens, which would probably be more efficient than just trying all permutations.

HTH

-Robby

-----Original Message-----
From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Josh Berkus
Sent: Tuesday, July 31, 2001 11:05 AM
To: pgsql-sql@postgresql.org
Subject: [SQL] Fuzzy matching?

Folks,

For many of my programs, it would be extremely useful to have some form of "fuzzy matching" for VARCHAR fields. There are two kinds of fuzzy matching for words that I know of:

1. Phonetic matching, which would be nice but will have to wait for someone's $100,000 project;
2. Textual matching, which I will outline below.

The way textual fuzzy matching should work is as follows:

The developer supplies two VARCHARs to match and a number/percent of character mismatch that is acceptable:

fuzzy_match('Thornton','Tornton',1)

And fuzzy_match should return TRUE if the two phrases differ by no more than that number of characters. Thus, we should get:

fuzzy_match('Thornton','Tornton',1) = TRUE
fuzzy_match('Thornton','Torntin',1) = FALSE
fuzzy_match('Thornton','Torntin',2) = TRUE

Unfortunately, I cannot think of a way to make this happen in a function without cycling through all the possible permutations of characters for both words or doing some character-by-character comparison with elaborate logic for placement. Either of these approaches would be very slow, and completely unsuitable for column comparisons on large tables.

Can anyone suggest some shortcuts here? Perhaps using pl/perl or something similar?

Grazie!

-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses       fax 621-2533
    and non-profit organizations.      San Francisco
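
For readers of the archive: the behaviour Josh describes matches what is usually called Levenshtein (edit) distance, which can be computed without trying any permutations. The sketch below is not the tokenizer approach suggested in the reply, and it is not code from either poster. It is a minimal Python illustration that assumes "number of characters different" means the count of single-character insertions, deletions, and substitutions. The helper name edit_distance and the parameter name max_diff are made up for this example; fuzzy_match only borrows the name used in the thread.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def fuzzy_match(str1: str, str2: str, max_diff: int) -> bool:
    """True if str1 and str2 are within max_diff single-character edits."""
    # The length difference is a lower bound on the distance,
    # so this check rejects many pairs without running the full comparison.
    if abs(len(str1) - len(str2)) > max_diff:
        return False
    return edit_distance(str1, str2) <= max_diff


print(fuzzy_match('Thornton', 'Tornton', 1))
print(fuzzy_match('Thornton', 'Torntin', 1))
print(fuzzy_match('Thornton', 'Torntin', 2))
```

The work per pair is proportional to the product of the two string lengths, which is cheap for short name fields, though comparing every row of a large table against every other row would still be expensive without some pre-filtering.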