RE: Fuzzy matching?

From	Robby Slaughter
Subject	RE: Fuzzy matching?
Date	July 31, 2001, 13:48:22
Msg-id	EPEHLKLEHAHLONFOKNHNGEGCDDAA.webmaster@robbyslaughter.com
In reply to	Fuzzy matching? ("Josh Berkus" <josh@agliodbs.com>)
List	pgsql-sql

Here's an off-the-cuff reply:

It sounds like fuzzy_match(str1,str2,num) is
really just a tokenizer-type operation. The number is exactly
one less than the number of matching string segments
that you are looking for. For example:
  fuzzy_match('Thornton','Tornton',1) = TRUE
 because the two segments are 'T' and 'ornton'.

And also:
  fuzzy_match('Thornton','Torntin',2) = TRUE
 because the three segments are 'T', 'ornt', and 'n'.
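
The segments in these two examples can be listed mechanically. Below is a
minimal Python sketch, assuming the standard-library difflib module is an
acceptable stand-in for a tokenizer; common_segments is a hypothetical
helper name, not an existing PostgreSQL function. It only lists the shared
pieces and does not look at how long the gaps between them are.

```python
from difflib import SequenceMatcher


def common_segments(a, b):
    """Return the substrings shared by a and b, in left-to-right order."""
    # autojunk=False disables difflib's "popular element" heuristic so that
    # every character is considered (it only matters for long inputs).
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # get_matching_blocks() ends with a zero-length sentinel block; skip it.
    return [a[m.a:m.a + m.size]
            for m in matcher.get_matching_blocks()
            if m.size > 0]


if __name__ == "__main__":
    print(common_segments("Thornton", "Tornton"))
    print(common_segments("Thornton", "Torntin"))
```

For the two pairs above this yields ['T', 'ornton'] and ['T', 'ornt', 'n'],
that is, num + 1 segments in each case.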

So, it seems like you could try to build the tokens,
which would probably be more efficient than just trying
all permutations.
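
For comparison, the "no more than num characters different" rule from the
original question can also be checked with a standard edit-distance
(Levenshtein) computation. That is a different technique from the tokenizer
idea, but it also avoids enumerating permutations. A minimal Python sketch,
assuming an insertion, a deletion, and a substitution each count as one
character of difference; edit_distance and this fuzzy_match are illustrative
helpers, not existing database functions.

```python
def edit_distance(a, b):
    """Levenshtein distance, keeping only two rows of the DP table."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


def fuzzy_match(a, b, max_diff):
    """True if a and b differ by at most max_diff single-character edits."""
    # The length difference is a lower bound on the distance, so this
    # rejects many pairs without running the full comparison.
    if abs(len(a) - len(b)) > max_diff:
        return False
    return edit_distance(a, b) <= max_diff


if __name__ == "__main__":
    print(fuzzy_match("Thornton", "Tornton", 1))
    print(fuzzy_match("Thornton", "Torntin", 1))
    print(fuzzy_match("Thornton", "Torntin", 2))
```

The three calls reproduce the TRUE / FALSE / TRUE results listed in the
original question. The cost is proportional to the product of the two string
lengths per comparison, so on large tables it would still help to filter
candidate rows first.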

HTH
-Robby


-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Josh Berkus
Sent: Tuesday, July 31, 2001 11:05 AM
To: pgsql-sql@postgresql.org
Subject: [SQL] Fuzzy matching?


Folks,

For many of my programs, it would be extremely useful to have some form
of "fuzzy matching" for VARCHAR fields.  There are two kinds of fuzzy
matching for words that I know of:

1. Phonetic matching, which would be nice but will have to wait for
someone's $100,000 project;

2. Textual matching, which I will outline below.

The way textual fuzzy matching should work is as follows:
The developer supplies two VARCHARs to match and a number/percent of
character mismatch that is acceptable:

fuzzy_match('Thornton','Tornton',1)

And the fuzzy_match should return True if the two phrases are no more
than that number of characters different.  Thus, we should get:

fuzzy_match('Thornton','Tornton',1) = TRUE
fuzzy_match('Thornton','Torntin',1) = FALSE
fuzzy_match('Thornton','Torntin',2) = TRUE

Unfortunately, I cannot think of a way to make this happen in a function
without cycling through all the possible permutations of characters for
both words or doing some character-by-character comparison with
elaborate logic for placement.  Either of these approaches would be very
slow, and completely unsuitable for column comparisons on large tables.

Can anyone suggest some shortcuts here?  Perhaps using pl/perl or
something similar?

Grazie!

-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco
