Re: How to find double entries
От | Volkan YAZICI |
---|---|
Тема | Re: How to find double entries |
Дата | |
Msg-id | 87zlru72b5.fsf@alamut.mobiliz.com.tr обсуждение исходный текст |
Ответ на | How to find double entries (Andreas <maps.on@gmx.net>) |
Список | pgsql-sql |
On Wed, 16 Apr 2008, Andreas <maps.on@gmx.net> writes: > how can I find double entries in varchar columns where the content is > not 100% identical because of a spelling error or the person > considered it "looked nicer" that way? > > I'd like to identify and then merge records of e.g. 'google', > gogle', 'guugle' > > Then I want to match abbrevations like 'A-Company Ltd.', 'a company > ltd.', 'A-Company Limited' > > Is there a way to do this? > It would be OK just to list candidats up to be manually checked > afterwards. You can try something similar to below example. (levenshtein(text, text) function is supplied by fuzzystrmatch module.) SELECT T1.col, T2.col FROM tbl AS T1, INNER JOIN tbl AS T2 ON T1.col <> T2.col AND levenshtein(T1.col,T2.col) < (length(T1.col) * 0.5) Regards.
В списке pgsql-sql по дате отправления: