Re: Need magic for identifieing double adresses
From | Andreas |
---|---|
Subject | Re: Need magic for identifieing double adresses |
Date | |
Msg-id | 4C921A07.7060703@gmx.net |
In response to | Re: Need magic for identifieing double adresses (Sam Mason <sam@samason.me.uk>) |
Responses |
Re: Need magic for identifieing double adresses
Re: Need magic for identifieing double adresses |
List | pgsql-general |
On 16.09.2010 13:18, Sam Mason wrote:
> On Thu, Sep 16, 2010 at 04:40:42AM +0200, Andreas wrote:
>> I need to clean up a lot of contact data because of a merge of customer
>> lists that used to be kept separate.
> What to do depends on how much data you have; a few thousand and you can
> do lots of fiddling by hand, whereas if you have a few tens of millions
> of people you want to try and do more with code.
>

Thanks Sam, I'll check out fuzzystrmatch.

We are talking about nearly 500,000 records with considerable overlap.
It's not only typos that need catching. There is also variation in how things are written that is not necessarily wrong.

e.g.
Miller's Bakery
Bakery Miller
Bakery Miller, Ltd.
Bakery Miller and sons
Bakery Smith (formerly Miller)

and the usual
Strawberry Street
Strawberrystreet
Strawberry Str.42
Strawberry Str. 42
Strawberry Str. 42-45

Regards
Andreas
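
The spelling variants listed above are the kind of thing a normalization pass can collapse before any fuzzy comparison is attempted. What follows is only a minimal sketch in Python, not something proposed in the thread: `street_key`, `name_tokens`, `jaccard` and the `NOISE_WORDS` list are hypothetical names chosen for illustration, and the rules cover just the examples given. Real typos would still need an edit-distance style comparison such as the functions fuzzystrmatch provides.

```python
import re

# Assumed noise words for illustration; a real list would be longer
# and specific to the language and legal forms in the data.
NOISE_WORDS = {"ltd", "and", "sons"}


def street_key(raw: str) -> str:
    """Reduce a street line to a crude comparison key."""
    # Lowercase and collapse all runs of whitespace to single spaces.
    s = " ".join(raw.lower().split())
    # Split off a trailing house number or range such as "42" or "42-45".
    m = re.match(r"^(?P<name>.*?)\s*(?P<number>\d+(?:\s*-\s*\d+)?)?$", s)
    name, number = m.group("name"), m.group("number")
    # Rewrite a trailing "str", "str." or glued-on "street" as " street".
    name = re.sub(r"\s*(?:street|str\.?)$", " street", name)
    if number:
        number = re.sub(r"\s*-\s*", "-", number)
        return f"{name} {number}".strip()
    return name.strip()


def name_tokens(raw: str) -> frozenset:
    """Turn a company name into an order-independent set of significant words."""
    s = raw.lower().replace("'s", "")
    words = re.findall(r"\w+", s)
    return frozenset(w for w in words if w not in NOISE_WORDS)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of words two token sets have in common (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    streets = [
        "Strawberry Street",
        "Strawberrystreet",
        "Strawberry Str.42",
        "Strawberry Str. 42",
        "Strawberry Str. 42-45",
    ]
    for s in streets:
        print(f"{s!r:28} -> {street_key(s)!r}")

    names = [
        "Miller's Bakery",
        "Bakery Miller",
        "Bakery Miller, Ltd.",
        "Bakery Miller and sons",
        "Bakery Smith (formerly Miller)",
    ]
    reference = name_tokens(names[0])
    for n in names:
        print(f"{n!r:34} similarity {jaccard(reference, name_tokens(n)):.2f}")
```

With these rules the street lines without a house number share one key, the two "42" spellings share another, and "42-45" stays separate, which still leaves the question of whether a missing or differing house number means a different address. The first four company names reduce to the same word set, while "Bakery Smith (formerly Miller)" only overlaps partially and would probably be queued for manual review rather than merged automatically.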