Re: De-duplicating rows
От | Frank Bax |
---|---|
Тема | Re: De-duplicating rows |
Дата | |
Msg-id | 4A609310.4050003@sympatico.ca обсуждение исходный текст |
Ответ на | Re: De-duplicating rows (Richard Huxton <dev@archonet.com>) |
Список | pgsql-sql |
Richard Huxton wrote: > Christophe wrote: >> Now, since this database has been production since 7.2 days, cruft has >> crept in: in particular, there are duplicate email addresses, some >> with mismatched attributes. The policy decision by the client is that >> the correct row is the one with the earliest timestamp. > > Something like (untested): > > CREATE TEMPORARY TABLE earliest_duplicates AS > SELECT > email AS tgt_email, > min(create_date) AS tgt_date > FROM mytable > GROUP BY email > HAVING count(*) > 1; > > DELETE FROM mytable USING earliest duplicates > WHERE email=tgt_email AND create_date > tgt_date; If it is possible that two rows exist for the same email/date; then you will likely need to deal with these manually. If you rerun the above SELECT after running the delete you should identify these rows.
В списке pgsql-sql по дате отправления: