De-duplicating rows

Поиск

Список

Период

Сортировка

От	Christophe
Тема	De-duplicating rows
Дата	17 июля 2009 г. 00:14:54
Msg-id	87E8E647-4CAD-4491-B687-FA763E0EBE7E@thebuild.com обсуждение исходный текст
Ответы	Re: De-duplicating rows Re: De-duplicating rows
Список	pgsql-sql

Дерево обсуждения

The Subject: is somewhat imprecise, but here's what I'm trying to do.   
For some reason, my brain is locking up over it.

I'm moving a 7.2 (yes) database to 8.4.  In the table in question, the  
structure is along the lines of:
serial_number    SERIAL, PRIMARY KEYemail        TEXTcreate_date    TIMESTAMPattr1        typeattr2        typeattr3
   type...
 

(The point of the "attr" fields is that there are many more columns  
for each row.)

The new structure removes the "serial_number" field, and uses "email"  
as the primary key, but is otherwise unchanged:
email        TEXT, PRIMARY KEYcreate_date    TIMESTAMPattr1        typeattr2        typeattr3        type...

Now, since this database has been production since 7.2 days, cruft has  
crept in: in particular, there are duplicate email addresses, some  
with mismatched attributes.  The policy decision by the client is that  
the correct row is the one with the earliest timestamp.  (The  
timestamps are widely distributed; it's not the case that there is a  
single timestamp above which all the duplicates live.)  Thus, ideally,  
I want to select exactly one row per "email", picking the row with the  
earliest timestamp in the case that there is more than one row with  
that email.

Any suggestions on how to write such a SELECT?  Of course, I could do  
this with an application against the db, but a single SELECT would be  
great if possible.

TIA!

В списке pgsql-sql по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

De-duplicating rows