De-duplicating rows
От | Christophe |
---|---|
Тема | De-duplicating rows |
Дата | |
Msg-id | 87E8E647-4CAD-4491-B687-FA763E0EBE7E@thebuild.com обсуждение исходный текст |
Ответы |
Re: De-duplicating rows
Re: De-duplicating rows |
Список | pgsql-sql |
The Subject: is somewhat imprecise, but here's what I'm trying to do. For some reason, my brain is locking up over it. I'm moving a 7.2 (yes) database to 8.4. In the table in question, the structure is along the lines of: serial_number SERIAL, PRIMARY KEYemail TEXTcreate_date TIMESTAMPattr1 typeattr2 typeattr3 type... (The point of the "attr" fields is that there are many more columns for each row.) The new structure removes the "serial_number" field, and uses "email" as the primary key, but is otherwise unchanged: email TEXT, PRIMARY KEYcreate_date TIMESTAMPattr1 typeattr2 typeattr3 type... Now, since this database has been production since 7.2 days, cruft has crept in: in particular, there are duplicate email addresses, some with mismatched attributes. The policy decision by the client is that the correct row is the one with the earliest timestamp. (The timestamps are widely distributed; it's not the case that there is a single timestamp above which all the duplicates live.) Thus, ideally, I want to select exactly one row per "email", picking the row with the earliest timestamp in the case that there is more than one row with that email. Any suggestions on how to write such a SELECT? Of course, I could do this with an application against the db, but a single SELECT would be great if possible. TIA!
В списке pgsql-sql по дате отправления: