Re: SQL for removing duplicates?
От | Brad Nicholson |
---|---|
Тема | Re: SQL for removing duplicates? |
Дата | |
Msg-id | 448F033C.1020708@ca.afilias.info обсуждение исходный текст |
Ответ на | SQL for removing duplicates? (<kynn@panix.com>) |
Список | pgsql-novice |
kynn@panix.com wrote: > Hi. I'm stumped. I have a large table (about 8.5M records), let's > call it t, whose columns include x and y. I want to remove records > from this table so that any pair of values for these two fields appear > only once. (This will get rid of about 15% of the records in t.) > > One simple solution would be something like > > CREATE TABLE tmp AS SELECT DISTINCT ON ( x, y ) * FROM t; > DROP TABLE t; > ALTER TABLE tmp RENAME TO t; > > This works, but it uses a lot of space. I would prefer to simply cull > the unwanted records from t, but I just can't figure out the SQL for > it. Any help with it would be *much* appreciated. If your table is created with OIDs, this should work. If not add a unique column to the table and use that in place of oid. DELETE FROM t where oid IN (select t2.oid from t t2 EXCEPT SELECT max(t3.oid) from t t3 group by t3.x, t3.y); Also note, the query plan for this is going to be very ugly, it might very well be cheaper to use the solution that you initially mentioned. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
В списке pgsql-novice по дате отправления: