Re: [GENERAL] Duplicate Row Removal
От | Florian G. Pflug |
---|---|
Тема | Re: [GENERAL] Duplicate Row Removal |
Дата | |
Msg-id | 436D19F5.1080506@phlo.org обсуждение исходный текст |
Список | pgadmin-hackers |
Berend Tober wrote: > Dean Gibson (DB Administrator) wrote: > >> CREATE TABLE new_name AS SELECT DISTINCT * FROM old_name; >> >> DROP TABLE old_name; >> >> ALTER TABLE new_name RENAME TO old_name; > > The problem with this technique is that it doesn't account for indexes, > foreign key references, and other dependencies. > > Another approach is to temporarily add an integer column, populate it > with sequential values, and then use that new column to uniquely > identify the rows that are otherwise duplicates. Then you can use > aggregation to identify and delete the rows you don't need, followed by > dropping the temporary extra column. HTH. Or, you use the column ctid, which exists in every table (I think it tells you where to find the tuple in the datafile) and is guaranteed to be unique (because two tuples cant reference the same place in the datafile). So, try delete from table where exists (select 1 from table t1 where t1.f1 = table.f1 and t1.f2 = table.f2 and ... ... t1.fn = table.fn and textin(tidout(t1.ctid)) < textin(tidout(table.ctid)) ) ; The textin(varcharout(x)) trick is necessary, because you need to choose _one_ row of each group of duplicates that shall survive. Since the type of ctid (which is tid) doesn't support smaller-than comparisons, we convert it to some varchar, and compare that. It's not important how the converted varchar looks, and which ctid turns out to be the smallest - all that matters it that there _will_ _be_ exactly one smallest ctid for each group of duplicated, and that tuple will survive, because the select would find any row in that case. greetings, Florian Pflug PS: I believe this should be added to some FAQ - It's a neat trick, but difficult to figure out, and the question seems to come up quite often.
В списке pgadmin-hackers по дате отправления: