Re: Slow duplicate deletes
From: Merlin Moncure
Subject: Re: Slow duplicate deletes
Date:
Msg-id: CAHyXU0ztqbzy3=cQiJDW+96mHzUz7WWKhqk00okvtp2jvQYC9A@mail.gmail.com
In reply to: Slow duplicate deletes (DrYSG <ygutfreund@draper.com>)
Responses: Re: Slow duplicate deletes
List: pgsql-novice
On Mon, Mar 5, 2012 at 9:17 AM, DrYSG <ygutfreund@draper.com> wrote:
> I have a large table (20M records) but mostly short text fields. There are
> duplicates that I am trying to remove. I have a bigserial index that is
> unique, but duplicates in another field.
>
> I have an 8 core, 12GB memory computer with RAID disks.
>
> This request has been running for 70 hours (is it safe to kill it?).

yes...generally speaking, it's safe to kill just about any query in
postgres at any time.

> How can I make this run faster? This is a one time processing task, but it
> is taking a long time.
>
> DELETE FROM portal.metadata
> WHERE idx NOT IN
> (
>     SELECT MIN(idx)
>     FROM portal.metadata
>     GROUP BY "data_object.unique_id"
> );

compare the plan for that query (EXPLAIN) vs this one:

/* delete the records from m1 if there is another record with a lower
   idx for the same unique_id */
DELETE FROM portal.metadata m1
WHERE EXISTS
(
    SELECT 1
    FROM portal.metadata m2
    WHERE m1.unique_id = m2.unique_id
    AND m2.idx < m1.idx
);

also, if you don't already have one, consider making an index on at
least unique_id, or possibly (unique_id, idx).

back up your database before running this query :-).

merlin
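[Editor's note: a minimal sketch of the index and EXPLAIN comparison suggested above, assuming the table and column names used in Merlin's rewrite (portal.metadata, unique_id, idx; the original post quoted the column as "data_object.unique_id"). The index name is illustrative, not from the thread.]

-- supporting index so the correlated EXISTS subquery can look up rows
-- by unique_id (and compare idx) instead of repeatedly scanning the table
CREATE INDEX metadata_unique_id_idx_idx
    ON portal.metadata (unique_id, idx);

-- compare the plans of the two deletes before running either one;
-- plain EXPLAIN only plans the statement, it does not execute it
EXPLAIN
DELETE FROM portal.metadata
WHERE idx NOT IN
(
    SELECT MIN(idx)
    FROM portal.metadata
    GROUP BY unique_id
);

EXPLAIN
DELETE FROM portal.metadata m1
WHERE EXISTS
(
    SELECT 1
    FROM portal.metadata m2
    WHERE m1.unique_id = m2.unique_id
    AND m2.idx < m1.idx
);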