Re: [personal] Re: Filtering duplicated row with a trigger
From | Bruno Wolff III
Subject | Re: [personal] Re: Filtering duplicated row with a trigger
Date |
Msg-id | 20031006180141.GA28877@wolff.to
In response to | Filtering duplicated row with a trigger (papapep <papapep@gmx.net>)
List | pgsql-novice
Please keep messages copied to the list.

On Mon, Oct 06, 2003 at 19:38:46 +0200,
  papapep <papapep@gmx.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Bruno Wolff III wrote:
> | And what do you want to happen when you run across a duplicate row?
> | Do you just want to discard tuples with a duplicate primary key?
>
> Initially I wanted to do so, but afterwards I thought it better to
> "register" them in a dups table.

OK.

> | If you are discarding duplicates, do you care which of the duplicates
> | is discarded?
>
> That's why I said that it might be a good idea to keep them in a
> duplicated-rows table, for reviewing them later.
>
> | If you want to combine data from the duplicates, do you have a precise
> | description of what you want to happen?
>
> No, I do not need to combine data from the duplicates. The entire row is
> either accepted or not.

If you initially don't care which duplicate gets inserted, then the
SELECT DISTINCT ON suggestion will work for loading the real table.
Something like:

insert into real_table
  select distinct on (pk1, pk2, pk3, pk4, pk5) *
    from temp_table;

To see the sets of duplicates you can do something like:

select * from temp_table,
  (select pk1, pk2, pk3, pk4, pk5 from temp_table
    group by pk1, pk2, pk3, pk4, pk5
    having count(*) > 1) as dups
  where temp_table.pk1 = dups.pk1 and
    temp_table.pk2 = dups.pk2 and
    temp_table.pk3 = dups.pk3 and
    temp_table.pk4 = dups.pk4 and
    temp_table.pk5 = dups.pk5;

If a large number of records is being imported, you probably want to
create a multicolumn index on pk1, pk2, pk3, pk4, and pk5 on the temp
table after loading it and before running the query for duplicates.
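As a rough sketch of those last two steps, it could look something like
the following (the index name temp_table_pk_idx and the dups_table name
are only illustrative placeholders, not anything defined earlier in the
thread):

-- multicolumn index to speed up the duplicate lookup on the temp table
create index temp_table_pk_idx
  on temp_table (pk1, pk2, pk3, pk4, pk5);

-- keep every row that belongs to a duplicated key in a separate table
-- so the sets of duplicates can be reviewed later
create table dups_table as
  select temp_table.*
    from temp_table,
      (select pk1, pk2, pk3, pk4, pk5 from temp_table
        group by pk1, pk2, pk3, pk4, pk5
        having count(*) > 1) as dups
    where temp_table.pk1 = dups.pk1 and
      temp_table.pk2 = dups.pk2 and
      temp_table.pk3 = dups.pk3 and
      temp_table.pk4 = dups.pk4 and
      temp_table.pk5 = dups.pk5;

Building the index before running the duplicate query lets the join on
the five key columns use index lookups instead of repeated sequential
scans of the temp table.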