Re: How to insert a bulk of data with unique-violations very fast
От | Andy Colson |
---|---|
Тема | Re: How to insert a bulk of data with unique-violations very fast |
Дата | |
Msg-id | 4C0B986E.2060403@squeakycode.net обсуждение исходный текст |
Ответ на | How to insert a bulk of data with unique-violations very fast (Torsten Zühlsdorff <foo@meisterderspiele.de>) |
Список | pgsql-performance |
On 06/01/2010 10:03 AM, Torsten Zühlsdorff wrote: > Hello, > > i have a set of unique data which about 150.000.000 rows. Regullary i > get a list of data, which contains multiple times of rows than the > already stored one. Often around 2.000.000.000 rows. Within this rows > are many duplicates and often the set of already stored data. > I want to store just every entry, which is not within the already stored > one. Also i do not want to store duplicates. Example: > > Already stored set: > a,b,c > > Given set: > a,b,a,c,d,a,c,d,b > > Expected set after import: > a,b,c,d > > I now looking for a faster way for the import. At the moment i import > the new data with copy into an table 'import'. then i remove the > duplicates and insert every row which is not already known. after that > import is truncated. > > Is there a faster way? Should i just insert every row and ignore it, if > the unique constrain fails? > > Here the simplified table-schema. in real life it's with partitions: > test=# \d urls > Tabelle »public.urls« > Spalte | Typ | Attribute > --------+---------+------------------------------------------------------- > url_id | integer | not null default nextval('urls_url_id_seq'::regclass) > url | text | not null > Indexe: > »urls_url« UNIQUE, btree (url) > »urls_url_id« btree (url_id) > > Thanks for every hint or advice! :) > > Greetings from Germany, > Torsten I do this with a stored procedure. I do not care about speed because my db is really small and I only insert a few recordsa month. So I dont know how fast this is, but here is my func: CREATE FUNCTION addentry(idate timestamp without time zone, ilevel integer) RETURNS character varying AS $$ declare tmp integer; begin insert into blood(adate, alevel) values(idate, ilevel); return 'ok'; exception when unique_violation then select into tmp alevel from blood where adate = idate; if tmp <> ilevel then return idate || ' levels differ!'; else return 'ok, already in table'; end if; end; $$ LANGUAGE plpgsql; Use it like, select * from addentry('2010-006-06 8:00:00', 130); I do an extra check that if the date's match that the level's match too, but you wouldnt have to. There is a unique indexon adate. -Andy
В списке pgsql-performance по дате отправления: