Re: What's the fastest way to do this?
От | Colin 't Hart |
---|---|
Тема | Re: What's the fastest way to do this? |
Дата | |
Msg-id | 9sg4mn$1r5e$1@news.tht.net обсуждение исходный текст |
Ответ на | Re: What's the fastest way to do this? (Einar Karttunen <ekarttun@cs.Helsinki.FI>) |
Список | pgsql-general |
Einar Karttunen writes: > > I have several really big tables that have columns uniquely identified by > > single or multiple rows. [ I have about 25 tables, 10k to 500k rows > > per table ] > > > > Each day I get a flat file of updates. I have no way of knowing which > > lines in the file are new records and which are updates for existing > > records. > > > > I need a way to insert the new ones and update the old ones. I have > > a couple of ideas but none of them seem fast enough ( I will soon > > be getting updates faster than I can feed them into the database ). > > I was facing a similar problem some time ago. My solution was to create > a temp table and COPY the new data to it. After that I deleted all records > in the original table which existed in the temporary table. Then I just > did a insert from a select * from the temp table. Of course with this > approach you have to lock the tables. > > - Einar Karttunen Even faster is to: 1. load the data into a temporary table (even faster is to load into a permanent table -- just truncate it first in each run). Let's call this table ImportTable. 2. update the existing records UPDATE rt SET a = t.a, b = x.b, c = x.c FROM RealTable AS rt, ImportTable AS x WHERE rt.pk = x.pk 3. insert the new records INSERT INTO RealTable(pk, a, b, c) SELECT pk, a, b, c FROM ImportTable AS x WHERE NOT EXISTS (SELECT 1 FROM RealTable WHERE pk = x.pk) 'pk' is the primary key of the table, or, at worst, a UNIQUEly INDEXed column combination. This avoids the expensive DELETE operation (DBMSs are generally better at INSERTing and UPDATEing than DELETEing, and remember that the DELETEs have to go through the transaction log). Don't use cursors if you can help it -- cursors can be up to several orders of magnitude slower, and usually at least 4 times slower. Using an import table allows you to sanitize the data by insert a step to do this between steps 1 and 2. Cheers, Colin
В списке pgsql-general по дате отправления: