Re: What's the fastest way to do this?

Поиск

Список

Период

Сортировка

От	Colin 't Hart
Тема	Re: What's the fastest way to do this?
Дата	9 ноября 2001 г. 04:18:28
Msg-id	9sg4mn$1r5e$1@news.tht.net обсуждение исходный текст
Ответ на	Re: What's the fastest way to do this? (Einar Karttunen <ekarttun@cs.Helsinki.FI>)
Список	pgsql-general

Дерево обсуждения

Einar Karttunen writes:

> > I have several really big tables that have columns uniquely identified
by
> > single or multiple rows.  [ I have about 25 tables, 10k to 500k rows
> > per table ]
> >
> > Each day I get a flat file of updates.  I have no way of knowing which
> > lines in the file are new records and which are updates for existing
> > records.
> >
> > I need a way to insert the new ones and update the old ones.   I have
> > a couple of ideas but none of them seem fast enough ( I will soon
> > be getting updates faster than I can feed them into the database ).
>
> I was facing a similar problem some time ago. My solution was to create
> a temp table and COPY the new data to it. After that I deleted all records
> in the original table which existed in the temporary table. Then I just
> did a insert from a select * from the temp table. Of course with this
> approach you have to lock the tables.
>
> - Einar Karttunen

Even faster is to:

1. load the data into a temporary table (even faster is to load into a
permanent
table -- just truncate it first in each run). Let's call this table
ImportTable.

2. update the existing records
   UPDATE rt
     SET a = t.a, b = x.b, c = x.c
     FROM RealTable AS rt, ImportTable AS x
     WHERE rt.pk = x.pk

3. insert the new records
   INSERT INTO RealTable(pk, a, b, c)
     SELECT pk, a, b, c
     FROM ImportTable AS x
     WHERE NOT EXISTS (SELECT 1 FROM RealTable WHERE pk = x.pk)

'pk' is the primary key of the table, or, at worst, a UNIQUEly INDEXed
column
combination.

This avoids the expensive DELETE operation (DBMSs are generally better at
INSERTing and UPDATEing than DELETEing, and remember that the DELETEs have
to
go through the transaction log).

Don't use cursors if you can help it -- cursors can be up to several orders
of
magnitude slower, and usually at least 4 times slower.

Using an import table allows you to sanitize the data by insert a step to do
this between steps 1 and 2.

Cheers,

Colin

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: What's the fastest way to do this?