Discussion: Recommended technique for large imports?

Recommended technique for large imports?

From:
Stephen Bacon
Date:
Hello,

I'm running a Tomcat-based web app (Tomcat 3.3 under Linux 7.3) with
PostgreSQL (7.2.1) as the back end, and I need to add new import
functionality. From previous importer experience with this site, I'm
worried that the import can take so long that the user's browser times
out waiting for the process to complete. That only ever happens when
they're importing a lot of records while the system is under heavy
demand; the main set of tables has a lot of indexes, so the loop/insert
method can take a while.

Of course the data gets in there, but the user can still end up with a
404-type error anyway, and no one likes to see that.

Now I know the COPY command is much faster because it doesn't update the
indexes after every row insert, but building that statement and passing
it via JDBC seems iffy (or via C, PHP, etc., for that matter).
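
(For what it's worth, later versions of the PostgreSQL JDBC driver do
expose COPY through org.postgresql.copy.CopyManager. A minimal sketch,
assuming a hypothetical table import_target(id, name) and placeholder
connection details:

  import java.io.StringReader;
  import java.sql.Connection;
  import java.sql.DriverManager;
  import org.postgresql.PGConnection;
  import org.postgresql.copy.CopyManager;

  public class CopyImport {
      public static void main(String[] args) throws Exception {
          // Placeholder URL and credentials.
          Connection conn = DriverManager.getConnection(
                  "jdbc:postgresql://localhost/mydb", "user", "secret");
          try {
              CopyManager copier = conn.unwrap(PGConnection.class).getCopyAPI();
              // One tab-delimited line per row, matching the column list below.
              String rows = "1\tfirst record\n2\tsecond record\n";
              long count = copier.copyIn(
                      "COPY import_target (id, name) FROM STDIN",
                      new StringReader(rows));
              System.out.println("copied " + count + " rows");
          } finally {
              conn.close();
          }
      }
  }

The driver streams all the rows to the server as a single COPY
operation, so there is no per-row statement overhead.)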

Can anyone give a recommended technique for this sort of process?

Basically (I think) I need to do something like:

 Start transaction
 Turn off indexing for this transaction
 loop 1..n
   insert record X
 end loop
 Turn indexing back on
 Commit / End transaction
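
(PostgreSQL has no switch to suspend indexing inside a transaction, but
because its DDL is transactional, the closest real equivalent to the
pseudocode above is to drop the indexes, insert, and recreate the
indexes in the same transaction. A sketch of that approach, assuming a
hypothetical table import_target(id, name) with a single index
import_target_name_idx:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.sql.Statement;

  public class DropIndexImport {
      public static void main(String[] args) throws Exception {
          Connection conn = DriverManager.getConnection(
                  "jdbc:postgresql://localhost/mydb", "user", "secret");
          conn.setAutoCommit(false);  // Start transaction
          try {
              Statement ddl = conn.createStatement();
              // "Turn off indexing": drop the index before the inserts.
              ddl.executeUpdate("DROP INDEX import_target_name_idx");

              PreparedStatement ins = conn.prepareStatement(
                      "INSERT INTO import_target (id, name) VALUES (?, ?)");
              for (int i = 1; i <= 10000; i++) {  // loop 1..n: insert record X
                  ins.setInt(1, i);
                  ins.setString(2, "record " + i);
                  ins.executeUpdate();
              }

              // "Turn indexing back on": rebuild the index in one pass.
              ddl.executeUpdate(
                      "CREATE INDEX import_target_name_idx ON import_target (name)");
              conn.commit();  // Commit / End transaction
          } catch (Exception e) {
              conn.rollback();
              throw e;
          } finally {
              conn.close();
          }
      }
  }

Note that DROP INDEX takes an exclusive lock on the table until the
transaction commits, so this blocks other sessions for the duration of
the import.)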

thanks,
  -Steve

(apologies for the cross-post, but I figured it's not specifically JDBC
related)



Re: Recommended technique for large imports?

From:
Sam Varshavchik
Date:
Stephen Bacon writes:

> Now I know the COPY command is much faster because it doesn't update the
> indexes after every row insert, but building that statement and passing
> it via JDBC seems iffy (or via C, PHP, etc., for that matter).

I think someone was working on a COPY implementation for JDBC, but I
don't believe it's there yet.

> Can anyone give a recommended technique for this sort of process?

Feed a few thousand INSERTs to addBatch(), then call executeBatch(). That
seems to be the fastest way to import data at this time.
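
(A minimal sketch of that batch approach, assuming the same hypothetical
import_target(id, name) table; the batch size of 1000 is an arbitrary
choice:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;

  public class BatchImport {
      public static void main(String[] args) throws Exception {
          Connection conn = DriverManager.getConnection(
                  "jdbc:postgresql://localhost/mydb", "user", "secret");
          conn.setAutoCommit(false);  // one transaction for the whole import
          try {
              PreparedStatement ins = conn.prepareStatement(
                      "INSERT INTO import_target (id, name) VALUES (?, ?)");
              int pending = 0;
              for (int i = 1; i <= 100000; i++) {
                  ins.setInt(1, i);
                  ins.setString(2, "record " + i);
                  ins.addBatch();            // queue the INSERT client-side
                  if (++pending == 1000) {   // flush every 1000 rows
                      ins.executeBatch();
                      pending = 0;
                  }
              }
              if (pending > 0) {
                  ins.executeBatch();        // flush the remainder
              }
              conn.commit();
          } catch (Exception e) {
              conn.rollback();
              throw e;
          } finally {
              conn.close();
          }
      }
  }

Sending the INSERTs a thousand at a time inside a single transaction
avoids most of the per-statement round-trip overhead without needing
COPY at all.)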