Large import via temporary table: all at once, or split into chunks?
From | Doug Fields |
---|---|
Subject | Large import via temporary table: all at once, or split into chunks? |
Date | |
Msg-id | 5.1.0.14.2.20021210112440.032f3ad8@pop.pexicom.com |
List | pgsql-general |
Hello all,

(We're using JDBC to do this, and server 7.2.1-2woody2 on Debian 3.0.)

We regularly handle large imports of data. (Questions at the end.) The data goes into a table with these salient features:

* About 100 million rows at this time
* About 25 columns: mostly VARCHAR, two TIMESTAMP (one DEFAULT now()), some INTEGER, some boolean
* One index on a SERIAL column (the primary key)
* One index on two int columns (a secondary key and the primary key, combined index)
* One index on a VARCHAR column
* WITHOUT OIDs

The way we currently import the new records is:

1) Create a temporary table - WITHOUT OIDS and without any indices
2) INSERT INTO temptable VALUES (...); for each piece of data, and usually 500 of these in one JDBC statement, for efficiency
3) INSERT INTO finaltable (secondaryKey, columns...) SELECT ###, * FROM temptable; - where ### is the salient secondary key from above, and the serial column is populated by the DEFAULT
4) Drop the temporary table

With 7.2 JDBC, we can't use the COPY functionality to speed it up.

My questions are these:

When doing the above process for, say, a million to ten million rows (our typical size), how expensive is step #3? It takes four hours to complete on a 1 million row table, and I have no idea how badly it is harming concurrency, since it's moving so much data in one transaction.

Would I be better off importing some smaller number of rows to the temporary table in step 2 (how many?) and then doing step 3, then truncating the temporary table and starting over with step 2 again? (A sketch of this chunked variant follows the message.)

How do these choices affect the sizing of the transaction log parameters?

Thanks,

Doug
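A minimal sketch of the chunked variant asked about above, assuming plain JDBC with autocommit left on (PostgreSQL 7.2's TRUNCATE cannot run inside a transaction block). The column names (col_a, col_b, secondary_key), the chunk size, and the rowLiterals iterator of pre-escaped value lists are illustrative stand-ins rather than the real 25-column schema; temptable and finaltable follow the names used in the message.

```java
import java.sql.Connection;
import java.sql.Statement;
import java.util.Iterator;

public class ChunkedImport {

    private static final int ROWS_PER_BATCH = 500;    // INSERTs sent per JDBC round trip, as in step 2
    private static final int ROWS_PER_CHUNK = 100000; // hypothetical: rows staged before each flush

    /**
     * Loads rows into finaltable via a temporary staging table, flushing every
     * ROWS_PER_CHUNK rows instead of all at once.  rowLiterals is assumed to
     * yield pre-escaped value lists such as "('abc', 'def')" matching the
     * staging table's (simplified) column order.
     */
    public static void load(Connection conn, int secondaryKey,
                            Iterator<String> rowLiterals) throws Exception {
        Statement st = conn.createStatement();

        // Step 1: staging table with no indices and no OIDs (columns simplified here).
        st.executeUpdate(
            "CREATE TEMPORARY TABLE temptable (col_a VARCHAR, col_b VARCHAR) WITHOUT OIDS");

        int staged = 0;
        while (rowLiterals.hasNext()) {
            // Step 2: send up to ROWS_PER_BATCH single-row INSERTs as one JDBC batch.
            int n = 0;
            while (rowLiterals.hasNext() && n < ROWS_PER_BATCH) {
                st.addBatch("INSERT INTO temptable VALUES " + rowLiterals.next());
                n++;
            }
            st.executeBatch();
            staged += n;

            // Step 3, chunked: move the staged rows into the final table and clear
            // the staging table, rather than doing one huge INSERT ... SELECT at the end.
            if (staged >= ROWS_PER_CHUNK || !rowLiterals.hasNext()) {
                st.executeUpdate(
                    "INSERT INTO finaltable (secondary_key, col_a, col_b) "
                    + "SELECT " + secondaryKey + ", * FROM temptable");
                st.executeUpdate("TRUNCATE TABLE temptable");
                staged = 0;
            }
        }

        // Step 4: drop the staging table once the import is complete.
        st.executeUpdate("DROP TABLE temptable");
        st.close();
    }
}
```

With autocommit on, each flush in step 3 is its own transaction, so a smaller ROWS_PER_CHUNK shortens the time any single INSERT ... SELECT runs, which is the concurrency and transaction-log trade-off the questions above are asking about.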