Re: Building a database from a flat file
From | Sean Davis |
---|---|
Subject | Re: Building a database from a flat file |
Date | |
Msg-id | 8014dca0db521733863bb7967ce4be95@mail.nih.gov |
In reply to | Building a database from a flat file ("Casey T. Deccio" <ctdecci@sandia.gov>) |
Responses | Re: Building a database from a flat file |
List | pgsql-sql |
On Mar 2, 2005, at 7:15 PM, Casey T. Deccio wrote:

> A database I am currently using is built and updated periodically from a
> flat csv file (the situation is rather unfortunate, but that's all I
> have right now). The schema I use is more complex than the flat file,
> so I follow a process to populate the tables with the data from the
> file. First I slurp the whole file into one temporary table, whose
> columns correspond to the columns in the file. Then I DELETE all the
> existing rows from the tables in the schema and perform a series of
> queries on that table to INSERT and UPDATE rows in the tables that are
> in the schema. Then I DELETE the data from the temporary table. I do
> it this way, rather than trying to synchronize it, because of the
> inconsistencies and redundancies in the flat file.
>
> There is more than one problem with this, but the largest is that I
> would like to perform this whole database rebuild within one
> transaction, so other processes that need to access the database can
> do so without noticing the disturbance. However, performing this set
> of events (besides populating the temporary table) within a single
> transaction takes a long time -- over an hour in some cases.
>
> What are some suggestions to help improve performance with replacing
> one set of data in a schema with another?

Why not rebuild the entire thing in a separate "build" schema, then do only the cheap operations, like copying the tables, inside the transaction block:

BEGIN;
truncate table1;
truncate table2;
-- reset any sequences you feel you need to have reset
-- insert ... select, not select ... into: select into would fail because
-- table1 and table2 already exist
insert into table1 select * from build.table1;
insert into table2 select * from build.table2;
COMMIT;
-- vacuum cannot run inside a transaction block, so run it after the commit
vacuum analyze table1;
vacuum analyze table2;

I haven't tried this method exactly, but building in a separate schema (expensive) and then doing cheap operations like copying the tables into the working schema should minimize the amount of time you spend inside the transaction block. I don't know what effect this will have on the performance of the whole process, though.

Sean
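For concreteness, here is a minimal end-to-end sketch of the build-then-swap approach described above, covering both the csv load and the short swap transaction. All names in it (build.staging, table1, the two-column layout, /tmp/data.csv) are illustrative assumptions, not from the thread, and the COPY step assumes a file readable by the database server:

-- expensive part: load and clean the csv, done outside any transaction
create schema build;

-- staging table whose columns mirror the columns of the csv file
create table build.staging (
    id   integer,
    name text
);
copy build.staging from '/tmp/data.csv' with csv;

-- derive the clean build table; distinct collapses redundant rows
-- in the flat file
create table build.table1 as
    select distinct id, name
    from build.staging;

-- cheap part: the only work done while holding locks on the live table
begin;
truncate table1;
insert into table1 select * from build.table1;
commit;

-- refresh planner statistics (vacuum cannot run inside the transaction)
-- and tear down the build schema until the next rebuild
vacuum analyze table1;
drop schema build cascade;

One caveat: truncate takes an exclusive lock, so concurrent readers block for the duration of the short transaction rather than seeing an empty table, which is what keeps the disturbance brief.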