Re: tuning Postgres for large data import (using Copy from)
From: John A Meinel
Subject: Re: tuning Postgres for large data import (using Copy from)
Date:
Msg-id: 42836DEB.30600@arbash-meinel.com
In response to: tuning Postgres for large data import (using Copy from) ("Marc Mamin" <m.mamin@gmx.net>)
List: pgsql-performance
Marc Mamin wrote:
> Hello,

I'm not an expert, but I'll give some suggestions.

> I'd like to tune Postgres for large data import (using Copy from).

I believe that COPY FROM <file> is supposed to be faster than COPY FROM
STDIN, but <file> must be available to the backend process. If you can do
it, you should think about it, as it eliminates the communication between
the client and the backend.

> Here are a few steps already done:
>
> 1) use 3 different disks for:
>
>    -1: source data
>    -2: index tablespaces
>    -3: data tablespaces

Make sure pg_xlog is on its own filesystem. It contains the write-ahead
log, and putting it by itself keeps the number of seeks down. If you are
constrained, I think pg_xlog is more important than moving the index
tablespaces.

> 2) define all foreign keys as initially deferred
>
> 3) tune some parameters:
>
>    max_connections = 20
>    shared_buffers = 30000
>    work_mem = 8192
>    maintenance_work_mem = 32768
>    checkpoint_segments = 12
>
>    (I also modified the kernel accordingly)

Don't forget to increase your free space map if you are going to be doing
deletes frequently.

> 4) run VACUUM regularly
>
> The server runs RedHat and has 1 GB RAM.
>
> In production (which may run on a better server), I plan to:
>
>    - import a few million rows per day,
>    - keep up to ca. 100 million rows in the db,
>    - delete older data.
>
> I've seen a few postings on hash/btree indexes which say that hash
> indexes do not work very well in Postgres; currently, I only use btree
> indexes. Could I gain performance while using hash indexes as well?

I doubt it.

> How does Postgres handle concurrent COPY FROM on the same table /
> different tables?

I think it is better with different tables. If using the same table, and
there are indexes, it has to grab a lock for updating the index, which
causes contention between two processes writing to the same table.

> I'd be glad of any further suggestions on how to increase my
> performance.

Since you are deleting data often, and copying often, I might recommend
using a partition scheme with a view to bind everything together. That
way you can just drop the old table rather than doing a delete. I don't
know how this would affect foreign key references.

But basically you can create a new table, do the copy without having any
indexes, then build the indexes, analyze, and update the view. And when
deleting you can update the view and drop the old table.

Something like this:

    CREATE TABLE table_2005_05_11 AS (blah);
    COPY FROM ... ;
    CREATE INDEX blah ON table_2005_05_11 (blah);
    CREATE OR REPLACE VIEW table AS
        SELECT * FROM table_2005_05_10
        UNION ALL
        SELECT * FROM table_2005_05_11;
    VACUUM ANALYZE table_2005_05_11;

...

John =:->

> Marc
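A small sketch of the COPY FROM <file> point above; the table name and
file paths are made up for illustration, and on 8.0-era servers the
server-side form needs the file to be readable by the backend and a
superuser to run it:

    -- Server-side COPY: the backend reads the file directly, so no data
    -- flows over the client connection.
    COPY mytable FROM '/var/load/mytable.dat';

    -- Client-side alternative from psql; the data is streamed over the
    -- connection (it uses COPY FROM STDIN under the hood):
    -- \copy mytable from '/local/path/mytable.dat'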
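On the "initially deferred" foreign keys in step 2, a minimal sketch of
the syntax; the tables and columns here are invented purely to show the
clause:

    CREATE TABLE customers (
        id   integer PRIMARY KEY,
        name text
    );

    CREATE TABLE orders (
        id          integer PRIMARY KEY,
        customer_id integer REFERENCES customers (id)
                    DEFERRABLE INITIALLY DEFERRED,
        amount      numeric
    );

    -- Inside a transaction the foreign key checks are postponed until
    -- COMMIT, so a bulk load is not tripped up by row ordering:
    BEGIN;
    COPY orders FROM '/tmp/orders.dat';
    COMMIT;   -- the deferred checks fire here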
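And on the partition-plus-view scheme sketched at the end, the "delete
older data" side might look roughly like this, assuming a newer partition
table_2005_05_12 has already been created and loaded (names follow the
example above; "table" is a reserved word, so a real view would want a
different name or quoting):

    -- Rebuild the view so it no longer includes the oldest partition.
    CREATE OR REPLACE VIEW "table" AS
        SELECT * FROM table_2005_05_11
        UNION ALL
        SELECT * FROM table_2005_05_12;

    -- The old rows then disappear with a cheap catalog operation instead
    -- of a row-by-row DELETE.
    DROP TABLE table_2005_05_10;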