Re: tuning Postgres for large data import (using Copy from)
From: Josh Berkus
Subject: Re: tuning Postgres for large data import (using Copy from)
Msg-id: 200505121025.41556.josh@agliodbs.com
In reply to: tuning Postgres for large data import (using Copy from) ("Marc Mamin" <m.mamin@gmx.net>)
List: pgsql-performance
Marc,

> 1) use 3 different disks for:
>
>  -1: source data
>  -2: index tablespaces
>  -3: data tablespaces

Others have already told you about the importance of relocating WAL. If you are going to be building indexes on the imported data, you might find it beneficial to relocate pgsql_tmp for the database in question as well. Also, I generally find it more beneficial to separate the few largest tables onto their own disk resources than to put all tables on one resource and all indexes on another; tablespaces make that kind of split easy (example commands at the end of this message). For example, for TPCH-like tests, I do:

array0: OS and pgsql_tmp
array1: LINEITEM
array2: LINEITEM indexes
array3: all other tables and indexes
array4: pg_xlog
array5: source data

This allows me to load a 100G (actually 270G) TPCH-like database in < 2 hours, not counting index-building.

> 2) define all foreign keys as initially deferred

It would be better to drop them before the import and recreate them afterwards (sketched in SQL below). Same for indexes, unless those indexes are over 2G in size.

> max_connections =20
> shared_buffers =30000
> work_mem = 8192

The work_mem is not high enough, unless you have very little RAM. On an 8G machine I'm using 256MB. You might want to use 64MB or 128MB.

> maintenance_work_mem = 32768

REALLY not high enough. You're going to need to build big indexes and possibly vacuum large tables. I use the maximum of 1.98GB. Use up to 1/3 of your RAM for this.

> checkpoint_segments = 12

Also way too low. Put pg_xlog on its own disk, and give it 128 to 512 segments (up to 8G).

> The server runs RedHat and has 1GB RAM

Make sure you're running a 2.6.10+ kernel. Make sure ext3 is mounted with noatime, data=writeback. Buy more RAM. Etc.

> How does Postgres handle concurrent copy from on: same table / different
> tables ?

Same table is useless; the imports will effectively serialize (unless you use pseudo-partitioning). You can parallel load on multiple tables, up to the lower of your number of disk channels or number of processors.

--
Josh Berkus
Aglio Database Solutions
San Francisco
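To make the disk-layout advice concrete: in 8.0 you can pin tables and indexes to specific arrays with tablespaces. A minimal sketch, assuming mount points /array1 and /array2 and an abbreviated LINEITEM definition (paths and columns here are illustrative, not from Marc's setup):

    CREATE TABLESPACE lineitem_data LOCATION '/array1/pgdata';
    CREATE TABLESPACE lineitem_idx  LOCATION '/array2/pgdata';

    CREATE TABLE lineitem (
        l_orderkey    bigint,
        l_linenumber  integer
        -- remaining TPC-H columns omitted for brevity
    ) TABLESPACE lineitem_data;

    CREATE INDEX lineitem_order_idx ON lineitem (l_orderkey)
        TABLESPACE lineitem_idx;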
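Here is the drop-and-recreate pattern in SQL form. The table, index, and constraint names are made up for the example; adapt them to your schema:

    -- drop the constraint and index before the bulk load
    ALTER TABLE orders DROP CONSTRAINT orders_customer_fk;
    DROP INDEX orders_customer_idx;

    -- bulk load; COPY reads the file on the server side
    COPY orders FROM '/array5/source/orders.dat';

    -- rebuild afterwards; this is where a big maintenance_work_mem pays off
    CREATE INDEX orders_customer_idx ON orders (customer_id);
    ALTER TABLE orders ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id);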
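Pulling the settings advice together, a load-time postgresql.conf on a 1GB box might look roughly like this. The values are illustrative, not tested on your hardware; note that in 8.0 the memory settings are in KB and shared_buffers is in 8K pages:

    shared_buffers       = 30000    # ~234MB, as you have it
    work_mem             = 65536    # 64MB
    maintenance_work_mem = 262144   # 256MB, a bit under the 1/3-of-RAM ceiling
    checkpoint_segments  = 128      # budget a few GB free on the pg_xlog disk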
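The ext3 mount options go in /etc/fstab for the filesystem holding the data directory; the device and mount point below are placeholders:

    /dev/sdb1  /var/lib/pgsql  ext3  noatime,data=writeback  1 2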
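And to make the parallel-load point concrete: the two COPY commands below can run in separate sessions at the same time because they hit different tables (table and file names are again placeholders):

    -- session 1
    COPY customers FROM '/array5/source/customers.dat';

    -- session 2, started concurrently
    COPY orders FROM '/array5/source/orders.dat';

Two concurrent COPYs into the same table would buy you nothing; as noted above, they effectively serialize.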