Discussion: loading a lot of data on windows is very slow
Hi Everybody,
Sorry for the vague post, but I am looking for ideas for diagnosing a problem. I am wondering if I am missing something obvious in Postgres, or maybe I failed to check something.
Does anyone have any suggestions?
Sincerely,
Kasia
OS: Windows 2008, 32-bit, dual processor, 4 GB of RAM
PG: 8.3.8
Problem: loading 16 GB of data takes several days, and Postgres runs out of memory before the load completes.
Client: an in-house application. I checked that it is not doing anything silly, and it seems to be doing everything as expected: indexes are created after a table has loaded, and we commit every 1000 inserts…
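To make the pattern concrete, here is a minimal sketch of the load described above; the table, column, and index names are hypothetical:

-- batches of 1000 single-row INSERTs, each batch committed together
BEGIN;
INSERT INTO target_table (id, payload) VALUES (1, 'row 1');
INSERT INTO target_table (id, payload) VALUES (2, 'row 2');
-- ... up to 1000 inserts per batch, then:
COMMIT;
-- indexes are created only after the whole table has loaded:
CREATE INDEX target_table_id_idx ON target_table (id);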
The client works with various databases; if I load the same data into SQL Server, it takes half a day.
Looking at Postgres memory consumption, I see that it is very conservative: there are about 8 postgres.exe processes running, and the most memory any one process takes is about 17,000 K. This is after 2.5 days of loading, so there are no memory leaks.
I have tried loading it 3 times on Postgres, changing something each time:
- Altering the configuration parameters in postgresql.conf (several times)
- Giving the machine more RAM (from 2 GB to 4 GB)
Each time I try to load, it fails with an “out of memory” error at a different point in the load:
2010-08-18 20:29:42 PDT ERROR: out of memory
2010-08-18 20:29:42 PDT DETAIL: Failed on request of size 134217728.
I am on my 4th attempt. When I look at the postgres log as it is loading data (I am tracing with default settings, for fear of running out of disk space), I see that things seem to be progressing very slowly. Looking at the distinct timestamps in the log file created today, I see:
2010-08-25 02:04:22 PDT STATEMENT:
2010-08-25 07:45:56 PDT STATEMENT:
2010-08-25 07:50:57 PDT STATEMENT:
2010-08-25 08:36:39 PDT STATEMENT:
In a database world, where we worry about milliseconds, this is glacially slow.
Contents of the log when it starts running out of memory:
TopMemoryContext: 338776 total in 41 blocks; 63632 free (423 chunks); 275144 used
TopTransactionContext: 8192 total in 1 blocks; 7024 free (0 chunks); 1168 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (2 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 7136 free (1 chunks); 1056 used
…
Prepared Queries: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
Operator lookup cache: 24576 total in 2 blocks; 14072 free (6 chunks); 10504 used
Record information cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
CFuncHash: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
MbProcContext: 1024 total in 1 blocks; 928 free (7 chunks); 96 used
MessageContext: 8192 total in 1 blocks; 4880 free (0 chunks); 3312 used
smgr relation table: 24576 total in 2 blocks; 10920 free (4 chunks); 13656 used
TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (1 chunks); 152 used
PortalHeapMemory: 7168 total in 3 blocks; 1288 free (5 chunks); 5880 used
ExecutorState: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
TupleSort: 24600 total in 2 blocks; 7840 free (0 chunks); 16760 used
TupleSort: 176152600 total in 23 blocks; 8379728 free (16 chunks); 167772872 used
Relcache by OID: 24576 total in 2 blocks; 15584 free (3 chunks); 8992 used
CacheMemoryContext: 2240336 total in 22 blocks; 997144 free (459 chunks); 1243192 used
…
MdSmgr: 8192 total in 1 blocks; 1480 free (0 chunks); 6712 used
LOCALLOCK hash: 24576 total in 2 blocks; 12056 free (4 chunks); 12520 used
Rendezvous variable hash: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
Timezones: 49432 total in 2 blocks; 5968 free (0 chunks); 43464 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
2010-08-18 20:29:42 PDT ERROR: out of memory
Altered configuration parameters in postgresql.conf; I arrived at these through trial and error:
shared_buffers = 256MB # min 128kB or max_connections*16kB
work_mem = 128MB # min 64kB
maintenance_work_mem = 256MB # min 1MB
max_fsm_pages = 600000 # min max_fsm_relations*16, 6 bytes each
checkpoint_segments = 256 # in logfile segments, min 1, 16MB each
effective_cache_size = 1GB
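Note that the failed request of 134217728 bytes is exactly 128 * 1024 * 1024, i.e. 128MB, the same value as the work_mem setting above. A quick way to confirm the arithmetic in psql:

-- 134217728 bytes = 128 * 1024 * 1024
SELECT pg_size_pretty(134217728::bigint);  -- prints "128 MB"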
Kasia Tuszynska <ktuszynska@esri.com> writes:
> PG: 8.3.8
> 2010-08-18 20:29:42 PDT ERROR: out of memory
> 2010-08-18 20:29:42 PDT DETAIL: Failed on request of size 134217728.

What command is this failure occurring in? There aren't that many that would try to inhale your full work_mem in one gulp like that.

The short answer here is probably that 32-bit Windows is not the platform to be trying to do memory-intensive work on, and you ought to get yourself a real operating system and/or a 64-bit machine. You could make it run within Windows' restrictions by cutting work_mem, maintenance_work_mem, and/or shared_buffers, but given that it's already taking a ridiculously long time, that doesn't seem like a path to a solution.

Alternatively, you might be able to optimize the data load process a bit. Have you read http://www.postgresql.org/docs/8.3/static/populate.html ? I'm suspicious in particular that you might be trying to enforce foreign keys incrementally, which is usually a bad approach for large data loads.

Also, I'd advise rethinking the choice of PG 8.3.x if possible. There are relevant performance improvements in 8.4.x.

> Client: In house application, I checked to see if it is not doing anything silly, it seems to be doing everything as expected, indexes are created after a table has loaded, we commit every 1000 inserts...

Committing after every few inserts is *not* an optimization for Postgres. Try kicking that number up a few orders of magnitude.

regards, tom lane
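For reference, a minimal sketch of the bulk-load pattern the populate.html page cited above recommends: COPY instead of row-by-row INSERTs, one large transaction, and indexes and foreign keys added only after the data is in. All table, column, and file names here are hypothetical:

-- load in one transaction, into a table with no indexes or constraints
BEGIN;
CREATE TABLE big_table (id integer, payload text);
COPY big_table FROM 'C:/load/big_table.dat';  -- file must be readable by the server
COMMIT;
-- build indexes and enforce foreign keys in bulk, after the load:
CREATE INDEX big_table_id_idx ON big_table (id);
ALTER TABLE big_table ADD FOREIGN KEY (id) REFERENCES parent (id);
ANALYZE big_table;

Even where COPY is not an option for the in-house client, raising the commit batch size from 1000 to, say, 100000 inserts per transaction is the kind of orders-of-magnitude change suggested above.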