Discussion: loading a lot of data on windows is very slow
Hi Everybody,
Sorry for the vague post, but I am looking for ideas for diagnosing a problem. I am wondering if I am missing something obvious in Postgres, or maybe I failed to check something.
Does anyone have any suggestions?
Sincerely,
Kasia
OS: Windows 2008, 32-bit, dual processor, 4 GB of RAM
PG: 8.3.8
Problem: loading 16 GB of data takes several days, and Postgres runs out of memory before the load completes.
Client: an in-house application. I checked that it is not doing anything silly, and it seems to be doing everything as expected: indexes are created after a table has loaded, and we commit every 1000 inserts…
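To make the pattern concrete, here is a minimal sketch of the load described above; the table, column, and index names are hypothetical:

-- batches of 1000 single-row INSERTs, each batch committed together
BEGIN;
INSERT INTO target_table (id, payload) VALUES (1, 'row 1');
INSERT INTO target_table (id, payload) VALUES (2, 'row 2');
-- ... up to 1000 inserts per batch, then:
COMMIT;
-- indexes are created only after the whole table has loaded:
CREATE INDEX target_table_id_idx ON target_table (id);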
The client works with various databases; if I load the same data into SQL Server, it takes half a day.
Looking at Postgres memory consumption, I see that it is very conservative: there are about 8 postgres.exe processes running, and the most memory any one process takes is about 17,000 K. This is after 2.5 days of loading, so there are no memory leaks.
I have tried loading it 3 times on Postgres, changing something each time:
- Altering the configuration parameters in postgresql.conf (several times)
- Giving the machine more RAM (from 2 GB to 4 GB)
Each time I try to load, it fails with an “out of memory” error at a different point in the load:
2010-08-18 20:29:42 PDT ERROR: out of memory
2010-08-18 20:29:42 PDT DETAIL: Failed on request of size 134217728.
I am on my 4th attempt. When I look at the postgres log as it is loading data (I am tracing with default settings, for fear of running out of disk space), I see that things seem to be progressing very slowly. Looking at the distinct timestamps in the log file created today, I see:
2010-08-25 02:04:22 PDT STATEMENT:
2010-08-25 07:45:56 PDT STATEMENT:
2010-08-25 07:50:57 PDT STATEMENT:
2010-08-25 08:36:39 PDT STATEMENT:
In a database world, where we worry about milliseconds, this is glacially slow.
Contents of the log when it starts running out of memory:
TopMemoryContext: 338776 total in 41 blocks; 63632 free (423 chunks); 275144 used
TopTransactionContext: 8192 total in 1 blocks; 7024 free (0 chunks); 1168 used
CurTransactionContext: 8192 total in 1 blocks; 8176 free (2 chunks); 16 used
CurTransactionContext: 8192 total in 1 blocks; 7136 free (1 chunks); 1056 used
…
Prepared Queries: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
Operator lookup cache: 24576 total in 2 blocks; 14072 free (6 chunks); 10504 used
Record information cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
CFuncHash: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
MbProcContext: 1024 total in 1 blocks; 928 free (7 chunks); 96 used
MessageContext: 8192 total in 1 blocks; 4880 free (0 chunks); 3312 used
smgr relation table: 24576 total in 2 blocks; 10920 free (4 chunks); 13656 used
TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (1 chunks); 152 used
PortalHeapMemory: 7168 total in 3 blocks; 1288 free (5 chunks); 5880 used
ExecutorState: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
TupleSort: 24600 total in 2 blocks; 7840 free (0 chunks); 16760 used
TupleSort: 176152600 total in 23 blocks; 8379728 free (16 chunks); 167772872 used
Relcache by OID: 24576 total in 2 blocks; 15584 free (3 chunks); 8992 used
CacheMemoryContext: 2240336 total in 22 blocks; 997144 free (459 chunks); 1243192 used
…
MdSmgr: 8192 total in 1 blocks; 1480 free (0 chunks); 6712 used
LOCALLOCK hash: 24576 total in 2 blocks; 12056 free (4 chunks); 12520 used
Rendezvous variable hash: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
Timezones: 49432 total in 2 blocks; 5968 free (0 chunks); 43464 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
2010-08-18 20:29:42 PDT ERROR: out of memory
Altered configuration parameters in postgresql.conf; I arrived at these through trial and error:
shared_buffers = 256MB # min 128kB or max_connections*16kB
work_mem = 128MB # min 64kB
maintenance_work_mem = 256MB # min 1MB
max_fsm_pages = 600000 # min max_fsm_relations*16, 6 bytes each
checkpoint_segments = 256 # in logfile segments, min 1, 16MB each
effective_cache_size = 1GB
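Note that the failed request of 134217728 bytes is exactly 128 * 1024 * 1024, i.e. 128MB, the same value as the work_mem setting above. A quick way to confirm the arithmetic in psql:

-- 134217728 bytes = 128 * 1024 * 1024
SELECT pg_size_pretty(134217728::bigint);  -- prints "128 MB"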
Kasia Tuszynska <ktuszynska@esri.com> writes:
> PG: 8.3.8
> 2010-08-18 20:29:42 PDT ERROR: out of memory
> 2010-08-18 20:29:42 PDT DETAIL: Failed on request of size 134217728.

What command is this failure occurring in? There aren't that many that would try to inhale your full work_mem in one gulp like that.

The short answer here is probably that 32-bit Windows is not the platform to be trying to do memory-intensive work on, and you ought to get yourself a real operating system and/or a 64-bit machine. You could make it run within Windows' restrictions by cutting work_mem, maintenance_work_mem, and/or shared_buffers, but given that it's already taking a ridiculously long time, that doesn't seem like a path to a solution.

Alternatively, you might be able to optimize the data load process a bit. Have you read http://www.postgresql.org/docs/8.3/static/populate.html ? I'm suspicious in particular that you might be trying to enforce foreign keys incrementally, which is usually a bad approach for large data loads.

Also, I'd advise rethinking the choice of PG 8.3.x if possible. There are relevant performance improvements in 8.4.x.

> Client: In house application, I checked to see if it is not doing anything silly, it seems to be doing everything as expected, indexes are created after a table has loaded, we commit every 1000 inserts...

Committing after every few inserts is *not* an optimization for Postgres. Try kicking that number up a few orders of magnitude.

regards, tom lane
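For reference, a minimal sketch of the bulk-load pattern the populate.html page cited above recommends: COPY instead of row-by-row INSERTs, one large transaction, and indexes and foreign keys added only after the data is in. All table, column, and file names here are hypothetical:

-- load in one transaction, into a table with no indexes or constraints
BEGIN;
CREATE TABLE big_table (id integer, payload text);
COPY big_table FROM 'C:/load/big_table.dat';  -- file must be readable by the server
COMMIT;
-- build indexes and enforce foreign keys in bulk, after the load:
CREATE INDEX big_table_id_idx ON big_table (id);
ALTER TABLE big_table ADD FOREIGN KEY (id) REFERENCES parent (id);
ANALYZE big_table;

Even where COPY is not an option for the in-house client, raising the commit batch size from 1000 to, say, 100000 inserts per transaction is the kind of orders-of-magnitude change suggested above.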