Re: Backends dying due to memory exhaustion--I'm stonkered
From | Doug McNaught |
---|---|
Subject | Re: Backends dying due to memory exhaustion--I'm stonkered |
Date | |
Msg-id | m3g0i5sugy.fsf@belphigor.mcnaught.org |
In response to | Backends dying due to memory exhaustion--I'm stonkered (Doug McNaught <doug@wireboard.com>) |
Responses | Re: Backends dying due to memory exhaustion--I'm stonkered |
List | pgsql-general |
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Doug McNaught <doug@wireboard.com> writes:
> > The problem I'm having is that the backends will crash randomly, after
> > the database has been up for a few days, with:
> >
> > FATAL 1: Memory exhausted in AllocSetAlloc()
> >
> > The system has plenty of memory and swap, and under normal
> > circumstances the backends take up 10-15 megabytes. If it's a
> > runaway situation of some kind, it happens very fast, as I've even
> > taken snapshots of the process table at 1 minute intervals, and they
> > show no abnormality right up to the time of the crash.
>
> Hmm. That puts a damper on the idea that it's a memory leak --- doesn't
> eliminate the theory entirely, however. The other likely theory is that
> you've got a variable-size column value someplace whose size word has
> been corrupted, so that it claims to be umpteen megabytes long. Any
> attempt to copy such a value out of the tuple it's in will result in
> an instant "out of memory" complaint.

Hmm, very interesting. Does VARCHAR count as a variable-size column?

One funny thing is that the nightly VACUUM doesn't always fail--the
system will run smoothly for one to three days on average before a
crash.

> Is there any consistency about which table is being touched when the
> failure occurs? It's not hard to isolate and delete a damaged tuple
> once you know which table it's in, but if you've got a lot of tables
> the initial search can be tedious.

I'll check into this. Having just looked over my error logs, I see
some suspects but nothing jumps out at me. Unfortunately, OpenACS has
a boatload of tables, and there are 8 different instances, each with
its own database.

> One way to get more info is to tweak the code to abort() just before
> it would normally report the out-of-memory error. Then you will get
> a coredump and can learn something from the backtrace (don't forget
> to compile with -g).

That's a thought, and I will try it.

I'm currently (as of yesterday's crash) running with -d 2 and output
sent to a logfile. Is this debuglevel high enough to tell me which
table contains the bad tuple, if that's indeed the problem?

If I can't nail it down that way, how hard would it be to write a C
program to scan all the tuples in a database looking for bogus size
fields?

-Doug
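
The idea of scanning for bogus size fields can be illustrated with a toy sketch. It assumes a simplified layout that is NOT PostgreSQL's actual on-disk format: each value is a 4-byte length word in host byte order, counting itself, followed by the payload. The names `find_bogus_length` and `put_value` are hypothetical helpers invented for this illustration. The check itself is the one Tom's theory implies: a length word that claims more bytes than the containing buffer holds cannot be valid.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy layout (an assumption for illustration only, not the real
 * PostgreSQL tuple format): a buffer holds back-to-back values, each
 * starting with a 4-byte length word that includes its own 4 bytes.
 */

/* Append one value at offset off; returns the offset just past it. */
size_t put_value(unsigned char *buf, size_t off,
                 const void *data, uint32_t data_len)
{
    uint32_t len = data_len + (uint32_t) sizeof len;

    memcpy(buf + off, &len, sizeof len);
    memcpy(buf + off + sizeof len, data, data_len);
    return off + len;
}

/*
 * Walk the buffer value by value. Returns the offset of the first value
 * whose length word is implausible (smaller than the header itself, or
 * larger than the bytes remaining), or -1 if every value checks out.
 */
long find_bogus_length(const unsigned char *buf, size_t buf_len)
{
    size_t off = 0;

    while (off < buf_len) {
        uint32_t len;

        if (buf_len - off < sizeof len)
            return (long) off;      /* not even room for a length word */

        memcpy(&len, buf + off, sizeof len);

        if (len < sizeof len || len > buf_len - off)
            return (long) off;      /* claims more than the buffer holds */

        off += len;
    }
    return -1;
}
```

A real scanner would have to read the heap files page by page and decode the actual tuple headers and column types from the catalogs, which is considerably more work than this sketch suggests.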