Re: Server error
From | scott.marlowe |
---|---|
Subject | Re: Server error |
Date | |
Msg-id | Pine.LNX.4.33.0305070916090.8765-100000@css120.ihs.com |
In reply to | Re: Server error (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
On Tue, 6 May 2003, Tom Lane wrote:

> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > On Tue, 6 May 2003, Erik Ronström wrote:
> >> I have a plpgsql function which dies strangely very often, with the
> >> message "server closed the connection unexpectedly". The log file says
>
> > Sig 11 means you have bad memory or CPU, about 99.9% of the time.
>
> In my part of the universe, about 99% of the time it means you've found
> a software bug ;-) ... especially if you can create an example case that
> is reproducible on another machine. Erik, can you wrap up a test case?
> And which PG version are you running, anyway?

Touché.

I think the real issue is whether the error stays the same each time, occurring in the exact same place. If it does, it is usually code. But if the sig 11 shows up in a different place each time, then it is likely bad hardware.

Further, getting a sig 11 every time one runs a certain stored proc is not necessarily the same as getting one in the exact same place in the stored proc or the PostgreSQL code while it's running. So it's a good idea to get several traces of the sig 11 and compare them. If the crashes aren't happening in the same place each time, then the hardware should be checked.

My point is that YOU shouldn't be chasing down these problems until the user has proven that their hardware is sound. Since bad hardware is pretty common, and your time is a limited resource, I really feel that anyone getting sig 11s should be directed to test their hardware first with something like memtest86, and only after it passes should they come back to you. That goes double right now, when you and the other developers are working hard to get the 7.4 code ready to go.

The old test for bad hardware, by the way, was to compile the Linux kernel 100 times with a -j <bignum> switch, with bignum set high enough to use all your memory.
Of course, that was back when 64 megs was a fair bit, so it wasn't hard to get the machine to use it all. With bigger and bigger memory subsystems, bad memory is much more likely to stay hidden until load increases, then boom, you hit that bad bit and get a sig11. Hence the need for better hardware testing before chasing the software bug possibility.
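The "get several traces and compare them" heuristic can be sketched roughly as follows. This is an illustration only, not a PostgreSQL or gdb tool: the function name classify_sig11, the input format (one list of function names per crash, innermost frame first, e.g. copied by hand from gdb backtraces), the depth cutoff, and the placeholder frame names func_a through func_d are all assumptions made for the example.

```python
from typing import List, Sequence


def classify_sig11(backtraces: List[Sequence[str]], depth: int = 3) -> str:
    """Rough triage of repeated sig 11 crashes by where they happen.

    backtraces: one backtrace per crash, each a list of function names,
        innermost frame first (hypothetical format, e.g. hand-copied from gdb).
    depth: how many innermost frames must match to count as "the same place"
        (an arbitrary cutoff chosen for illustration).
    """
    if len(backtraces) < 2:
        # A single crash says nothing about whether the location is stable.
        return "collect more traces"
    # Reduce each trace to its innermost `depth` frames and collect the distinct ones.
    sites = {tuple(bt[:depth]) for bt in backtraces}
    if len(sites) == 1:
        # Same exact place every time: usually a code bug worth a test case.
        return "likely software bug"
    # Different places each time: test the hardware (e.g. memtest86) first.
    return "suspect hardware"


same = [["func_a", "func_b", "main"]] * 3
scattered = [["func_a", "func_b", "main"], ["func_c", "func_d", "main"]]
print(classify_sig11(same))       # likely software bug
print(classify_sig11(scattered))  # suspect hardware
```

Comparing only the innermost few frames is a judgment call: a larger depth is stricter about what counts as "the same place", and a consistent crash site is a strong hint toward a software bug rather than proof of one.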