Howdy:
I have a Linux server running PostgreSQL 7.2.1. with about 1 Gig of
memory. The proc speed is about 1.14 GHz.
I'm getting more and more concerned about how often
the database gets used and the days (like today) where I
wonder if buying more memory rather than buying a 2nd CPU
was such a great idea.
In an effort to enhance / streamline performance, I've done
the following:
* memory upgrade from 512M to 1G
* move RAID5 to scsi drives (10K RPM)
* set up cron script to vacuum database weekly
* set number of client connects ( i.e., /usr/bin/postmaster -i -B 128 -N 64 -d 4 )
But I'm at the point now that I can't kill some jobs. Yes, I know
I shouldn't use 'kill' in any forceful way, but just a kill seems to do
nothing (or, if it is doing something, it's not fast enough
for the user community and it's stopping production).
I got this from a co-worker:
[snip]
the server is blocking on access to the metadata tables. not even
logins are being processed. I'm not sure what caused the
problem, but I think a database restart is the best course,
which I have been trying to do. Do not kill -9, as it will corrupt the WAL.
[/snip]
I see some errors in the messages file regarding the RAID drives
(the filesystem where the database lives)
[snip error]
Nov 18 12:33:51 hmp kernel: scsi : aborting command due to timeout : pid 0, scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 76 9f 00 00 40 00
Nov 18 12:33:51 hmp kernel: scsi : aborting command due to timeout : pid 0, scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 76 df 00 00 08 00
Nov 18 12:33:51 hmp kernel: scsi : aborting command due to timeout : pid 0, scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 76 ef 00 00 08 00
Nov 18 12:33:52 hmp kernel: scsi : aborting command due to timeout : pid 0, scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 77 27 00 00 40 00
Nov 18 12:33:52 hmp kernel: scsi : aborting command due to timeout : pid 0, scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 77 67 00 00 40 00
Nov 18 12:33:53 hmp kernel: scsi : aborting command due to timeout : pid 0, scsi
[/snip error]
my questions are:
* have can I figure out how access to the metadata tables
are being stopped? (my guess is the error on the scsi drive, but ... )
* how to restart PostgreSQL without running the risk of corrupting data?
* what are the benefits to adding a 2nd CPU over, say, more memory?
I'm sorry that I don't have enough information at this time ... I'm getting
swamped by users as I type this.
Thanks!
-X