Thread: Severe Badness On My Server: psql: FATAL: the database system is starting up

Severe Badness On My Server: psql: FATAL: the database system is starting up

From
Mitchell Laks
Date:
Dear Gurus:
My server and I have had a very bad weekend, starting Friday afternoon.

I am running Debian Sarge, PostgreSQL 7.4.6 with Linux kernel 2.6.8.

I am running a PostgreSQL-backed application on a remote server. The system
has a system drive, on which the PostgreSQL database runs, and there is a
RAID 1 drive on which the application stores data.

Well, the raid1 failed (or is failing - or is trying its hardest to fail, not
clear yet...). This should not have affected the Postgresql database as it is
safely on a separate drive.

However, when I logged onto the system, I found that I could not turn off
postgresql. I logged in as postgres, did pg_ctl stop and it did ....... and
then could not stop (presumably because hanging client applications were not
logged off the database).

So then I killed all the application clients (kill -9 of them), and still I
tried to pg_ctl stop and it did not want to stop.

So I looked in ps aux and the client applications looked like they were in D
state:

wustl    18232  0.0  0.2  4872 1920 ?        D    Mar11   0:00 /usr/local/ctn/bi
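
A D in the STAT column means uninterruptible sleep: the process is blocked inside the kernel, usually waiting on I/O, and signals (kill -9 included) are not acted on until that wait ends. A minimal sketch for listing such processes, assuming a procps-style ps; filter_d_state is a hypothetical helper name, not something from this thread:

```shell
# Keep the header line plus every process whose STAT column (field 3) starts with D.
# filter_d_state is a hypothetical helper; it only filters text on stdin.
filter_d_state() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# wchan shows the kernel function the process is sleeping in, which hints at
# what it is waiting for (assumes procps ps; other ps variants may differ).
ps -eo pid,user,stat,wchan:32,args | filter_d_state
```

If the listed processes all wait in block-device or md-related kernel functions, that points at the storage layer rather than at the application.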


I then tried to reboot the system remotely by logging in as root and running
shutdown -r now, and even shutdown -h now. Interestingly enough, the system
refused to shut down (I have never, ever seen this)!

I was floored! Well what to do? I decided to sleep on it.

Well, I logged in again on Saturday night and the system was still hanging in
this bizarre state. I now saw queued shutdown requests in the ps aux output.
And nothing was happening fast.

I thought. I read a little. I tried pg_ctl stop -m fast. It did nothing. I
prayed. I tried to do pg_dump LTA_IDB >lta_idb.dump to dump the database in
question. It didn't do anything.

I was desperate. I decided to try desperate measures. I then pulled the trigger:

pg_ctl stop -m i

OK, so it stopped. Then I said, let me try to dump the database, and so I did
pg_ctl start. It started:

postgres@A1:~$ pg_ctl status
pg_ctl: postmaster is running (PID: 21195)
Command line was:
/usr/lib/postgresql/bin/postmaster

Then I tried to dump the database and I got a message saying
FATAL: the database system is starting up. I waited a while and then I tried
again. Same message. I then tried psql LTA_IDB as the user of the database and
got the same FATAL message that the database is starting up.
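
After an immediate-mode stop the server has to run crash recovery on the next start, and connections are rejected with this message until recovery finishes. A minimal sketch of polling instead of retrying by hand; retry is a hypothetical helper, and the try count and delay used with it are arbitrary assumptions:

```shell
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_TRIES attempts have failed.
# Hypothetical helper, not part of PostgreSQL.
retry() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0                          # command succeeded
        [ "$attempt" -ge "$max_tries" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"                            # wait before the next attempt
    done
}
```

For example, retry 30 10 psql -d LTA_IDB -c 'SELECT 1' would try a trivial query every 10 seconds for up to 30 attempts. If startup is itself stuck on hung I/O, as it may be here, no amount of waiting will help.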

I waited. Then I did pg_ctl stop (I don't know why I did it. Perversity, I
think.)

It then said to me
................ something about unable to stop.

Then I did

postgres@A1:~$ pg_dump LTA_IDB>lta_idb.dump
2005-03-13 10:56:33 [21481] LOG:  connection received: host=[local] port=
2005-03-13 10:56:33 [21481] FATAL:  the database system is shutting down
pg_dump: [archiver (db)] connection to database "LTA_IDB" failed: FATAL:  the
dn

Now I ran pg_ctl status:
postgres@A1:~$ pg_ctl status
pg_ctl: postmaster is running (PID: 21195)
Command line was:
/usr/lib/postgresql/bin/postmaster

OK I feel like I am in the twilight zone.

Next I did as root
cd /var/log
ls postg*

A1:/var/log# ls post*
postgres.log        postgres.log.2.gz  postgres.log.5.gz  postgres.log.8.gz
postgres.log.1      postgres.log.3.gz  postgres.log.6.gz  postgres.log.9.gz
postgres.log.10.gz  postgres.log.4.gz  postgres.log.7.gz
A1:/var/log# less postgres.log
postgres.log: No such file or directory

WHAT????????
df -h
/dev/sda2             9.2G  2.8G  6.0G  32% /
tmpfs                 443M     0  443M   0% /dev/shm
/dev/sda1              89M   11M   74M  13% /boot
/dev/sda3             7.4G  273M  6.7G   4% /home
/dev/sda8              11G   33M  9.9G   1% /mirror
/dev/sda7             449M  8.1M  417M   2% /tmp
/dev/sda6             7.4G  4.7G  2.4G  67% /var
/dev/md0              230G  139G   80G  64% /home/big0

I am in the twilight zone. My sanity is suspect. Any ideas on what to do next?
Pull the plug????
Mitchell

Re: Severe Badness On My Server: psql: FATAL: the database system is starting up

From
Tom Lane
Date:
Mitchell Laks <mlaks@verizon.net> writes:
> Well, the raid1 failed (or is failing - or is trying its hardest to fail, not
> clear yet...). This should not have affected the Postgresql database as it is
> safely on a separate drive.

Try turning off the power, physically disconnecting the raid1, and rebooting.
It sounds to me like the raid drive is so wedged that the kernel is
getting confused (or at least hanging operations that theoretically
shouldn't hang).

            regards, tom lane
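
Once the machine is back up, the health of a Linux md array such as /dev/md0 can be read from /proc/mdstat, where a healthy two-disk mirror shows [UU] and a missing member shows as an underscore, for example [U_]. A minimal sketch that prints the names of degraded arrays; degraded_arrays is a hypothetical helper and assumes the usual mdstat layout with array names of the form mdN:

```shell
# Print the name of every md array whose member-status field contains "_",
# i.e. at least one member is missing or failed.
# degraded_arrays is a hypothetical helper; it reads mdstat-format text on stdin.
degraded_arrays() {
    awk '/^md[0-9]+ :/ { name = $1 }
         /\[[U_]*_[U_]*\]/ { print name }'
}

# Only meaningful on a Linux host with the md driver loaded.
if [ -r /proc/mdstat ]; then
    degraded_arrays < /proc/mdstat
fi
```

No output means no array reported a missing member. It does not rule out a disk that is still listed as active but is failing underneath.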