Discussion: Help! PostgreSQL stuck at starting up after crash
Thanks, Rick.
I think we have learned the lesson not to use kill -9. One odd thing was that the three stuck autovacuum processes had probably been running for more than a day, and they were all working on the same, fairly static database. I doubt there was anything wrong with our db. Still, it would be nice if PostgreSQL were tolerant of that.
The symptoms we experienced were that PostgreSQL was extremely slow and failed all the db tests. The slowness came from the three vacuum processes, which used up all the CPU.
Apparently killing them is not the right solution. Is there a proper, safer way to recover PostgreSQL's performance?
On Wed, Jan 18, 2012 at 7:33 PM, Samuel Hwang <samuel@replicon.com> wrote:
> We experience slow performance and found the server is running 3 vacuum
> process on the same db which use up 99% of CPU.
> Then we kill -9 one of those process which cause postgresql to crash and it
> tried to restart after the crash
While I cannot help with restarting, you shouldn't ever use kill -9
unless no other kill signal works.
kill's default signal is SIGTERM (15); refer to man 7 signal (on Linux).
Autovacuum settings can be set on a per-table basis; perhaps you need to
closely examine that part of the configuration.
Sorry I can't help further.
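To illustrate the signal advice above, here is a minimal Python sketch of "SIGTERM first, SIGKILL only as a last resort". It is a generic POSIX example run against a throwaway child process, not against PostgreSQL itself. The names stop_gently and demo and the 5-second grace period are assumptions made up for illustration. For a PostgreSQL backend, the built-in SQL functions pg_cancel_backend(pid) and pg_terminate_backend(pid) are the usual gentler route, since a SIGKILL sent to any backend makes the postmaster reset all sessions and go through crash recovery.

```python
import signal
import subprocess
import sys


def stop_gently(proc, grace_seconds=5.0):
    """Ask a child process to exit with SIGTERM; fall back to SIGKILL only if needed.

    Returns the name of the last signal that had to be sent.
    """
    proc.terminate()  # sends SIGTERM (15), which a process may catch to clean up
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL (9): the process gets no chance to clean up
        proc.wait()
        return "SIGKILL"


def demo():
    # Hypothetical stand-in for a long-running job: a child that only sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    used = stop_gently(child)
    return used, child.returncode
```

On POSIX systems a negative return code from subprocess means the child was ended by that signal number, so demo() lets you see which signal actually stopped the process.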
--
aRDy Music and Rick Dicaire present:
http://www.ardynet.com
http://www.ardynet.com:9000/ardymusic.ogg.m3u
Shian-Miin Samuel Hwang | Software Developer | Phone 1-403-2626519 (ext. 276) | Fax 1-403-233-8046
Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries
www.replicon.com | facebook | twitter | blog | contact us
We are hiring! | search jobs
Sounds like you have corrupt WAL files, in which case you will have to reset
the WAL with pg_resetxlog.
http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html
This will result in missing transactions, so before you do this, shut down
Postgres and make a copy of the database files. That way, if you
don't like what happens, you can always go back to the way things were.
Also, right now would be a good time to evaluate your backup strategy,
which is a different topic for a different thread, but I can certainly
help with that as well.
-David Hornsby
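The "copy the database files first" step can be sketched in Python as below. This is only an illustration under stated assumptions: copy_data_dir is a made-up helper name, and the paths are whatever your data directory and backup location happen to be. It refuses to copy while postmaster.pid exists, because that file normally means the server is still running. After a crash the file can be stale, so verify that no postgres processes remain before removing it by hand. Tablespaces that live outside the data directory are reached through symlinks and are not copied by this sketch.

```python
import shutil
from pathlib import Path


def copy_data_dir(data_dir, backup_dir):
    """Copy a stopped cluster's data directory before a destructive repair."""
    src = Path(data_dir)
    dst = Path(backup_dir)
    # postmaster.pid normally indicates a running (or uncleanly stopped) server.
    if (src / "postmaster.pid").exists():
        raise RuntimeError("postmaster.pid found: stop the server first")
    # Never overwrite an earlier backup by accident.
    if dst.exists():
        raise FileExistsError(f"backup target already exists: {dst}")
    # symlinks=True keeps symlinks as symlinks instead of following them.
    shutil.copytree(src, dst, symlinks=True)
    return dst
```

Only once such a copy exists would it make sense to run pg_resetxlog against the original directory.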
> Version: PostgreSQL 9.1.1 on CentOS 5 x64
>
> We experienced slow performance and found the server was running 3 vacuum
> processes on the same db, which used up 99% of the CPU.
> Then we ran kill -9 on one of those processes, which caused PostgreSQL to
> crash, and it tried to restart after the crash.
> However, when the startup process reached the last WAL files, it just got
> stuck there.
>
> pg_controldata showed the db was in archive recovery mode, and when using
> psql to connect to the db, it said FATAL: the database system is starting up.
>
> I took a chance and upgraded to PostgreSQL 9.1.2 to see if anything changed;
> it still got stuck at the end of recovery.
> pg_controldata showed the db was in crash recovery, but that is probably
> just different wording, I think.
> Using psql to connect to the db, it said FATAL: the database system is
> starting up.
>
> I have pretty much run out of ideas here.
> Can anyone help with where to go from here?
>
> Samuel
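The quoted report reads the cluster state from pg_controldata by eye. A small Python sketch of pulling that one field out of the tool's text output could look like this. The function name cluster_state is an assumption, and the SAMPLE text is an illustrative one-line excerpt, not output captured from the server in this thread.

```python
def cluster_state(controldata_output):
    """Return the value of the 'Database cluster state' line, or None if absent."""
    for line in controldata_output.splitlines():
        # Split on the first colon only, so values containing colons stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Database cluster state":
            return value.strip()
    return None


# Illustrative excerpt only (hypothetical), in the "label: value" layout
# that pg_controldata prints.
SAMPLE = """\
Database cluster state:               in archive recovery
"""
```

In practice the input would be the captured standard output of pg_controldata run against the data directory, and a value of "in production" is what a normally running server reports.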
Samuel Hwang <samuel@replicon.com> wrote:
> I don't know how to make sure if WAL logs corrupted.
> At the end of the recovery in postgresql log I saw
>
> 2012-01-18 18:30:58.570 MST 3666 - LOG: consistent recovery state reached at 56C/CD0AFE00
> 2012-01-18 18:30:58.587 MST 3666 - LOG: recovery stopping before abort of transaction 541802043, time 2012-01-18 12:50:08.531615-07
> 2012-01-18 18:30:58.587 MST 3666 - LOG: redo done at 56C/CD226C58
> 2012-01-18 18:30:58.587 MST 3666 - LOG: last completed transaction was at log time 2012-01-18 12:49:28.321605-07
> 2012-01-18 18:30:58.589 MST 3666 - LOG: selected new timeline ID: 2
> 2012-01-18 18:30:59.187 MST 3666 - LOG: archive recovery complete
>
> just nothing happened after that and postgresql is stuck at
> starting up and not getting out of archive recovery mode.
What do you base that on? (copy/paste)
> at that time there is no cpu/disk activities and it seemed like
> it's waiting for something?
That looks like normal recovery. I would expect it to be waiting
for clients to connect at that point.
What happens when you try to connect to the database after that
above has been logged? (Copy/paste the psql command line and any
errors, please.)
-Kevin
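As a rough aid for reading a recovery log like the one quoted above, here is a Python sketch that extracts the redo end position, the new timeline ID, and whether the "archive recovery complete" message was logged. The regular expression assumes the exact line prefix shown in this thread (timestamp, time zone, PID, then "- LOG:"). That prefix depends on the server's log_line_prefix setting, so treat the pattern and the name recovery_summary as assumptions.

```python
import re

# Matches lines shaped like: "2012-01-18 18:30:58.587 MST 3666 - LOG: <message>"
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \S+ \d+ - LOG:\s+(?P<msg>.*)$"
)


def recovery_summary(log_text):
    """Collect a few recovery milestones from PostgreSQL log text."""
    summary = {"redo_done_at": None, "timeline": None, "recovery_complete": False}
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a LOG line in the assumed format
        msg = match.group("msg")
        if msg.startswith("redo done at "):
            summary["redo_done_at"] = msg[len("redo done at "):]
        elif msg.startswith("selected new timeline ID: "):
            summary["timeline"] = int(msg.rsplit(": ", 1)[1])
        elif msg == "archive recovery complete":
            summary["recovery_complete"] = True
    return summary


# The three relevant lines from the log quoted in this thread.
THREAD_LOG = """\
2012-01-18 18:30:58.587 MST 3666 - LOG: redo done at 56C/CD226C58
2012-01-18 18:30:58.589 MST 3666 - LOG: selected new timeline ID: 2
2012-01-18 18:30:59.187 MST 3666 - LOG: archive recovery complete
"""
```

A summary with recovery_complete set to True matches Kevin's reading that the log itself shows a normal end of recovery, which is why the next useful data point is the actual psql error.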
pg_resetxlog did the trick: the db can be started and is readable.
I am dumping the data out and importing it into a newly created database cluster.
We pretty much lost the data for the last two days, but since our PostgreSQL was running well, it is less than it looks.
Thanks a lot for the help.
On Thu, Jan 19, 2012 at 7:30 AM, David Hornsby <david@beechglen.com> wrote:
> Sounds like you have a corrupt wal files that you will have to reset the
> wal logs with pgresetxlog.
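The dump-and-reload into a fresh cluster described above can be laid out as an ordered command plan. The Python sketch below only builds the command lines and does not execute anything. The function name migration_plan, the ports, and the paths are hypothetical. The tools and flags (pg_dumpall -p/-f, initdb -D, pg_ctl -D/-o/-w start, psql -p/-d/-f) are standard PostgreSQL client options, but check them against the documentation for your version before relying on this.

```python
def migration_plan(old_port, new_port, new_data_dir, dump_file):
    """Return the commands, in order, for a dump-and-reload into a new cluster."""
    return [
        # 1. Dump every database and the global objects from the repaired cluster.
        ["pg_dumpall", "-p", str(old_port), "-f", dump_file],
        # 2. Create a brand-new, empty cluster.
        ["initdb", "-D", new_data_dir],
        # 3. Start the new cluster on its own port and wait until it is up.
        ["pg_ctl", "-D", new_data_dir, "-o", f"-p {new_port}", "-w", "start"],
        # 4. Replay the dump into the new cluster.
        ["psql", "-p", str(new_port), "-d", "postgres", "-f", dump_file],
    ]
```

Each entry can be passed to subprocess.run(cmd, check=True) one at a time, so that a failed step stops the sequence instead of loading a partial dump.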
On 1/19/2012 5:46 PM, Samuel Hwang wrote:
> Correcting a typo: we pretty much lost the data for the last two days, but
> since our PostgreSQL wasn't running well during that time, the loss is
> smaller than it looks.
-- David Hornsby Beechglen Development Inc. P: (513) 922 - 0509 x432 C: (513) 254 - 0605 F: (513) 347 - 2834 W: beechglen.com