Обсуждение: starting postgres with an empty px_xlog folder
Hello,
We shut down our postgres 8.3 server last night cleanly for some hosted services maintenance. When we got our server back, it didnt have the pg_xlog mount with files and now when we start the server, it complains:
2012-06-23 06:06:04 CDT [18612]: [1-1] user=,db= LOG: database system was shut down at 2012-06-22 23:56:16 CDT
2012-06-23 06:06:04 CDT [18612]: [2-1] user=,db= LOG: could not open file "pg_xlog/0000000100000A0100000040" (log file 2561, segment 64): No such file or
directory
2012-06-23 06:06:04 CDT [18612]: [3-1] user=,db= LOG: invalid primary checkpoint record
2012-06-23 06:06:04 CDT [18612]: [4-1] user=,db= LOG: could not open file "pg_xlog/0000000100000A0100000040" (log file 2561, segment 64): No such file or
directory
2012-06-23 06:06:04 CDT [18612]: [5-1] user=,db= LOG: invalid secondary checkpoint record
2012-06-23 06:06:04 CDT [18612]: [6-1] user=,db= PANIC: could not locate a valid checkpoint record
2012-06-23 06:06:04 CDT [18609]: [1-1] user=,db= LOG: startup process (PID 18612) was terminated by signal 6: Aborted
2012-06-23 06:06:04 CDT [18609]: [2-1] user=,db= LOG: aborting startup due to startup process failure
Since we had a clean shut down is there an easy or relatively clean way to start back up? the pg_xlog folder is there, just with no files. i wouldnt think it would be needed for a clean restart.
I was able to duplicate the error in a test environment and get the database started back up wtih the pg_resetxlog $PGDATA command, but i am concerned about the fallout and not sure if i need to use any of the option switches. We also have archived log files up to the point of server shutdown if that provides a better option.
On Sat, Jun 23, 2012 at 7:01 AM, Mike Broers <mbroers@gmail.com> wrote:
Hello,We shut down our postgres 8.3 server last night cleanly for some hosted services maintenance. When we got our server back, it didnt have the pg_xlog mount with files and now when we start the server, it complains:2012-06-23 06:06:04 CDT [18612]: [1-1] user=,db= LOG: database system was shut down at 2012-06-22 23:56:16 CDT2012-06-23 06:06:04 CDT [18612]: [2-1] user=,db= LOG: could not open file "pg_xlog/0000000100000A0100000040" (log file 2561, segment 64): No such file ordirectory2012-06-23 06:06:04 CDT [18612]: [3-1] user=,db= LOG: invalid primary checkpoint record2012-06-23 06:06:04 CDT [18612]: [4-1] user=,db= LOG: could not open file "pg_xlog/0000000100000A0100000040" (log file 2561, segment 64): No such file ordirectory2012-06-23 06:06:04 CDT [18612]: [5-1] user=,db= LOG: invalid secondary checkpoint record2012-06-23 06:06:04 CDT [18612]: [6-1] user=,db= PANIC: could not locate a valid checkpoint record2012-06-23 06:06:04 CDT [18609]: [1-1] user=,db= LOG: startup process (PID 18612) was terminated by signal 6: Aborted2012-06-23 06:06:04 CDT [18609]: [2-1] user=,db= LOG: aborting startup due to startup process failureSince we had a clean shut down is there an easy or relatively clean way to start back up? the pg_xlog folder is there, just with no files. i wouldnt think it would be needed for a clean restart.
2012-06-23 06:06:04 CDT [18612]: [2-1] user=,db= LOG: could not open file "pg_xlog/0000000100000A0100000040" (log file 2561, segment 64): No such file or
directory
Don't do pg_resetlogs. If you have archives, please find for the file which has pointed in logs copy it to $PGDATA/pg_xlog location and start the server. Sometime, server will throw the sequence of files like 41,42...., not existing, you need to copy all of them to pg_xlog location and start it.
---
Regards,
Raghavendra
EnterpriseDB Corporation
On Sat, Jun 23, 2012 at 6:19 PM, Mike Broers <mbroers@gmail.com> wrote:
I was able to duplicate the error in a test environment and get the database started back up wtih the pg_resetxlog $PGDATA command, but i am concerned about the fallout and not sure if i need to use any of the option switches. We also have archived log files up to the point of server shutdown if that provides a better option.On Sat, Jun 23, 2012 at 7:01 AM, Mike Broers <mbroers@gmail.com> wrote:Hello,We shut down our postgres 8.3 server last night cleanly for some hosted services maintenance. When we got our server back, it didnt have the pg_xlog mount with files and now when we start the server, it complains:2012-06-23 06:06:04 CDT [18612]: [1-1] user=,db= LOG: database system was shut down at 2012-06-22 23:56:16 CDT2012-06-23 06:06:04 CDT [18612]: [3-1] user=,db= LOG: invalid primary checkpoint record2012-06-23 06:06:04 CDT [18612]: [4-1] user=,db= LOG: could not open file "pg_xlog/0000000100000A0100000040" (log file 2561, segment 64): No such file ordirectory2012-06-23 06:06:04 CDT [18612]: [5-1] user=,db= LOG: invalid secondary checkpoint record2012-06-23 06:06:04 CDT [18612]: [6-1] user=,db= PANIC: could not locate a valid checkpoint record2012-06-23 06:06:04 CDT [18609]: [1-1] user=,db= LOG: startup process (PID 18612) was terminated by signal 6: Aborted2012-06-23 06:06:04 CDT [18609]: [2-1] user=,db= LOG: aborting startup due to startup process failureSince we had a clean shut down is there an easy or relatively clean way to start back up? the pg_xlog folder is there, just with no files. i wouldnt think it would be needed for a clean restart.
On Sat, Jun 23, 2012 at 07:01:08AM -0500, Mike Broers wrote: > Hello, > > We shut down our postgres 8.3 server last night cleanly for some hosted > services maintenance. When we got our server back, it didnt have the > pg_xlog mount with files what happened to make the files not available? Seems like you may want to know the answer to this question for future reference.
Mike Broers wrote: > Mike Broers wrote: >> We shut down our postgres 8.3 server last night cleanly for some >> hosted services maintenance. When we got our server back, it didnt >> have the pg_xlog mount with files and now when we start the >> server, it complains >> Since we had a clean shut down is there an easy or relatively >> clean way to start back up? the pg_xlog folder is there, just with >> no files. i wouldnt think it would be needed for a clean restart. It is. > I was able to duplicate the error in a test environment and get the > database started back up wtih the pg_resetxlog $PGDATA command, but > i am concerned about the fallout and not sure if i need to use any > of the option switches. We also have archived log files up to the > point of server shutdown if that provides a better option. I would make a copy of the directory tree of the database cluster while the server is stopped before attempting any recovery. http://wiki.postgresql.org/wiki/Corruption The archived WAL files, may provide a better option that pg_resetxlog. If it were my data, I would be inclined to use pg_dump to create a fresh version of the database after recovery, although if you really have a full set of archived WAL files and the server starts up without any indications of problems in the server log, you might be OK without doing that. As previously stated, make sure you understand how this happened, so you can make sure it doesn't happen again. The contents of the pg_xlog directory are an integral part of your database cluster. -Kevin
On 06/24/2012 03:45 AM, Kevin Grittner wrote: > As previously stated, make sure you understand how this happened, so > you can make sure it doesn't happen again. The contents of the > pg_xlog directory are an integral part of your database cluster. People not backing up pg_xlog, deleting its contents, etc happens often enough that I wonder if it should have a prominent 00_README or 00_WARNING_DONT_DELETE file created by initdb - or simply be renamed to something scarier like "base_txrecords". The annoyance of a readme is that it'd need translation. People are used to logs being disposable. Anyone who's been responsible for a database should ideally know better than to assume that *transation* logs are disposable, but everyone has to learn sometime, and not everybody does so by reading TFM (unfortunately). -- Craig Ringer
On Sun, Jun 24, 2012 at 2:15 PM, Craig Ringer <ringerc@ringerc.id.au> wrote: > On 06/24/2012 03:45 AM, Kevin Grittner wrote: >> >> As previously stated, make sure you understand how this happened, so >> you can make sure it doesn't happen again. The contents of the >> pg_xlog directory are an integral part of your database cluster. > > > People not backing up pg_xlog, deleting its contents, etc happens often > enough that I wonder if it should have a prominent 00_README or > 00_WARNING_DONT_DELETE file created by initdb - or simply be renamed to > something scarier like "base_txrecords". This doesn't help when your backup skips them because the files exist on another disk partition. Out of sight, out of mind. If pg_basebackup defaulted to --xlog and its use was encouraged these mistakes might be less common. -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
Ultimately the hosting service restored the files that they had not brought over during their maintenance migration and we started up ok. So that was a relief.
We had archived log files but it did not appear that the archive destination was caught up with the xlog the cluster was complaining about.
Given that the database server was shut down cleanly, and all other data besides pg_xlog was available as expected (not corrupted), what would have been the problem with pg_resetxlogs?
On Sun, Jun 24, 2012 at 2:15 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
On 06/24/2012 03:45 AM, Kevin Grittner wrote:People not backing up pg_xlog, deleting its contents, etc happens often enough that I wonder if it should have a prominent 00_README or 00_WARNING_DONT_DELETE file created by initdb - or simply be renamed to something scarier like "base_txrecords".As previously stated, make sure you understand how this happened, so
you can make sure it doesn't happen again. The contents of the
pg_xlog directory are an integral part of your database cluster.
The annoyance of a readme is that it'd need translation.
People are used to logs being disposable. Anyone who's been responsible for a database should ideally know better than to assume that *transation* logs are disposable, but everyone has to learn sometime, and not everybody does so by reading TFM (unfortunately).
--
Craig Ringer
Mike Broers <mbroers@gmail.com> wrote: > Ultimately the hosting service restored the files that they had > not brought over during their maintenance migration and we started > up ok. So that was a relief. +1 > We had archived log files but it did not appear that the archive > destination was caught up with the xlog the cluster was > complaining about. > > Given that the database server was shut down cleanly, and all > other data besides pg_xlog was available as expected (not > corrupted), what would have been the problem with pg_resetxlogs? Frankly, the odds would have been pretty good that you would have come up without lost data or a corrupted database; but it's a matter of the degree of confidence in that. Startup and shutdown code, by its nature, is not exercised as heavily as most PostgreSQL code. Startup after using a data recovery utility is even less heavily exercised. Less frequently executed code is more likely to have subtle bugs which only show up in rare corner cases. I like to minimize my risk. -Kevin
On Mon, Jun 25, 2012 at 9:19 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Mike Broers <mbroers@gmail.com> wrote: > >> Ultimately the hosting service restored the files that they had >> not brought over during their maintenance migration and we started >> up ok. So that was a relief. > > +1 > >> We had archived log files but it did not appear that the archive >> destination was caught up with the xlog the cluster was >> complaining about. >> >> Given that the database server was shut down cleanly, and all >> other data besides pg_xlog was available as expected (not >> corrupted), what would have been the problem with pg_resetxlogs? > > Frankly, the odds would have been pretty good that you would have > come up without lost data or a corrupted database; but it's a matter > of the degree of confidence in that. Startup and shutdown code, by > its nature, is not exercised as heavily as most PostgreSQL code. > Startup after using a data recovery utility is even less heavily > exercised. Less frequently executed code is more likely to have > subtle bugs which only show up in rare corner cases. I like to > minimize my risk. > > -Kevin Agreed 100%. Also, this brings up the question of how your server setup is documented by the hosting providor. If you've got remote drives, virtual drives or whatever, if needs to be documented there in such a way they can move you without losing things. Or you'll have to remind them each time you have a separate mount point that needs to get transferred, but that's error prone.