Обсуждение: how is pitr replay interruption time determined?
Today I was running some tests on one of our warm standbys. The way this test works is I took a zfs snapshot of the filesystem, induce failover, of the pitr host, then log in and play around with the data. Once we're done with this, I rolled back the zfs snapshot, restart the standby instance, and let it start doing wal_replay again. All of this seems to work fine, but when I start the replay instance again I get the following line in my logfile: LOG: database system was interrupted while in recovery at log time 2007-07-30 19:17:37 EDT I am curious how this date is determined by postgres/pitr? The standby had processed wal_logs right through today before having taken the zfs snapshot, so I would think it would have been interrupted at a log time of sometime today. Also the initial replay setup occured sometime was long before 07-30, so that doesn't correspond to that date either, so I'm wondering what it is that determines the point in time that replay thinks it is interuppted at. FWIW, this does have practical implecations, like how many wal files to keep around, in my case I had all the files going back to that date, so I only had to wait for replay to catch up.... 5 hours later :-). But it would be good to be able to predetermine that kind of thing. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes: > LOG: database system was interrupted while in recovery at log time 2007-07-30 > 19:17:37 EDT > I am curious how this date is determined by postgres/pitr? It's the update timestamp in pg_control, which in essence is going to be the completion time of the last checkpoint. If the crash was during WAL recovery, as it seems to be, it'd be the time the last "restart" checkpoint was made. I'm not sure why the phrase "at log time" is used. It looks like whoever wrote the message thought that the timestamp would be one taken from the WAL data being replayed, but that's not what I see the code doing. regards, tom lane
On Monday 27 August 2007 19:17, Tom Lane wrote: > Robert Treat <xzilla@users.sourceforge.net> writes: > > LOG: database system was interrupted while in recovery at log time > > 2007-07-30 19:17:37 EDT > > > > I am curious how this date is determined by postgres/pitr? > > It's the update timestamp in pg_control, which in essence is going to be > the completion time of the last checkpoint. If the crash was during WAL > recovery, as it seems to be, it'd be the time the last "restart" > checkpoint was made. > Is there some way to force checkpoints on a db doing wal replay? Also, I would think at a minimum that the last "restart" checkpoint would be akin to when the db was last started, but looking at my replay instance I see: pg_control last modified: Mon Aug 27 12:12:55 2007 Time of latest checkpoint: Mon Jul 30 19:17:37 2007 The first date is around the time I restarted the instance, the second is still a mystery to me. Perhaps I am misinterpreting your respone? -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes: > Is there some way to force checkpoints on a db doing wal replay? No, it's hardwired to do it when it sees a checkpoint record in the WAL stream. > pg_control last modified: Mon Aug 27 12:12:55 2007 > Time of latest checkpoint: Mon Jul 30 19:17:37 2007 After looking again at the code, the "last modified" time is the time that a recovery checkpoint was last done, and the "latest checkpoint" is the timestamp of the WAL-stream checkpoint record that triggered it. In a situation where you're catching up on historical WAL they could be far apart, but when a slave is just following the master there shouldn't be a huge difference --- not more than the maximum time to fill a WAL record and ship it over to the slave, for sure. (BTW, I misread it before --- it looks like the "at log time" value printed at startup *is* taken from the checkpoint record that it's trying to roll forward from.) Assuming that you're absorbing data from the master at a steady rate, the only reason I can see for the timestamps to be so old is if the "rm_safe_restartpoint" checks are always failing. I seem to remember that we found and fixed a bug that could cause something like that, but I can't find any trace of it in the CVS logs. Simon, do you recall such a problem post-8.2.0? regards, tom lane
Simon Riggs <simon@2ndquadrant.com> writes: > Yeh, we traced a problem with GIN indexes to this cause in early June; > Teodor fixed it quickly in REL8_2_STABLE, but that won't be available > until 8.2.5. Thanks, that's what I missed finding ... Robert, do you have any GIN indexes in this DB? > I'd be happier with a log message to say > ereport(DEBUG2, > (errmsg("RM %d not safe to record restart point at %X/%X", > rmid, > checkPoint->redo.xlogid, > checkPoint->redo.xrecoff))); > to help trace such things in future. As long as it's DEBUG2 or so, no objection here. regards, tom lane
On Tue, 2007-08-28 at 17:59 -0400, Tom Lane wrote: > Robert Treat <xzilla@users.sourceforge.net> writes: > > Is there some way to force checkpoints on a db doing wal replay? > > No, it's hardwired to do it when it sees a checkpoint record in the WAL stream. > > > pg_control last modified: Mon Aug 27 12:12:55 2007 > > Time of latest checkpoint: Mon Jul 30 19:17:37 2007 > > After looking again at the code, the "last modified" time is the time > that a recovery checkpoint was last done, and the "latest checkpoint" > is the timestamp of the WAL-stream checkpoint record that triggered it. > In a situation where you're catching up on historical WAL they could be > far apart, but when a slave is just following the master there shouldn't > be a huge difference --- not more than the maximum time to fill a WAL > record and ship it over to the slave, for sure. > > (BTW, I misread it before --- it looks like the "at log time" value > printed at startup *is* taken from the checkpoint record that it's > trying to roll forward from.) That's correct. Sorry for not replying earlier; just back from hols. Jumping back to original thought: Robert, you should be using the last checkpoint location, not the last time to decide which xlogs to remove. > Assuming that you're absorbing data from the master at a steady rate, > the only reason I can see for the timestamps to be so old is if the > "rm_safe_restartpoint" checks are always failing. I seem to remember > that we found and fixed a bug that could cause something like that, > but I can't find any trace of it in the CVS logs. Simon, do you > recall such a problem post-8.2.0? Yeh, we traced a problem with GIN indexes to this cause in early June; Teodor fixed it quickly in REL8_2_STABLE, but that won't be available until 8.2.5. I'd be happier with a log message to say ereport(DEBUG2, (errmsg("RM %d not safe to record restart point at %X/%X", rmid, checkPoint->redo.xlogid, checkPoint->redo.xrecoff))); to help trace such things in future. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs <simon@2ndquadrant.com> writes: >> As long as it's DEBUG2 or so, no objection here. > OK, I'll do up a patch. No need, done already. regards, tom lane
On Tue, 2007-08-28 at 18:49 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > Yeh, we traced a problem with GIN indexes to this cause in early June; > > Teodor fixed it quickly in REL8_2_STABLE, but that won't be available > > until 8.2.5. > > Thanks, that's what I missed finding ... Robert, do you have any GIN > indexes in this DB? > > > I'd be happier with a log message to say > > ereport(DEBUG2, > > (errmsg("RM %d not safe to record restart point at %X/%X", > > rmid, > > checkPoint->redo.xlogid, > > checkPoint->redo.xrecoff))); > > to help trace such things in future. > > As long as it's DEBUG2 or so, no objection here. OK, I'll do up a patch. References: - GIN error trace patch on pgsql-general http://archives.postgresql.org/pgsql-general/2007-06/msg00072.php - Committed fixes http://archives.postgresql.org/pgsql-committers/2007-06/msg00061.php -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
On Tuesday 28 August 2007 18:49, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > Yeh, we traced a problem with GIN indexes to this cause in early June; > > Teodor fixed it quickly in REL8_2_STABLE, but that won't be available > > until 8.2.5. > > Thanks, that's what I missed finding ... Robert, do you have any GIN > indexes in this DB? > Ah, I didn't even think about this, but yes, one of the dbs in the cluster (a small, low traffic, and generally forgotten about one) does have a couple of GIN indexes in it. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Tuesday 28 August 2007 18:44, Simon Riggs wrote: > On Tue, 2007-08-28 at 17:59 -0400, Tom Lane wrote: > > Robert Treat <xzilla@users.sourceforge.net> writes: > > > Is there some way to force checkpoints on a db doing wal replay? > > > > No, it's hardwired to do it when it sees a checkpoint record in the WAL > > stream. > > > > > pg_control last modified: Mon Aug 27 12:12:55 2007 > > > Time of latest checkpoint: Mon Jul 30 19:17:37 2007 > > > > After looking again at the code, the "last modified" time is the time > > that a recovery checkpoint was last done, and the "latest checkpoint" > > is the timestamp of the WAL-stream checkpoint record that triggered it. > > In a situation where you're catching up on historical WAL they could be > > far apart, but when a slave is just following the master there shouldn't > > be a huge difference --- not more than the maximum time to fill a WAL > > record and ship it over to the slave, for sure. > > > > (BTW, I misread it before --- it looks like the "at log time" value > > printed at startup *is* taken from the checkpoint record that it's > > trying to roll forward from.) > > That's correct. Sorry for not replying earlier; just back from hols. > > Jumping back to original thought: Robert, you should be using the last > checkpoint location, not the last time to decide which xlogs to remove. > Looking at last checkpoint location in pg_control, I see: Latest checkpoint location: 1C/8001E848 How does one translate that into an xlog file name? -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
>>> On Wed, Aug 29, 2007 at 9:48 AM, in message <200708291048.54662.xzilla@users.sourceforge.net>, Robert Treat <xzilla@users.sourceforge.net> wrote: > Looking at last checkpoint location in pg_control, I see: > Latest checkpoint location: 1C/8001E848 > > How does one translate that into an xlog file name? Unless you've gotten fancy in a PITR recovery, it should be: 000000010000001C00000080 01E848 is the offset into the file. -Kevin