Re: Fix calculations on WAL recycling.

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: Fix calculations on WAL recycling.
Дата
Msg-id 20180723.195557.114019130.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Fix calculations on WAL recycling.  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Fix calculations on WAL recycling.  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
At Mon, 23 Jul 2018 15:59:16 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180723065916.GI2854@paquier.xyz>
> On Mon, Jul 23, 2018 at 01:57:48PM +0900, Kyotaro HORIGUCHI wrote:
> > I'll register this to the next commitfest.
> > 
> > https://www.postgresql.org/message-id/20180719.125117.155470938.horiguchi.kyotaro@lab.ntt.co.jp
> 
> This is an open item for v11.

Mmm. Thanks. I wrongly thought this was v10 item. Removed this
from the next CF.

Thank you for taking a look.

> >> While considering this, I found a bug in 4b0d28de06, which
> >> removed prior checkpoint from control file. It actually trims the
> >> segments before the last checkpoint's redo segment but recycling
> >> is still considered based on the *prevous* checkpoint. As the
> >> result min_wal_size doesn't work as told.  Specifically, setting
> >> min/max_wal_size to 48MB and advance four or more segments then
> >> two checkpoints leaves just one segment, which is less than
> >> min_wal_size.
> >> 
> >> The attached patch fixes that. One arguable point on this would
> >> be the removal of the behavior when RemoveXLogFile(name,
> >> InvalidXLogRecPtr, ..).
> >> 
> >> The only place calling the function with the parameter is
> >> timeline switching. Previously unconditionally 10 segments are
> >> recycled after switchpoint but the reason for the behavior is we
> >> didn't have the information on previous checkpoint at hand at the
> >> time. But now we can use the timeline switch point as the
> >> approximate of the last checkpoint's redo point and this allows
> >> us to use min/max_wal_size properly at the time.
> 
> I have been looking at that, and tested with this simple scenario:
> create table aa (a int);
> 
> Then just repeat the following SQLs: 
> insert into aa values (1);
> select pg_switch_wal();
> checkpoint;
> select pg_walfile_name(pg_current_wal_lsn());
> 
> By doing so, you would notice that the oldest WAL segment does not get
> recycled after the checkpoint, while it should as the redo pointer is
> always checkpoint generated.  What happens is that this oldest segment
> gets recycled every two checkpoints.

(I'm not sure I'm getting it correctly..) I see the old segments
recycled. When I see pg_switch_wal() returns 0/12..,
pg_walfile_name is ...13 and segment files for 13 and 14 are
found in pg_wal directory. That is, seg 12 was recycled as seg
14. log_checkpoint=on shows every checkpoint recycles 1 segment
in the case.

> Then I had a look at the proposed patch, with a couple of comments.
> 
> -   if (PriorRedoPtr == InvalidXLogRecPtr)
> -       recycleSegNo = endlogSegNo + 10;
> -   else
> -       recycleSegNo = XLOGfileslop(PriorRedoPtr);
> +   recycleSegNo = XLOGfileslop(RedoRecPtr);
> I think that this is a new behavior, and should not be changed, see
> point 3 below.
> 
> In CreateCheckPoint(), the removal of past WAL segments is always going
> to happen as RedoRecPtr is never InvalidXLogRecPtr, so the recycling
> should happen also when PriorRedoPtr is InvalidXLogRecPtr, no?

Yes. The reason for the change was the change of
RemoveNonParentXlogFiles that I made and it is irrelevant to this
bug fix (and rather breaking as you pointed..). So, reverted it.


> /* Trim from the last checkpoint, not the last - 1 */
> This comment could be adjusted, let's remove "not the last - 1" .

Oops! Thanks. The comment has finally vanished and melded into
another comment just above.

| * Delete old log files not required by the last checkpoint and recycle
| * them


> The calculation of _logSegNo in CreateRestartPoint is visibly incorrect,
> this still uses PriorRedoPtr so the bug is not fixed for standbys.  The
> tweaks for ThisTimeLineID also need to be out of the loop where
> PriorRedoPtr is InvalidXLogRecPtr, and only the distance calculation
> should be kept.

Agreed.  While I reconsidered this, I noticed that the estimated
checkpoint distance is 0 when PriorRedoPtr is invalid. This means
that the first checkpoint/restartpoint tries to reduce WAL
segments to min_wal_size. However, it happens only at initdb time
and makes no substantial behavior change so I made the change
ignoring the difference.

> Finally, in summary, this patch is doing actually three things:
> 1) Rename a couple of variables which refer incorrectly to the prior
> checkpoint so as they refer to the last checkpoint generated.
> 2) Make sure that WAL recycling/removal happens based on the last
> checkpoint generated, which is the one just generated when past WAL
> segments are cleaned up as post-operation actions.
> 3) Enforce the recycling point to be the switch point instead of
> arbitrarily recycle 10 segments, which is what b2a5545b has introduced.
> Recycling at exactly the switch point is wrong, as the redo LSN of the
> previous checkpoint may not be at the same segment when a timeline has
> switched, so you would finish with recycling segments which are still
> needed if an instance needs be crash-recovered post-promotion without
> a completed post-recovery checkpoint.  In short this is dangerous.
> I'll let Heikki comment on that, but I think that you should fetch the
> last redo LSN instead, still you need to be worried about checkpoints
> running in parallel of the startup process, so I would stick with the
> current logic.

Thank you for the detail. I was coufused a bit there. I agree to
preserve the logic, too.

> 1) and 2) are in my opinion clearly oversights from 4b0d28d, but 3) is
> not.

Thank you for the comments and suggestions. After all, I did the
following things in the attached patch.

- Reverted the change on timeline switching. (Removed the (3))

- Fixed CreateRestartPoint to do the same thing with CreateCheckPoint.

- Both CreateRestart/CheckPoint now tries trimming of WAL
  segments even for the first time.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 662ecab9c2d0efc41756d98e18dbabe060da8d96 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 19 Jul 2018 12:13:56 +0900
Subject: [PATCH] Fix calculation base of WAL recycling

The commit 4b0d28de06 removed the prior checkpoint and related things
but that leaves WAL recycling based on the prior checkpoint. This
makes max_wal_size and min_wal_size work incorrectly. This patch makes
WAL recycling be based on the last checkpoint.
---
 src/backend/access/transam/xlog.c | 159 ++++++++++++++++++--------------------
 1 file changed, 75 insertions(+), 84 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 335b4a573d..05f750253f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2287,7 +2287,7 @@ assign_checkpoint_completion_target(double newval, void *extra)
  * XLOG segments? Returns the highest segment that should be preallocated.
  */
 static XLogSegNo
-XLOGfileslop(XLogRecPtr PriorRedoPtr)
+XLOGfileslop(XLogRecPtr RedoRecPtr)
 {
     XLogSegNo    minSegNo;
     XLogSegNo    maxSegNo;
@@ -2299,9 +2299,9 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr)
      * correspond to. Always recycle enough segments to meet the minimum, and
      * remove enough segments to stay below the maximum.
      */
-    minSegNo = PriorRedoPtr / wal_segment_size +
+    minSegNo = RedoRecPtr / wal_segment_size +
         ConvertToXSegs(min_wal_size_mb, wal_segment_size) - 1;
-    maxSegNo = PriorRedoPtr / wal_segment_size +
+    maxSegNo = RedoRecPtr / wal_segment_size +
         ConvertToXSegs(max_wal_size_mb, wal_segment_size) - 1;
 
     /*
@@ -2316,7 +2316,7 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr)
     /* add 10% for good measure. */
     distance *= 1.10;
 
-    recycleSegNo = (XLogSegNo) ceil(((double) PriorRedoPtr + distance) /
+    recycleSegNo = (XLogSegNo) ceil(((double) RedoRecPtr + distance) /
                                     wal_segment_size);
 
     if (recycleSegNo < minSegNo)
@@ -3899,12 +3899,12 @@ RemoveTempXlogFiles(void)
 /*
  * Recycle or remove all log files older or equal to passed segno.
  *
- * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the
- * redo pointer of the previous checkpoint. These are used to determine
+ * endptr is current (or recent) end of xlog, and RedoRecPtr is the
+ * redo pointer of the last checkpoint. These are used to determine
  * whether we want to recycle rather than delete no-longer-wanted log files.
  */
 static void
-RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
+RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
 {
     DIR           *xldir;
     struct dirent *xlde;
@@ -3947,7 +3947,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
                 /* Update the last removed location in shared memory first */
                 UpdateLastRemovedPtr(xlde->d_name);
 
-                RemoveXlogFile(xlde->d_name, PriorRedoPtr, endptr);
+                RemoveXlogFile(xlde->d_name, RedoRecPtr, endptr);
             }
         }
     }
@@ -4021,14 +4021,14 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
 /*
  * Recycle or remove a log file that's no longer needed.
  *
- * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the
- * redo pointer of the previous checkpoint. These are used to determine
+ * endptr is current (or recent) end of xlog, and RedoRecPtr is the
+ * redo pointer of the last checkpoint. These are used to determine
  * whether we want to recycle rather than delete no-longer-wanted log files.
- * If PriorRedoRecPtr is not known, pass invalid, and the function will
- * recycle, somewhat arbitrarily, 10 future segments.
+ * If RedoRecPtr is not known, pass invalid, and the function will recycle,
+ * somewhat arbitrarily, 10 future segments.
  */
 static void
-RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
+RemoveXlogFile(const char *segname, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
 {
     char        path[MAXPGPATH];
 #ifdef WIN32
@@ -4042,10 +4042,10 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
      * Initialize info about where to try to recycle to.
      */
     XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
-    if (PriorRedoPtr == InvalidXLogRecPtr)
+    if (RedoRecPtr == InvalidXLogRecPtr)
         recycleSegNo = endlogSegNo + 10;
     else
-        recycleSegNo = XLOGfileslop(PriorRedoPtr);
+        recycleSegNo = XLOGfileslop(RedoRecPtr);
 
     snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);
 
@@ -8706,6 +8706,7 @@ CreateCheckPoint(int flags)
     bool        shutdown;
     CheckPoint    checkPoint;
     XLogRecPtr    recptr;
+    XLogSegNo    _logSegNo;
     XLogCtlInsert *Insert = &XLogCtl->Insert;
     uint32        freespace;
     XLogRecPtr    PriorRedoPtr;
@@ -9072,22 +9073,18 @@ CreateCheckPoint(int flags)
      */
     smgrpostckpt();
 
-    /*
-     * Delete old log files and recycle them
-     */
+    /* Update the average distance between checkpoints/restartpoints. */
     if (PriorRedoPtr != InvalidXLogRecPtr)
-    {
-        XLogSegNo    _logSegNo;
-
-        /* Update the average distance between checkpoints. */
         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
 
-        /* Trim from the last checkpoint, not the last - 1 */
-        XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
-        KeepLogSeg(recptr, &_logSegNo);
-        _logSegNo--;
-        RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, recptr);
-    }
+    /*
+     * Delete old log files (those no longer needed for last checkpoint to
+     * prevent the disk holding the xlog from growing full.
+     */
+    XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
+    KeepLogSeg(recptr, &_logSegNo);
+    _logSegNo--;
+    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);
 
     /*
      * Make more log segments if needed.  (Do this after recycling old log
@@ -9253,6 +9250,11 @@ CreateRestartPoint(int flags)
     XLogRecPtr    lastCheckPointEndPtr;
     CheckPoint    lastCheckPoint;
     XLogRecPtr    PriorRedoPtr;
+    XLogRecPtr    receivePtr;
+    XLogRecPtr    replayPtr;
+    TimeLineID    replayTLI;
+    XLogRecPtr    endptr;
+    XLogSegNo    _logSegNo;
     TimestampTz xtime;
 
     /*
@@ -9394,69 +9396,58 @@ CreateRestartPoint(int flags)
     }
     LWLockRelease(ControlFileLock);
 
-    /*
-     * Delete old log files (those no longer needed even for previous
-     * checkpoint/restartpoint) to prevent the disk holding the xlog from
-     * growing full.
-     */
+    /* Update the average distance between checkpoints/restartpoints. */
     if (PriorRedoPtr != InvalidXLogRecPtr)
-    {
-        XLogRecPtr    receivePtr;
-        XLogRecPtr    replayPtr;
-        TimeLineID    replayTLI;
-        XLogRecPtr    endptr;
-        XLogSegNo    _logSegNo;
-
-        /* Update the average distance between checkpoints/restartpoints. */
         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
 
-        XLByteToSeg(PriorRedoPtr, _logSegNo, wal_segment_size);
+    /*
+     * Delete old log files (those no longer needed for last restartpoint to
+     * prevent the disk holding the xlog from growing full.
+     */
+    XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
 
-        /*
-         * Get the current end of xlog replayed or received, whichever is
-         * later.
-         */
-        receivePtr = GetWalRcvWriteRecPtr(NULL, NULL);
-        replayPtr = GetXLogReplayRecPtr(&replayTLI);
-        endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
+    /*
+     * Retreat _logSegNo using the current end of xlog replayed or received,
+     * whichever is later.
+     */
+    receivePtr = GetWalRcvWriteRecPtr(NULL, NULL);
+    replayPtr = GetXLogReplayRecPtr(&replayTLI);
+    endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
+    KeepLogSeg(endptr, &_logSegNo);
+    _logSegNo--;
+    
+    /*
+     * Try to recycle segments on a useful timeline. If we've been promoted
+     * since the beginning of this restartpoint, use the new timeline chosen
+     * at end of recovery (RecoveryInProgress() sets ThisTimeLineID in that
+     * case). If we're still in recovery, use the timeline we're currently
+     * replaying.
+     *
+     * There is no guarantee that the WAL segments will be useful on the
+     * current timeline; if recovery proceeds to a new timeline right after
+     * this, the pre-allocated WAL segments on this timeline will not be used,
+     * and will go wasted until recycled on the next restartpoint. We'll live
+     * with that.
+     */
+    if (RecoveryInProgress())
+        ThisTimeLineID = replayTLI;
 
-        KeepLogSeg(endptr, &_logSegNo);
-        _logSegNo--;
+    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr);
 
-        /*
-         * Try to recycle segments on a useful timeline. If we've been
-         * promoted since the beginning of this restartpoint, use the new
-         * timeline chosen at end of recovery (RecoveryInProgress() sets
-         * ThisTimeLineID in that case). If we're still in recovery, use the
-         * timeline we're currently replaying.
-         *
-         * There is no guarantee that the WAL segments will be useful on the
-         * current timeline; if recovery proceeds to a new timeline right
-         * after this, the pre-allocated WAL segments on this timeline will
-         * not be used, and will go wasted until recycled on the next
-         * restartpoint. We'll live with that.
-         */
-        if (RecoveryInProgress())
-            ThisTimeLineID = replayTLI;
+    /*
+     * Make more log segments if needed.  (Do this after recycling old log
+     * segments, since that may supply some of the needed files.)
+     */
+    PreallocXlogFiles(endptr);
 
-        RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, endptr);
-
-        /*
-         * Make more log segments if needed.  (Do this after recycling old log
-         * segments, since that may supply some of the needed files.)
-         */
-        PreallocXlogFiles(endptr);
-
-        /*
-         * ThisTimeLineID is normally not set when we're still in recovery.
-         * However, recycling/preallocating segments above needed
-         * ThisTimeLineID to determine which timeline to install the segments
-         * on. Reset it now, to restore the normal state of affairs for
-         * debugging purposes.
-         */
-        if (RecoveryInProgress())
-            ThisTimeLineID = 0;
-    }
+    /*
+     * ThisTimeLineID is normally not set when we're still in recovery.
+     * However, recycling/preallocating segments above needed ThisTimeLineID
+     * to determine which timeline to install the segments on. Reset it now,
+     * to restore the normal state of affairs for debugging purposes.
+     */
+    if (RecoveryInProgress())
+        ThisTimeLineID = 0;
 
     /*
      * Truncate pg_subtrans if possible.  We can throw away all data before
-- 
2.16.3


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Darafei "Komяpa" Praliaskouski
Дата:
Сообщение: Re: JIT breaks PostGIS
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: [Proposal] Add accumulated statistics for wait event