Обсуждение: Crash by targetted recovery

Поиск
Список
Период
Сортировка

Crash by targetted recovery

От
Kyotaro Horiguchi
Дата:
Hello.

We found that targetted promotion can cause an assertion failure.  The
attached TAP test causes that.

> TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12078)

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous WAL segment, the assertion failure is triggered.

The wrong point is that StartupXLOG does random access fetching while
WaitForWALToBecomeAvailable is thinking it is still in streaming.  I
think if it is called with random access mode,
WaitForWALToBecomeAvailable should move to XLOG_FROM_ARCHIVE even
though it is thinking that it is still reading from stream.

regards.

-- Kyotaro Horiguchi
NTT Open Source Software Center
From 8cb817a43226a0d60dd62c6205219a2f0c807b9e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH 1/2] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 32 +++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..8e71788981 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
     'recovery end before target reached is a fatal error');
+
+############
+# Edge case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+                                has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
-- 
2.18.2

From 424729eafeeab8c96074f4c8d5b87d3a23cbf73a Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 26 Feb 2020 20:41:37 +0900
Subject: [PATCH 2/2] Fix a crash bug of targetted promotion.

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous segment at the time, WaitForWALToBecomeAvailable crashes with
assertion failure.  WaitForWALToBecomeAvailable should move back to
XLOG_FROM_ARCHIVE if random access is specified while streaming.
---
 src/backend/access/transam/xlog.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..41ed916342 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11831,7 +11831,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      * 1. Read from either archive or pg_wal (XLOG_FROM_ARCHIVE), or just
      *      pg_wal (XLOG_FROM_PG_WAL)
      * 2. Check trigger file
-     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM).
+     *    Random access mode rewinds the state machine to 1.
      * 4. Rescan timelines
      * 5. Sleep wal_retrieve_retry_interval milliseconds, and loop back to 1.
      *
@@ -11846,7 +11847,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      */
     if (!InArchiveRecovery)
         currentSource = XLOG_FROM_PG_WAL;
-    else if (currentSource == 0)
+    else if (currentSource == 0 || randAccess)
         currentSource = XLOG_FROM_ARCHIVE;
 
     for (;;)
-- 
2.18.2


Re: Crash by targetted recovery

От
Fujii Masao
Дата:

On 2020/02/27 12:48, Kyotaro Horiguchi wrote:
> Hello.
> 
> We found that targetted promotion can cause an assertion failure.  The
> attached TAP test causes that.
> 
>> TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12078)
> 
> After recovery target is reached, StartupXLOG turns off standby mode
> then refetches the last record. If the last record starts from the
> previous WAL segment, the assertion failure is triggered.

Good catch!

> The wrong point is that StartupXLOG does random access fetching while
> WaitForWALToBecomeAvailable is thinking it is still in streaming.  I
> think if it is called with random access mode,
> WaitForWALToBecomeAvailable should move to XLOG_FROM_ARCHIVE even
> though it is thinking that it is still reading from stream.

I failed to understand why random access while reading from
stream is bad idea. Could you elaborate why?

Isn't it sufficient to set currentSource to 0 when disabling
StandbyMode?

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Crash by targetted recovery

От
Kyotaro Horiguchi
Дата:
At Thu, 27 Feb 2020 14:40:55 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2020/02/27 12:48, Kyotaro Horiguchi wrote:
> > Hello.
> > We found that targetted promotion can cause an assertion failure.  The
> > attached TAP test causes that.
> > 
> >> TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12078)
> > After recovery target is reached, StartupXLOG turns off standby mode
> > then refetches the last record. If the last record starts from the
> > previous WAL segment, the assertion failure is triggered.
> 
> Good catch!
> 
> > The wrong point is that StartupXLOG does random access fetching while
> > WaitForWALToBecomeAvailable is thinking it is still in streaming.  I
> > think if it is called with random access mode,
> > WaitForWALToBecomeAvailable should move to XLOG_FROM_ARCHIVE even
> > though it is thinking that it is still reading from stream.
> 
> I failed to understand why random access while reading from
> stream is bad idea. Could you elaborate why?

It seems to me the word "streaming" suggests that WAL record should be
read sequentially. Random access, which means reading from arbitrary
location, breaks a stream.  (But the patch doesn't try to stop wal
sender if randAccess.)

> Isn't it sufficient to set currentSource to 0 when disabling
> StandbyMode?

I thought that and it should work, but I hesitated to manipulate on
currentSource in StartupXLOG. currentSource is basically a private
state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
think it's not good to modify it out of the the logic in
WaitForWALToBecomeAvailable.  Come to think of that I got to think the
following part in ReadRecord should use randAccess instead..

xlog.c:4384
>     /*
-      * Before we retry, reset lastSourceFailed and currentSource
-      * so that we will check the archive next.
+      * Streaming has broken, we retry from the same LSN.
>      */
>     lastSourceFailed = false;
-     currentSource = 0;
+     private->randAccess = true;

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Crash by targetted recovery

От
Fujii Masao
Дата:

On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
> At Thu, 27 Feb 2020 14:40:55 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>
>>
>> On 2020/02/27 12:48, Kyotaro Horiguchi wrote:
>>> Hello.
>>> We found that targetted promotion can cause an assertion failure.  The
>>> attached TAP test causes that.
>>>
>>>> TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12078)
>>> After recovery target is reached, StartupXLOG turns off standby mode
>>> then refetches the last record. If the last record starts from the
>>> previous WAL segment, the assertion failure is triggered.
>>
>> Good catch!
>>
>>> The wrong point is that StartupXLOG does random access fetching while
>>> WaitForWALToBecomeAvailable is thinking it is still in streaming.  I
>>> think if it is called with random access mode,
>>> WaitForWALToBecomeAvailable should move to XLOG_FROM_ARCHIVE even
>>> though it is thinking that it is still reading from stream.
>>
>> I failed to understand why random access while reading from
>> stream is bad idea. Could you elaborate why?
> 
> It seems to me the word "streaming" suggests that WAL record should be
> read sequentially. Random access, which means reading from arbitrary
> location, breaks a stream.  (But the patch doesn't try to stop wal
> sender if randAccess.)
> 
>> Isn't it sufficient to set currentSource to 0 when disabling
>> StandbyMode?
> 
> I thought that and it should work, but I hesitated to manipulate on
> currentSource in StartupXLOG. currentSource is basically a private
> state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
> think it's not good to modify it out of the the logic in
> WaitForWALToBecomeAvailable.

If so, what about adding the following at the top of
WaitForWALToBecomeAvailable()?

     if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
          currentSource = 0;

> Come to think of that I got to think the
> following part in ReadRecord should use randAccess instead..
> 
> xlog.c:4384
>>      /*
> -      * Before we retry, reset lastSourceFailed and currentSource
> -      * so that we will check the archive next.
> +      * Streaming has broken, we retry from the same LSN.
>>       */
>>      lastSourceFailed = false;
> -     currentSource = 0;
> +     private->randAccess = true;

Sorry, I failed to understand why this change is necessary...
At least the comment that you added seems incorrect
because WAL streaming should not have started yet when
we reach the above point.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Crash by targetted recovery

От
Kyotaro Horiguchi
Дата:
Thank you for the comment.
At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
> >> I failed to understand why random access while reading from
> >> stream is bad idea. Could you elaborate why?
> > It seems to me the word "streaming" suggests that WAL record should be
> > read sequentially. Random access, which means reading from arbitrary
> > location, breaks a stream.  (But the patch doesn't try to stop wal
> > sender if randAccess.)
> > 
> >> Isn't it sufficient to set currentSource to 0 when disabling
> >> StandbyMode?
> > I thought that and it should work, but I hesitated to manipulate on
> > currentSource in StartupXLOG. currentSource is basically a private
> > state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
> > think it's not good to modify it out of the the logic in
> > WaitForWALToBecomeAvailable.
> 
> If so, what about adding the following at the top of
> WaitForWALToBecomeAvailable()?
> 
>     if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
>          currentSource = 0;

It works virtually the same way. I'm happy to do that if you don't
agree to using randAccess. But I'd rather do that in the 'if
(!InArchiveRecovery)' section.

> > Come to think of that I got to think the
> > following part in ReadRecord should use randAccess instead..
> > xlog.c:4384
> >>      /*
> > -      * Before we retry, reset lastSourceFailed and currentSource
> > -      * so that we will check the archive next.
> > +      * Streaming has broken, we retry from the same LSN.
> >>       */
> >>      lastSourceFailed = false;
> > -     currentSource = 0;
> > +     private->randAccess = true;
> 
> Sorry, I failed to understand why this change is necessary...

It's not necessary, just for being tidy about the responsibility on
currentSource.

> At least the comment that you added seems incorrect
> because WAL streaming should not have started yet when
> we reach the above point.

Oops, right.

-      * Streaming has broken, we retry from the same LSN.
+      * Restart recovery from the current LSN.

For clarity, I don't insist on the change at all. If it were
necessary, it's another topic, anyway. Please forget it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 8cb817a43226a0d60dd62c6205219a2f0c807b9e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH v2 1/2] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 32 +++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..8e71788981 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
     'recovery end before target reached is a fatal error');
+
+############
+# Edge case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+                                has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
-- 
2.18.2

From d917638b787b9d1393cbc0d77586168da844756b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 26 Feb 2020 20:41:37 +0900
Subject: [PATCH v2 2/2] Fix a crash bug of targetted promotion.

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous segment at the time, WaitForWALToBecomeAvailable crashes with
assertion failure.  WaitForWALToBecomeAvailable should move back to
XLOG_FROM_ARCHIVE if standby mode is turned off while
XLOG_FROM_STREAM.
---
 src/backend/access/transam/xlog.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..f82a582c53 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11831,7 +11831,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      * 1. Read from either archive or pg_wal (XLOG_FROM_ARCHIVE), or just
      *      pg_wal (XLOG_FROM_PG_WAL)
      * 2. Check trigger file
-     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM).
+     *    If StandbyMode is false, the state machine goes back to 1.
      * 4. Rescan timelines
      * 5. Sleep wal_retrieve_retry_interval milliseconds, and loop back to 1.
      *
@@ -11846,8 +11847,16 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      */
     if (!InArchiveRecovery)
         currentSource = XLOG_FROM_PG_WAL;
-    else if (currentSource == 0)
+    else if (currentSource == 0 ||
+             (!StandbyMode && currentSource == XLOG_FROM_STREAM))
+    {
+        /*
+         * Shoule be before starting streaming, or should have exited from
+         * streaming.
+         */
+        Assert(!WalRcvStreaming());
         currentSource = XLOG_FROM_ARCHIVE;
+    }
 
     for (;;)
     {
-- 
2.18.2


Re: Crash by targetted recovery

От
Fujii Masao
Дата:

On 2020/02/27 17:05, Kyotaro Horiguchi wrote:
> Thank you for the comment.
> At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
>>>> I failed to understand why random access while reading from
>>>> stream is bad idea. Could you elaborate why?
>>> It seems to me the word "streaming" suggests that WAL record should be
>>> read sequentially. Random access, which means reading from arbitrary
>>> location, breaks a stream.  (But the patch doesn't try to stop wal
>>> sender if randAccess.)
>>>
>>>> Isn't it sufficient to set currentSource to 0 when disabling
>>>> StandbyMode?
>>> I thought that and it should work, but I hesitated to manipulate on
>>> currentSource in StartupXLOG. currentSource is basically a private
>>> state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
>>> think it's not good to modify it out of the the logic in
>>> WaitForWALToBecomeAvailable.
>>
>> If so, what about adding the following at the top of
>> WaitForWALToBecomeAvailable()?
>>
>>      if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
>>           currentSource = 0;
> 
> It works virtually the same way. I'm happy to do that if you don't
> agree to using randAccess. But I'd rather do that in the 'if
> (!InArchiveRecovery)' section.

The approach using randAccess seems unsafe. Please imagine
the case where currentSource is changed to XLOG_FROM_ARCHIVE
because randAccess is true, while walreceiver is still running.
For example, this case can occur when the record at REDO
starting point is fetched with randAccess = true after walreceiver
is invoked to fetch the last checkpoint record. The situation
"currentSource != XLOG_FROM_STREAM while walreceiver is
  running" seems invalid. No?

So I think that the approach that I proposed is better.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Crash by targetted recovery

От
Kyotaro Horiguchi
Дата:
At Thu, 27 Feb 2020 20:04:41 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2020/02/27 17:05, Kyotaro Horiguchi wrote:
> > Thank you for the comment.
> > At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote in
> >> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
> >>>> I failed to understand why random access while reading from
> >>>> stream is bad idea. Could you elaborate why?
> >>> It seems to me the word "streaming" suggests that WAL record should be
> >>> read sequentially. Random access, which means reading from arbitrary
> >>> location, breaks a stream.  (But the patch doesn't try to stop wal
> >>> sender if randAccess.)
> >>>
> >>>> Isn't it sufficient to set currentSource to 0 when disabling
> >>>> StandbyMode?
> >>> I thought that and it should work, but I hesitated to manipulate on
> >>> currentSource in StartupXLOG. currentSource is basically a private
> >>> state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
> >>> think it's not good to modify it out of the the logic in
> >>> WaitForWALToBecomeAvailable.
> >>
> >> If so, what about adding the following at the top of
> >> WaitForWALToBecomeAvailable()?
> >>
> >>      if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
> >>           currentSource = 0;
> > It works virtually the same way. I'm happy to do that if you don't
> > agree to using randAccess. But I'd rather do that in the 'if
> > (!InArchiveRecovery)' section.
> 
> The approach using randAccess seems unsafe. Please imagine
> the case where currentSource is changed to XLOG_FROM_ARCHIVE
> because randAccess is true, while walreceiver is still running.
> For example, this case can occur when the record at REDO
> starting point is fetched with randAccess = true after walreceiver
> is invoked to fetch the last checkpoint record. The situation
> "currentSource != XLOG_FROM_STREAM while walreceiver is
>  running" seems invalid. No?

When I mentioned an possibility of changing ReadRecord so that it
modifies randAccess instead of currentSource, I thought that
WaitForWALToBecomeAvailable should shutdown wal receiver as
needed.

At Thu, 27 Feb 2020 15:23:07 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
me> location, breaks a stream.  (But the patch doesn't try to stop wal
me> sender if randAccess.)

And random access during StandbyMode ususally (always?) lets RecPtr go
back.  I'm not sure WaitForWALToBecomeAvailable works correctly if we
don't have a file in pg_wal and the REDO point is far back by more
than a segment from the initial checkpoint record.  (It seems to cause
assertion failure, but I haven't checked that.)

If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
re-connects to the primary for the past segment.

> So I think that the approach that I proposed is better.

It depends on how far we assume RecPtr go back.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Crash by targetted recovery

От
Fujii Masao
Дата:

On 2020/02/28 12:13, Kyotaro Horiguchi wrote:
> At Thu, 27 Feb 2020 20:04:41 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>
>>
>> On 2020/02/27 17:05, Kyotaro Horiguchi wrote:
>>> Thank you for the comment.
>>> At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao
>>> <masao.fujii@oss.nttdata.com> wrote in
>>>> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
>>>>>> I failed to understand why random access while reading from
>>>>>> stream is bad idea. Could you elaborate why?
>>>>> It seems to me the word "streaming" suggests that WAL record should be
>>>>> read sequentially. Random access, which means reading from arbitrary
>>>>> location, breaks a stream.  (But the patch doesn't try to stop wal
>>>>> sender if randAccess.)
>>>>>
>>>>>> Isn't it sufficient to set currentSource to 0 when disabling
>>>>>> StandbyMode?
>>>>> I thought that and it should work, but I hesitated to manipulate on
>>>>> currentSource in StartupXLOG. currentSource is basically a private
>>>>> state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
>>>>> think it's not good to modify it out of the the logic in
>>>>> WaitForWALToBecomeAvailable.
>>>>
>>>> If so, what about adding the following at the top of
>>>> WaitForWALToBecomeAvailable()?
>>>>
>>>>       if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
>>>>            currentSource = 0;
>>> It works virtually the same way. I'm happy to do that if you don't
>>> agree to using randAccess. But I'd rather do that in the 'if
>>> (!InArchiveRecovery)' section.
>>
>> The approach using randAccess seems unsafe. Please imagine
>> the case where currentSource is changed to XLOG_FROM_ARCHIVE
>> because randAccess is true, while walreceiver is still running.
>> For example, this case can occur when the record at REDO
>> starting point is fetched with randAccess = true after walreceiver
>> is invoked to fetch the last checkpoint record. The situation
>> "currentSource != XLOG_FROM_STREAM while walreceiver is
>>   running" seems invalid. No?
> 
> When I mentioned an possibility of changing ReadRecord so that it
> modifies randAccess instead of currentSource, I thought that
> WaitForWALToBecomeAvailable should shutdown wal receiver as
> needed.
> 
> At Thu, 27 Feb 2020 15:23:07 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> me> location, breaks a stream.  (But the patch doesn't try to stop wal
> me> sender if randAccess.)

Sorry, I failed to notice that.

> And random access during StandbyMode ususally (always?) lets RecPtr go
> back.  I'm not sure WaitForWALToBecomeAvailable works correctly if we
> don't have a file in pg_wal and the REDO point is far back by more
> than a segment from the initial checkpoint record.

It works correctly. This is why WaitForWALToBecomeAvailable() uses
fetching_ckpt argument.

> If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
> re-connects to the primary for the past segment.

But this can lead to unnecessary restart of walreceiver. Since
fetching_ckpt ensures that the WAL file containing the REDO
starting record exists in pg_wal, there is no need to reconnect
to the primary when reading the REDO starting record.

Is there other case where we need to go back to XLOG_FROM_ARCHIVE
by random access?

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Crash by targetted recovery

От
Kyotaro Horiguchi
Дата:
At Mon, 2 Mar 2020 20:54:04 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> > And random access during StandbyMode ususally (always?) lets RecPtr go
> > back.  I'm not sure WaitForWALToBecomeAvailable works correctly if we
> > don't have a file in pg_wal and the REDO point is far back by more
> > than a segment from the initial checkpoint record.
> 
> It works correctly. This is why WaitForWALToBecomeAvailable() uses
> fetching_ckpt argument.

Hmm. You're right. We start streaming from RedoStartLSN when
fetching_ckpt.  So that doesn't happen.

> > If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
> > re-connects to the primary for the past segment.
> 
> But this can lead to unnecessary restart of walreceiver. Since
> fetching_ckpt ensures that the WAL file containing the REDO
> starting record exists in pg_wal, there is no need to reconnect
> to the primary when reading the REDO starting record.
> 
> Is there other case where we need to go back to XLOG_FROM_ARCHIVE
> by random access?

I understand that the reconnection for REDO record is useless. Ok I
take the !StandbyMode way.

The attached is the test script that is changed to count the added
test, and the slight revised main patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 5165a55c11aff5f18a8d39029b9291070a908744 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH v3 1/2] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 34 ++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..5b23ceb3c4 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
     'recovery end before target reached is a fatal error');
+
+# Corner case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+                                has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
+pass("targetted promotion on segment bounary");
-- 
2.18.2

From 246a7894684434dbfcba8c11545458f1633e76b5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Mar 2020 11:54:07 +0900
Subject: [PATCH v3 2/2] Fix a crash bug of targetted promotion.

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous segment at the time, WaitForWALToBecomeAvailable crashes with
assertion failure.  WaitForWALToBecomeAvailable should move back to
XLOG_FROM_ARCHIVE if standby mode is turned off while
XLOG_FROM_STREAM.
---
 src/backend/access/transam/xlog.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..50ad5079b6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11831,7 +11831,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      * 1. Read from either archive or pg_wal (XLOG_FROM_ARCHIVE), or just
      *      pg_wal (XLOG_FROM_PG_WAL)
      * 2. Check trigger file
-     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM).
+     *    If StandbyMode is turned off, the state machine goes back to 1.
      * 4. Rescan timelines
      * 5. Sleep wal_retrieve_retry_interval milliseconds, and loop back to 1.
      *
@@ -11846,8 +11847,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      */
     if (!InArchiveRecovery)
         currentSource = XLOG_FROM_PG_WAL;
-    else if (currentSource == 0)
+    else if (currentSource == 0 ||
+             (!StandbyMode && currentSource == XLOG_FROM_STREAM))
+    {
+        /* Wal receiver should not active when entring XLOG_FROM_ARCHIVE */
+        Assert(!WalRcvStreaming());
         currentSource = XLOG_FROM_ARCHIVE;
+        lastSourceFailed = false; /* We haven't failed on the new source */
+    }
 
     for (;;)
     {
-- 
2.18.2


Re: Crash by targetted recovery

От
Fujii Masao
Дата:

On 2020/03/05 12:08, Kyotaro Horiguchi wrote:
> At Mon, 2 Mar 2020 20:54:04 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>> And random access during StandbyMode ususally (always?) lets RecPtr go
>>> back.  I'm not sure WaitForWALToBecomeAvailable works correctly if we
>>> don't have a file in pg_wal and the REDO point is far back by more
>>> than a segment from the initial checkpoint record.
>>
>> It works correctly. This is why WaitForWALToBecomeAvailable() uses
>> fetching_ckpt argument.
> 
> Hmm. You're right. We start streaming from RedoStartLSN when
> fetching_ckpt.  So that doesn't happen.
> 
>>> If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
>>> re-connects to the primary for the past segment.
>>
>> But this can lead to unnecessary restart of walreceiver. Since
>> fetching_ckpt ensures that the WAL file containing the REDO
>> starting record exists in pg_wal, there is no need to reconnect
>> to the primary when reading the REDO starting record.
>>
>> Is there other case where we need to go back to XLOG_FROM_ARCHIVE
>> by random access?
> 
> I understand that the reconnection for REDO record is useless. Ok I
> take the !StandbyMode way.
> 
> The attached is the test script that is changed to count the added
> test, and the slight revised main patch.

Thanks for the patch!

+        /* Wal receiver should not active when entring XLOG_FROM_ARCHIVE */
+        Assert(!WalRcvStreaming());

+1 to add this assertion check.

Isn't it better to always check this while trying to read WAL from
archive or pg_wal? So, what about the following change?

                 {
                         case XLOG_FROM_ARCHIVE:
                         case XLOG_FROM_PG_WAL:
+                               /*
+                                * WAL receiver should not be running while trying to
+                                * read WAL from archive or pg_wal.
+                                */
+                               Assert(!WalRcvStreaming());
+
                                 /* Close any old file we might have open. */
                                 if (readFile >= 0)


+        lastSourceFailed = false; /* We haven't failed on the new source */

Is this really necessary? Since ReadRecord() always reset
lastSourceFailed to false, it seems not necessary.


-    else if (currentSource == 0)
+    else if (currentSource == 0 ||

Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
There are some places where 0 is used as the value of currentSource.
IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Crash by targetted recovery

От
Kyotaro Horiguchi
Дата:
At Thu, 5 Mar 2020 19:51:11 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2020/03/05 12:08, Kyotaro Horiguchi wrote:
> > I understand that the reconnection for REDO record is useless. Ok I
> > take the !StandbyMode way.
> > The attached is the test script that is changed to count the added
> > test, and the slight revised main patch.
> 
> Thanks for the patch!
> 
> + /* Wal receiver should not active when entring XLOG_FROM_ARCHIVE */
> +        Assert(!WalRcvStreaming());
> 
> +1 to add this assertion check.
> 
> Isn't it better to always check this while trying to read WAL from
> archive or pg_wal? So, what about the following change?
> 
>                 {
>                         case XLOG_FROM_ARCHIVE:
>                         case XLOG_FROM_PG_WAL:
> +                               /*
> + * WAL receiver should not be running while trying to
> +                                * read WAL from archive or pg_wal.
> +                                */
> +                               Assert(!WalRcvStreaming());
> +
>                                 /* Close any old file we might have open. */
>                                 if (readFile >= 0)

(It seems retroverting to the first patch when I started this...)
The second place covers wider cases so I reverted the first place.

> + lastSourceFailed = false; /* We haven't failed on the new source */
> 
> Is this really necessary? Since ReadRecord() always reset
> lastSourceFailed to false, it seems not necessary.

It's just to make sure.  Actually lastSourceFailed is always false
when we get there.  But when the source is switched, lastSourceFailed
should be changed to false as a matter of design.  I'd like to do that
unless that harms.

> -    else if (currentSource == 0)
> +    else if (currentSource == 0 ||
> 
> Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
> There are some places where 0 is used as the value of currentSource.
> IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.

Yeah, I've thought that many times but have neglected since it is not
critical and trivial as a separate patch.  I'd take the chance to do
that now.  Another minor glitch is "int oldSource = currentSource;" it
is not debugger-friendly so I changed it to XLogSource.  It is added
as a new patch file before the main patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 839edcba7e47156e77d5dc9ec7dfb9babc9ba846 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH v4 1/3] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 34 ++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..5b23ceb3c4 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
     'recovery end before target reached is a fatal error');
+
+# Corner case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+                                has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
+pass("targetted promotion on segment bounary");
-- 
2.18.2

From 82274c41cb529d87270d4e2a78dbf06cd139b8f6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 6 Mar 2020 10:11:59 +0900
Subject: [PATCH v4 2/3] Tidy up XLogSource usage.

We used interger 0 instead of XLOG_FROM_ANY and defined a variable to
store the type with int. Tidy them up for readability and
debugger-friendliness.
---
 src/backend/access/transam/xlog.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4361568882..db054c8d32 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -804,7 +804,7 @@ static XLogSource readSource = 0;    /* XLOG_FROM_* code */
  * attempt to read from currentSource failed, and we should try another source
  * next.
  */
-static XLogSource currentSource = 0;    /* XLOG_FROM_* code */
+static XLogSource currentSource = XLOG_FROM_ANY;    /* XLOG_FROM_* code */
 static bool lastSourceFailed = false;
 
 typedef struct XLogPageReadPrivate
@@ -4387,7 +4387,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
                  * so that we will check the archive next.
                  */
                 lastSourceFailed = false;
-                currentSource = 0;
+                currentSource = XLOG_FROM_ANY;
 
                 continue;
             }
@@ -11864,7 +11864,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 
     for (;;)
     {
-        int            oldSource = currentSource;
+        XLogSource    oldSource = currentSource;
 
         /*
          * First check if we failed to read from the current source, and
-- 
2.18.2

From 3a0f31b7c5a9d7f53134a4c909e69814b24bf526 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Mar 2020 11:54:07 +0900
Subject: [PATCH v4 3/3] Fix a crash bug of targetted promotion.

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous segment at the time, WaitForWALToBecomeAvailable crashes with
assertion failure.  WaitForWALToBecomeAvailable should move back to
XLOG_FROM_ARCHIVE if standby mode is turned off while
XLOG_FROM_STREAM.
---
 src/backend/access/transam/xlog.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index db054c8d32..88abd53cfb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11844,7 +11844,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      * 1. Read from either archive or pg_wal (XLOG_FROM_ARCHIVE), or just
      *      pg_wal (XLOG_FROM_PG_WAL)
      * 2. Check trigger file
-     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM).
+     *    If StandbyMode is turned off, the state machine goes back to 1.
      * 4. Rescan timelines
      * 5. Sleep wal_retrieve_retry_interval milliseconds, and loop back to 1.
      *
@@ -11859,8 +11860,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      */
     if (!InArchiveRecovery)
         currentSource = XLOG_FROM_PG_WAL;
-    else if (currentSource == 0)
+    else if (currentSource == 0 ||
+             (!StandbyMode && currentSource == XLOG_FROM_STREAM))
+    {
+        lastSourceFailed = false; /* We haven't failed on the new source */
         currentSource = XLOG_FROM_ARCHIVE;
+    }
 
     for (;;)
     {
@@ -12054,6 +12059,9 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
         {
             case XLOG_FROM_ARCHIVE:
             case XLOG_FROM_PG_WAL:
+                /* Wal receiver shouln't be active here */
+                Assert(!WalRcvStreaming());
+
                 /* Close any old file we might have open. */
                 if (readFile >= 0)
                 {
-- 
2.18.2


Re: Crash by targetted recovery

От
Fujii Masao
Дата:

On 2020/03/06 10:29, Kyotaro Horiguchi wrote:
> At Thu, 5 Mar 2020 19:51:11 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>
>>
>> On 2020/03/05 12:08, Kyotaro Horiguchi wrote:
>>> I understand that the reconnection for REDO record is useless. Ok I
>>> take the !StandbyMode way.
>>> The attached is the test script that is changed to count the added
>>> test, and the slight revised main patch.
>>
>> Thanks for the patch!
>>
>> + /* Wal receiver should not active when entring XLOG_FROM_ARCHIVE */
>> +        Assert(!WalRcvStreaming());
>>
>> +1 to add this assertion check.
>>
>> Isn't it better to always check this while trying to read WAL from
>> archive or pg_wal? So, what about the following change?
>>
>>                  {
>>                          case XLOG_FROM_ARCHIVE:
>>                          case XLOG_FROM_PG_WAL:
>> +                               /*
>> + * WAL receiver should not be running while trying to
>> +                                * read WAL from archive or pg_wal.
>> +                                */
>> +                               Assert(!WalRcvStreaming());
>> +
>>                                  /* Close any old file we might have open. */
>>                                  if (readFile >= 0)
> 
> (It seems retroverting to the first patch when I started this...)
> The second place covers wider cases so I reverted the first place.

Thanks for updating the patch that way.
Not sure which patch you're mentioning, though.

Regarding 0003 patch, I added a bit more detail comments into
the patch so that we can understand the code more easily.
Updated version of 0003 patch attached. Barring any objection,
at first, I plan to commit this patch.

>> + lastSourceFailed = false; /* We haven't failed on the new source */
>>
>> Is this really necessary? Since ReadRecord() always reset
>> lastSourceFailed to false, it seems not necessary.
> 
> It's just to make sure.  Actually lastSourceFailed is always false
> when we get there.  But when the source is switched, lastSourceFailed
> should be changed to false as a matter of design.  I'd like to do that
> unless that harms.

OK.
  
>> -    else if (currentSource == 0)
>> +    else if (currentSource == 0 ||
>>
>> Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
>> There are some places where 0 is used as the value of currentSource.
>> IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.
> 
> Yeah, I've thought that many times but have neglected since it is not
> critical and trivial as a separate patch.  I'd take the chance to do
> that now.  Another minor glitch is "int oldSource = currentSource;" it
> is not debugger-friendly so I changed it to XLogSource.  It is added
> as a new patch file before the main patch.

There seems to be more other places where XLogSource and
XLOG_FROM_XXX are not used yet. For example, the initial values
of readSource and XLogReceiptSource, the type of argument
"source" in XLogFileReadAnyTLI() and XLogFileRead(), etc.
These also should be updated?

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: Crash by targetted recovery

От
Kyotaro Horiguchi
Дата:
At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> > (It seems retroverting to the first patch when I started this...)
> > The second place covers wider cases so I reverted the first place.
> 
> Thanks for updating the patch that way.
> Not sure which patch you're mentioning, though.

That meant 0003.

> Regarding 0003 patch, I added a bit more detail comments into
> the patch so that we can understand the code more easily.
> Updated version of 0003 patch attached. Barring any objection,
> at first, I plan to commit this patch.

Looks good to me. Thanks for writing the detailed comments.

> >> + lastSourceFailed = false; /* We haven't failed on the new source */
> >>
> >> Is this really necessary? Since ReadRecord() always reset
> >> lastSourceFailed to false, it seems not necessary.
> > It's just to make sure.  Actually lastSourceFailed is always false
> > when we get there.  But when the source is switched, lastSourceFailed
> > should be changed to false as a matter of design.  I'd like to do that
> > unless that harms.
> 
> OK.

Thanks.

> >> -    else if (currentSource == 0)
> >> +    else if (currentSource == 0 ||
> >>
> >> Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
> >> There are some places where 0 is used as the value of currentSource.
> >> IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.
> > Yeah, I've thought that many times but have neglected since it is not
> > critical and trivial as a separate patch.  I'd take the chance to do
> > that now.  Another minor glitch is "int oldSource = currentSource;" it
> > is not debugger-friendly so I changed it to XLogSource.  It is added
> > as a new patch file before the main patch.
> 
> There seems to be more other places where XLogSource and
> XLOG_FROM_XXX are not used yet. For example, the initial values
> of readSource and XLogReceiptSource, the type of argument
> "source" in XLogFileReadAnyTLI() and XLogFileRead(), etc.
> These also should be updated?

Right. I checked through the file and AFAICS that's all.  The attachec
v5-0001-Tidy...patch is the fix on top of the v4-0003 on the current
master.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 358f37899a02a8ae6f803fe32b97c2cbe302786f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 6 Mar 2020 10:11:59 +0900
Subject: [PATCH v5] Tidy up XLogSource usage.

We used interger 0 instead of XLOG_FROM_ANY and defined a variable to
store the type with int. Tidy them up for readability and
debugger-friendliness.
---
 src/backend/access/transam/xlog.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 21c0acb740..9ebfbf31c5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -795,7 +795,7 @@ static int    readFile = -1;
 static XLogSegNo readSegNo = 0;
 static uint32 readOff = 0;
 static uint32 readLen = 0;
-static XLogSource readSource = 0;    /* XLOG_FROM_* code */
+static XLogSource readSource = XLOG_FROM_ANY;
 
 /*
  * Keeps track of which source we're currently reading from. This is
@@ -804,7 +804,7 @@ static XLogSource readSource = 0;    /* XLOG_FROM_* code */
  * attempt to read from currentSource failed, and we should try another source
  * next.
  */
-static XLogSource currentSource = 0;    /* XLOG_FROM_* code */
+static XLogSource currentSource = XLOG_FROM_ANY;
 static bool lastSourceFailed = false;
 
 typedef struct XLogPageReadPrivate
@@ -823,7 +823,7 @@ typedef struct XLogPageReadPrivate
  * XLogReceiptSource tracks where we last successfully read some WAL.)
  */
 static TimestampTz XLogReceiptTime = 0;
-static XLogSource XLogReceiptSource = 0;    /* XLOG_FROM_* code */
+static XLogSource XLogReceiptSource = XLOG_FROM_ANY;
 
 /* State information for XLOG reading */
 static XLogRecPtr ReadRecPtr;    /* start of last record read */
@@ -886,8 +886,8 @@ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
                                    bool find_free, XLogSegNo max_segno,
                                    bool use_lock);
 static int    XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
-                         int source, bool notfoundOk);
-static int    XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
+                         XLogSource source, bool notfoundOk);
+static int    XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source);
 static int    XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
                          int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
@@ -3633,7 +3633,7 @@ XLogFileOpen(XLogSegNo segno)
  */
 static int
 XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
-             int source, bool notfoundOk)
+             XLogSource source, bool notfoundOk)
 {
     char        xlogfname[MAXFNAMELEN];
     char        activitymsg[MAXFNAMELEN + 16];
@@ -3715,7 +3715,7 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
  * This version searches for the segment with any TLI listed in expectedTLEs.
  */
 static int
-XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source)
+XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 {
     char        path[MAXPGPATH];
     ListCell   *cell;
@@ -4387,7 +4387,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
                  * so that we will check the archive next.
                  */
                 lastSourceFailed = false;
-                currentSource = 0;
+                currentSource = XLOG_FROM_ANY;
 
                 continue;
             }
@@ -11669,7 +11669,7 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
 
         close(readFile);
         readFile = -1;
-        readSource = 0;
+        readSource = XLOG_FROM_ANY;
     }
 
     XLByteToSeg(targetPagePtr, readSegNo, wal_segment_size);
@@ -11689,7 +11689,7 @@ retry:
                 close(readFile);
             readFile = -1;
             readLen = 0;
-            readSource = 0;
+            readSource = XLOG_FROM_ANY;
 
             return -1;
         }
@@ -11795,7 +11795,7 @@ next_record_is_invalid:
         close(readFile);
     readFile = -1;
     readLen = 0;
-    readSource = 0;
+    readSource = XLOG_FROM_ANY;
 
     /* In standby-mode, keep trying */
     if (StandbyMode)
@@ -11860,7 +11860,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      */
     if (!InArchiveRecovery)
         currentSource = XLOG_FROM_PG_WAL;
-    else if (currentSource == 0 ||
+    else if (currentSource == XLOG_FROM_ANY ||
              (!StandbyMode && currentSource == XLOG_FROM_STREAM))
     {
         lastSourceFailed = false; /* We haven't failed on the new source */
@@ -11869,7 +11869,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 
     for (;;)
     {
-        int            oldSource = currentSource;
+        XLogSource    oldSource = currentSource;
 
         /*
          * First check if we failed to read from the current source, and
-- 
2.18.2


Re: Crash by targetted recovery

От
Fujii Masao
Дата:

On 2020/03/09 13:49, Kyotaro Horiguchi wrote:
> At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>> (It seems retroverting to the first patch when I started this...)
>>> The second place covers wider cases so I reverted the first place.
>>
>> Thanks for updating the patch that way.
>> Not sure which patch you're mentioning, though.
> 
> That meant 0003.
> 
>> Regarding 0003 patch, I added a bit more detail comments into
>> the patch so that we can understand the code more easily.
>> Updated version of 0003 patch attached. Barring any objection,
>> at first, I plan to commit this patch.
> 
> Looks good to me. Thanks for writing the detailed comments.

Thanks for the review! Pushed.

I will review other two patches later.

>> There seems to be more other places where XLogSource and
>> XLOG_FROM_XXX are not used yet. For example, the initial values
>> of readSource and XLogReceiptSource, the type of argument
>> "source" in XLogFileReadAnyTLI() and XLogFileRead(), etc.
>> These also should be updated?
> 
> Right. I checked through the file and AFAICS that's all.  The attachec
> v5-0001-Tidy...patch is the fix on top of the v4-0003 on the current
> master.

Thanks for the patch!

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Crash by targetted recovery

От
Kyotaro Horiguchi
Дата:
At Mon, 9 Mar 2020 15:46:49 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> On 2020/03/09 13:49, Kyotaro Horiguchi wrote:
> > At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao
> >> Regarding 0003 patch, I added a bit more detail comments into
> >> the patch so that we can understand the code more easily.
> >> Updated version of 0003 patch attached. Barring any objection,
> >> at first, I plan to commit this patch.
> > Looks good to me. Thanks for writing the detailed comments.
> 
> Thanks for the review! Pushed.

Thanks for commiting.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Crash by targetted recovery

От
Fujii Masao
Дата:

On 2020/03/09 15:46, Fujii Masao wrote:
> 
> 
> On 2020/03/09 13:49, Kyotaro Horiguchi wrote:
>> At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>>> (It seems retroverting to the first patch when I started this...)
>>>> The second place covers wider cases so I reverted the first place.
>>>
>>> Thanks for updating the patch that way.
>>> Not sure which patch you're mentioning, though.
>>
>> That meant 0003.
>>
>>> Regarding 0003 patch, I added a bit more detail comments into
>>> the patch so that we can understand the code more easily.
>>> Updated version of 0003 patch attached. Barring any objection,
>>> at first, I plan to commit this patch.
>>
>> Looks good to me. Thanks for writing the detailed comments.
> 
> Thanks for the review! Pushed.
> 
> I will review other two patches later.

Pushed the v5-0001-Tidy-up-XLogSource-usage.patch!

Regarding the remaining patch adding the regression test,

+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);

What happens if "off" part gets the upper limit and "seg" part needs
to be incremented? What happens if pg_last_wal_replay_lsn() advances
very much (e.g., because of autovacuum) beyond the segment boundary
until the standby restarts? Of course, these situations very rarely happen,
but I'd like to avoid adding such not stable test if possible.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Crash by targetted recovery

От
Kyotaro Horiguchi
Дата:
At Tue, 10 Mar 2020 10:50:52 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> Pushed the v5-0001-Tidy-up-XLogSource-usage.patch!

Thanks!

> Regarding the remaining patch adding the regression test,

I didn't seriously inteneded it to be in the tree.

> +$result =
> + $node_standby->safe_psql('postgres', "SELECT
> pg_last_wal_replay_lsn()");
> +my ($seg, $off) = split('/', $result);
> +my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) *
> $segsize);
> 
> What happens if "off" part gets the upper limit and "seg" part needs
> to be incremented? What happens if pg_last_wal_replay_lsn() advances
> very much (e.g., because of autovacuum) beyond the segment boundary
> until the standby restarts? Of course, these situations very rarely
> happen,
> but I'd like to avoid adding such not stable test if possible.

In the first place the "seg" is "fileno". Honestly I don't think the
test doesn't reach to fileno boundary but I did in the attached. Since
perl complains over-32bit integer arithmetic as incomptible, the
calculation gets a bit odd shape to avoid over-32bit arithmetic.

For the second point, which seems more likely to happen, I added the
VACUUM/pg_switch_wal() sequence then wait standby for catch up, before
doing the test.

Does it make sense?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 89621911e09df73f0f3a63501bb41ffb37edef05 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 10 Mar 2020 14:43:29 +0900
Subject: [PATCH v5] TAP test for a crash bug

Add a test for targetted promotion happend at segment boundary.
---
 src/test/recovery/t/003_recovery_targets.pl | 50 ++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..6b319bc891 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -167,3 +167,51 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
     'recovery end before target reached is a fatal error');
+
+# Corner case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+                                has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## make sure replication stays at the beginning of a segment
+$node_master->safe_psql('postgres', "VACUUM; SELECT pg_switch_wal();");
+my $catch_up_lsn =
+  $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+my $caughtup_query =
+  "SELECT '$catch_up_lsn'::pg_lsn <= pg_last_wal_replay_lsn()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to catch up";
+## calculate the next segment boundary
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($fileno, $off) = split('/', $result);
+my ($newoff, $newfileno) = (hex($off), hex($fileno));
+my $newsegnum = int($newoff / $segsize) + 1;
+## treat segment-overflow, avoiding over-32bit arithmetic, segsize is
+## a power of 2 larger than 1MB.
+if ($newsegnum * ($segsize >> 1) >= 0x80000000)
+{
+    $newfileno += ($newsegnum * ($segsize >> 1)) >> 31;
+    $newsegnum %= (0x80000000 / ($segsize >> 1));
+}
+my $target = sprintf("%X/%08X", $newfileno, $newsegnum * $segsize);
+## the standby stops just after the next segment boundary
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+$caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
+pass("targetted promotion on segment bounary");
-- 
2.18.2


Re: Crash by targetted recovery

От
Kyotaro Horiguchi
Дата:
(Hmm. My sight must be as short as 2 word length..)

At Tue, 10 Mar 2020 14:59:00 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> In the first place the "seg" is "fileno". Honestly I don't think the
> test doesn't reach to fileno boundary but I did in the attached. Since

Of course it is a mistake of "Honestly I don't think the test reaches
to fileno boudary".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center