Re: pg_rewind with cascade standby doesn't work well

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: pg_rewind with cascade standby doesn't work well
Дата
Msg-id ZRNqL8pxKnidU8wq@paquier.xyz
обсуждение исходный текст
Ответ на Re: pg_rewind with cascade standby doesn't work well  (Aleksander Alekseev <aleksander@timescale.com>)
Ответы Re: pg_rewind with cascade standby doesn't work well  (Kuwamura Masaki <kuwamura@db.is.i.nagoya-u.ac.jp>)
Список pgsql-hackers
On Tue, Sep 26, 2023 at 06:44:50PM +0300, Aleksander Alekseev wrote:
>> And also, I'm afraid that I'm not sure what kind of tests I have to make
>> for fix this behavior. Would you mind giving me some advice?
>
> Personally I would prefer not to increase the scope of work. Your TAP
> test added in 0001 seems to be adequate.

Yeah, agreed.  I'm OK with what you are proposing, basically (the
test could be made a bit cheaper actually).

         /*
-         * For Hot Standby, we could treat this like a Shutdown Checkpoint,
-         * but this case is rarer and harder to test, so the benefit doesn't
-         * outweigh the potential extra cost of maintenance.
+         * For Hot Standby, we could treat this like an end-of-recovery checkpoint
          */
+        RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT);

I don't understand what you want to change here.  Archive recovery and
crash recovery are two different things, still this code would be
triggered even if there is no standby.signal, aka the node is not a
standby.  Why isn't this stuff conditional?

>> BTW, I was able to
>> reproduce the assertion failure Kuwamura-san reported, even after applying
>> your latest patch from the thread.
>
> Do you mean that the test fails or it doesn't but there are other
> steps to reproduce the issue?

I get it as Fujii-san testing the patch from [1], still failing the
test from [2]:
[1]: https://www.postgresql.org/message-id/ZArVOMifjzE7f8W7@paquier.xyz
[2]: https://www.postgresql.org/message-id/CAMyC8qryE7mKyvPvGHCt5GpANAmp8sS_tRbraqXcPBx14viy6g@mail.gmail.com

I would be surprised, actually, because the patch from [1] would cause
step 7 of the test to fail: the patch causes standby.signal or
recovery.signal to be required.  Anyway, this specific issue, if any,
had better be discussed on the other thread.  I need to address a few
comments there as well and was planning to get back to it.  It is
possible that I've missed something on the other thread with the
restrictions I was proposing in the latest version of the patch.

For this thread, let's focus on the pg_rewind case and how we want to
treat these records to improve the cascading case.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Fail hard if xlogreader.c fails on out-of-memory
Следующее
От: Jeff Davis
Дата:
Сообщение: Re: [PoC/RFC] Multiple passwords, interval expirations