Re: Duplicate history file?

From Tatsuro Yamada
Subject Re: Duplicate history file?
Date
Msg-id 9bd1cc76-5fb8-6954-dce2-ab8ca56642ef@nttcom.co.jp_1
In reply to Duplicate history file?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Duplicate history file?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Hi Horiguchi-san,

On 2021/05/31 16:58, Kyotaro Horiguchi wrote:
> So, I started a thread for this topic diverged from the following
> thread.
> 
> https://www.postgresql.org/message-id/4698027d-5c0d-098f-9a8e-8cf09e36a555@nttcom.co.jp_1
> 
>> So, what should we do for the user? I think we should put some notes
>> in postgresql.conf or in the documentation. For example, something
>> like this:
> 
> I'm not sure about the exact configuration you have in mind, but that
> would happen on the cascaded standby in the case where the upstream
> promotes. In this case, the history file for the new timeline is
> archived twice.  walreceiver triggers archiving of the new history
> file at the time of the promotion, then startup does the same when it
> restores the file from archive.  Is it what you complained about?


Thank you for creating a new thread and explaining this.
We are not using cascading replication in our environment, but I think
the situation is similar. In short, when I promote a server,
archive_command fails because of a duplicate history file.

I've created a reproduction script that includes building replication,
and I'll share it with you. (I used Robert's test.sh as a reference
for creating the reproduction script. Thanks)

The scenario (sr_test_historyfile.sh) is as follows.

#1 Start pgprimary as a main
#2 Create standby
#3 Start pgstandby as a standby
#4 Execute archive_command
#5 Shutdown pgprimary
#6 Start pgprimary as a standby
#7 Promote pgprimary
#8 Execute archive_command again, but it fails because a duplicate
    history file already exists (see pgstandby.log)

Note that this may not be an appropriate recovery procedure for a
replication configuration. However, I'm sharing it as is because it seems
to be the procedure used in the customer's environment (PG-REX).

  
> The same workaround using the alternative archive script works for the
> case.
> 
> We could check pg_wal before fetching archive, however, archiving is
> not controlled so strictly that duplicate archiving never happens and
> I think we choose possible duplicate archiving than having holes in
> archive. (so we suggest the "test ! -f" script)
> 
>> ====
>> Note: If you use archive_mode=always, the archive_command on the
>> standby side should not be used "test ! -f".
>> ====
> 
> It could be one workaround. However, I would suggest not to overwrite
> existing files (with a file with different content) to protect archive
> from corruption.
> 
> We might need to write that in the documentation...

I think you're right: replacing it with an alternative archive script
that includes the cmp command will resolve the error, because I confirmed
with the diff command that the history files are identical.

=====
$ diff -s pgprimary/arc/00000002.history  pgstandby/arc/00000002.history
Files pgprimary/arc/00000002.history and pgstandby/arc/00000002.history are identical
=====
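
For illustration, a cmp-based archive script might look roughly like the
sketch below. This is only my assumption of the idea, not the actual
alternative script from the other thread: the function name archive_wal
and its argument order are made up for this example. It copies the file
if it is not archived yet, succeeds if an identical copy already exists,
and fails if the existing file has different content, so the archive is
never overwritten.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are assumptions):
#   archive_wal SRC DEST
# SRC  = path of the file to archive (what %p would expand to)
# DEST = target path in the archive directory (ending in %f)
archive_wal() {
    src=$1
    dest=$2

    if [ ! -f "$dest" ]; then
        # Not archived yet: copy to a temporary name first, then rename,
        # so a partially written file never appears under the final name.
        cp "$src" "$dest.tmp" && mv "$dest.tmp" "$dest"
        return $?
    fi

    # Already archived: report success only if the content is identical.
    # A differing file makes cmp return non-zero, so nothing is overwritten.
    cmp -s "$src" "$dest"
}
```

If this were saved as a script that ends with a call such as
archive_wal "$1" "$2", it could be used as
archive_command = '/path/to/script.sh %p /path/to/arc/%f'
(both paths are placeholders). With this, the second attempt to archive
00000002.history would succeed because the two files are identical.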

Regarding "test ! -f",
I am wondering how many people are using the test command in
archive_command. If I remember correctly, the guide provided by
NTT OSS Center that we are using does not recommend using the test command.


Regards,
Tatsuro Yamada

