Re: [BUG] non archived WAL removed during production crash recovery

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: [BUG] non archived WAL removed during production crash recovery
Дата
Msg-id 20200410.110031.2261790267570514549.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: [BUG] non archived WAL removed during production crash recovery  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Ответы Re: [BUG] non archived WAL removed during production crash recovery  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Re: [BUG] non archived WAL removed during production crash recovery  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Список pgsql-bugs
By the way, I haven't noticed that Cc: didn't contain -hackers.  Added
it.


At Thu, 9 Apr 2020 11:35:12 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in 
> On Thu, 09 Apr 2020 11:26:57 +0900 (JST)
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> [...]
> > > > At Thu, 2 Apr 2020 15:49:15 +0200, Jehan-Guillaume de Rorthais
> > > > <jgdr@dalibo.com> wrote in   
> > > > > > Ok, so our *current* consensus seems the followings. Right?
> > > > > > 
> > > > > > - If archive_mode=off, any WAL files with .ready files are removed in
> > > > > >     crash recovery, archive recoery and standby mode.    
> > > > > 
> > > > > yes    
> > > > 
> > > > If archive_mode = off no WAL files are marked as ".ready".  
> > > 
> > > Sure, on the primary side.
> > > 
> > > What if you build a standby from a backup with archive_mode=on with
> > > some .ready files in there?   
> > 
> > Well. Backup doesn't have nothing in archive_status directory if it is
> > taken by pg_basebackup. If the backup is created other way, it can
> > have some (as Fujii-san mentioned).  Master with archive_mode != off
> > and standby with archive_mode=always should archive WAL files that are
> > not marked .done, but standby with archive_mode == on should not. The
> > commit intended that
> 
> Unless I'm wrong, the commit avoids creating .ready files on standby when a WAL
> has neither .done or .ready status file.

Right.

> > but the mistake here is it thinks that inRecovery represents whether it is
> > running as a standby or not, but actually it is true on primary during crash
> > recovery.
> 
> Indeed.
> 
> > On the other hand, with the patch, standby with archive_mode=on
> > wrongly archives WAL files during crash recovery.
> 
> "without the patch" you mean? You are talking about 78ea8b5daab, right?

No. I menat the v4 patch in [1].

[1] https://www.postgresql.org/message-id/20200407171736.61906608%40firost

Prior to appllying the patch (that is the commit 78ea..),
XLogArchiveCheckDone() correctly returns true (= allow to remove it)
for the same condition.

The proposed patch does the folloing thing.

if (!XLogArchivingActive() ||
    recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways())
    return true;

It doesn't return for the condition "recoveryState=CRASH_RECOVERING
and archive_mode = on". Then the WAL files is mitakenly marked as
".ready" if not marked yet.

By the way, the code seems not following the convention a bit
here. Let the inserting code be in the same style to the existing code
around.

+    if ( ! XLogArchivingActive() )

I think we don't put the  spaces within the parentheses above.

| ARCHIVE_RECOVERING/CRASH_RECOVERING/NOT_IN_RECOVERY

The first two and the last one are in different style. *I* prefer them
(if we use it) be "IN_ARCHIVE_RECOVERY/IN_CRASH_RECOVERY/NOT_IN_RECOVERY".

> > What we should check there is, as the commit was intended, not whether
> > it is under crash or archive recovery, but whether it is running as
> > primary or standby.
> 
> Yes.
> 
> > > > If it is "always", WAL files that are to be archived are
> > > > marked as ".ready".  Finally, the condition reduces to:
> > > > 
> > > > If archiver is running, archive ".ready" files. Otherwise ignore
> > > > ".ready" and just remove WAL files after use.  
> > > > >     
> > > > > > That is, WAL files with .ready files are removed when either
> > > > > > archive_mode!=always in standby mode or archive_mode=off.    
> > > > > 
> > > > > sounds fine to me.    
> > > > 
> > > > That situation implies that archive_mode has been changed.  
> > > 
> > > Why? archive_mode may have been "always" on the primary when eg. a snapshot
> > > has been created.  
> > 
> > .ready files are created only when archive_mode != off.
> 
> Yes, on a primary, they are created when archive_mode > off. On standby, when
> archive_mode=always. If a primary had archive_mode=always, a standby created
> from its backup will still have archive_mode=always, with no changes.
> Maybe I miss your point, sorry.

Sorry, it was ambiguous.

> That is, WAL files with .ready files are removed when either
> archive_mode!=always in standby mode or archive_mode=off.    

If we have .ready file when archive_mode = off, the cluster (or the
original of the copy cluster) should have been running in archive = on
or always. That is, archive_mode has been changed. But anyway that
discussion would not be in much relevance.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-bugs по дате отправления:

Предыдущее
От: PG Bug reporting form
Дата:
Сообщение: BUG #16354: No geos 3.8.1 package for RHEL 8
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: [BUG] non archived WAL removed during production crash recovery