Re: Loss of replication after simple misconfiguration

From: Andrew Gierth
Subject: Re: Loss of replication after simple misconfiguration
Msg-id: 878sj4skmj.fsf@news-spur.riddles.org.uk
In response to: Loss of replication after simple misconfiguration (hubert depesz lubaczewski <depesz@depesz.com>)
Responses: Re: Loss of replication after simple misconfiguration (Victor Yegorov <vyegorov@gmail.com>)
List: pgsql-bugs
>>>>> "hubert" == hubert depesz lubaczewski <depesz@depesz.com> writes:

 hubert> PostgreSQL 9.5.15 on Ubuntu bionic.
 [...]
 hubert> tried to restart only to be greeted by:
 hubert> 2020-04-07T15:13:49.729943+00:00 postgres[20491]: [7-1] db=,user= LOG:  restored log file "000000030001779200000061" from archive
 hubert> 2020-04-07T15:13:49.757222+00:00 postgres[20491]: [8-1] db=,user= FATAL:  could not access status of transaction 4275781146
 hubert> 2020-04-07T15:13:49.757314+00:00 postgres[20491]: [8-2] db=,user= DETAIL:  Could not read from file "pg_commit_ts/27D4B" at offset 245760: Success.
 hubert> 2020-04-07T15:13:49.757380+00:00 postgres[20491]: [8-3] db=,user= CONTEXT:  xlog redo Transaction/COMMIT: 2020-04-07 02:40:10.065859+00
 hubert> 2020-04-07T15:13:49.761239+00:00 postgres[20487]: [2-1] db=,user= LOG:  startup process (PID 20491) exited with exit code 1
 hubert> 2020-04-07T15:13:49.761387+00:00 postgres[20487]: [3-1] db=,user= LOG:  terminating any other active server processes

So I've been assisting hubert with analysis of this on IRC, and what we
have found so far suggests:

1. the max_worker_processes thing is a red herring

2. It is virtually certain that the restart, in addition to changing
max_worker_processes on the master, also changed the master's setting of
track_commit_timestamp from off to on (which is clearly relevant to the
issue)

(We established #2 from the fact that we _do_ have the WAL files from
the failed recovery, and they don't contain any COMMIT_TS_ZEROPAGE
records despite covering many thousands of transactions.)
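
As a cross-check on the FATAL message quoted above, the file and offset it
reports can be derived from the xid. This is only a sketch: the constants
below are assumptions taken from PostgreSQL's defaults (8192-byte blocks, a
10-byte commit timestamp entry, 32 pages per SLRU segment) and are not stated
anywhere in this thread, and the function name is made up for illustration.

```python
# Assumed constants (PostgreSQL defaults; not confirmed in this thread).
BLCKSZ = 8192                # default block size in bytes
COMMIT_TS_ENTRY_SIZE = 10    # 8-byte timestamp + 2-byte origin id
SLRU_PAGES_PER_SEGMENT = 32  # pages per SLRU segment file


def commit_ts_location(xid):
    """Map an xid to (segment file name, byte offset of its page, slot on page).

    Hypothetical helper for illustration only.
    """
    xacts_per_page = BLCKSZ // COMMIT_TS_ENTRY_SIZE  # 819 entries per page
    page = xid // xacts_per_page
    slot = xid % xacts_per_page
    segment = page // SLRU_PAGES_PER_SEGMENT
    offset = (page % SLRU_PAGES_PER_SEGMENT) * BLCKSZ
    return "%04X" % segment, offset, slot


print(commit_ts_location(4275781146))  # -> ('27D4B', 245760, 0)
```

Under those assumptions the result matches "pg_commit_ts/27D4B" at offset
245760 from the log. The slot comes out as 0, i.e. this xid would be the
first entry on its page, which seems consistent with a page that was never
zeroed because no COMMIT_TS_ZEROPAGE record was written for it.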

I've suggested trying to reproduce the issue by changing this parameter
across a crash.

I did notice that 9.5.15 does have a fix for an issue in this area, but
I didn't see any more recent changes - did I miss anything?

-- 
Andrew (irc:RhodiumToad)


