Re: Replication failure, slave requesting old segments

From: Adrian Klaver
Subject: Re: Replication failure, slave requesting old segments
Msg-id: fbfd3891-06f2-0b3d-e346-fd61503e4a2e@aklaver.com
In reply to: Re: Replication failure, slave requesting old segments ("Phil Endecott" <spam_from_pgsql_lists@chezphil.org>)
List: pgsql-general
On 08/12/2018 12:25 PM, Phil Endecott wrote:
> Hi Adrian,
> 
> Adrian Klaver wrote:
>> On 08/11/2018 12:42 PM, Phil Endecott wrote:
>>> Hi Adrian,
>>>
>>> Adrian Klaver wrote:
>>>> Looks like the master recycled the WALs while the slave could not 
>>>> connect.
>>>
>>> Yes but... why is that a problem?  The master is copying the WALs to
>>> the backup server using scp, where they remain forever.  The slave gets
>>
>> To me it looks like that did not happen:
>>
>> 2018-08-11 00:05:50.364 UTC [615] LOG:  restored log file 
>> "0000000100000007000000D0" from archive
>> scp: backup/postgresql/archivedir/0000000100000007000000D1: No such 
>> file or directory
>> 2018-08-11 00:05:51.325 UTC [7208] LOG:  started streaming WAL from 
>> primary at 7/D0000000 on timeline 1
>> 2018-08-11 00:05:51.325 UTC [7208] FATAL:  could not receive data from 
>> WAL stream: ERROR:  requested WAL segment 0000000100000007000000D0 has 
>> already been removed
>>
>> Above 0000000100000007000000D0 is gone/recycled on the master and the 
>> archived version does not seem to be complete as the streaming 
>> replication is trying to find it.
> 
> The files on the backup server were all 16 MB.

WAL files are created/recycled as 16 MB files, which is not the same as 
saying they are complete for the purposes of restoring. In other words, 
you could be looking at a 16 MB file full of zeros.
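The log lines quoted above name the same position in two notations: the streaming start point 7/D0000000 and the segment file 0000000100000007000000D0. As a minimal sketch, assuming the default 16 MB segment size and 9.3-or-later segment numbering (the helper names are hypothetical, not part of PostgreSQL), the mapping between the two looks like this:

```python
# Sketch: map between LSNs ("X/Y") and WAL segment file names.
# Assumes the default 16 MB segment size and PostgreSQL 9.3+ numbering
# (no skipped segments); the helper names here are hypothetical.

WAL_SEG_SIZE = 16 * 1024 * 1024  # assumed default segment size in bytes


def parse_lsn(text: str) -> int:
    """Convert an LSN printed as 'X/Y' (two hex halves) to a 64-bit int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Format a 64-bit LSN the way PostgreSQL log lines print it: 'X/Y'."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def segment_name(timeline: int, lsn: int, seg_size: int = WAL_SEG_SIZE) -> str:
    """Segment file holding `lsn`: timeline, log id, segment (8 hex digits each)."""
    segno = lsn // seg_size
    per_id = 0x100000000 // seg_size  # segments per 4 GB log id (256 at 16 MB)
    return f"{timeline:08X}{segno // per_id:08X}{segno % per_id:08X}"


def segment_start(name: str, seg_size: int = WAL_SEG_SIZE) -> int:
    """First LSN stored in the named segment file."""
    per_id = 0x100000000 // seg_size
    segno = int(name[8:16], 16) * per_id + int(name[16:24], 16)
    return segno * seg_size


if __name__ == "__main__":
    start = parse_lsn("7/D0000000")  # from the "started streaming" log line
    print(segment_name(1, start))    # 0000000100000007000000D0
    print(format_lsn(segment_start("0000000100000007000000D1")))  # 7/D1000000
```

With 16 MB segments, 7/D0000000 is exactly the first byte of segment D0, so the "started streaming WAL from primary at 7/D0000000" line is a request for D0 from its beginning, which matches the "requested WAL segment 0000000100000007000000D0 has already been removed" error.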

> 
> 
>> Below you kick the master and it coughs up the files to the archive 
>> including *D0 and *D1 on up to *D4, and then the streaming picks up 
>> at *D5.
> 
> When I kicked it, the master wrote D1 to D4 to the backup.  It did not
> change D0 (its modification time on the backup is from before the "kick").
> The slave re-read D0, again, as it had been doing throughout this period,
> and then read D1 to D4.

Well, something happened, because the slave could not get all the 
information it needed from the copy of D0 in the archive and was trying 
to get it from the master's pg_xlog.
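Whether the archive actually held a given run of segments (for example D0 through D4 in this thread) can be checked mechanically. The sketch below is a hypothetical helper, not a PostgreSQL tool; it assumes the default 16 MB segment size, a single timeline, and 9.3-or-later numbering with no skipped segments. It only reports files that are missing or the wrong size; as noted above, a full-size file is not proof that the segment is complete.

```python
# Sketch: report archived WAL segments that are missing or not full-size.
# Assumes 16 MB segments, one timeline, and 9.3+ numbering (no skipped
# segments). check_archive() and seg_range() are hypothetical helpers.
import os
import sys

SEG_SIZE = 16 * 1024 * 1024        # assumed default WAL segment size
PER_ID = 0x100000000 // SEG_SIZE   # segments per 4 GB log id (256)


def seg_range(first: str, last: str):
    """Yield segment file names from `first` to `last` inclusive."""
    tli = first[:8]  # timeline part, assumed the same for both names
    start = int(first[8:16], 16) * PER_ID + int(first[16:24], 16)
    end = int(last[8:16], 16) * PER_ID + int(last[16:24], 16)
    for segno in range(start, end + 1):
        yield f"{tli}{segno // PER_ID:08X}{segno % PER_ID:08X}"


def check_archive(archive_dir: str, first: str, last: str) -> list:
    """List segments in [first, last] that are absent or not SEG_SIZE bytes.

    A full-size file passes this check even if its contents are incomplete.
    """
    problems = []
    for name in seg_range(first, last):
        path = os.path.join(archive_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{name}: missing")
            continue
        size = os.path.getsize(path)
        if size != SEG_SIZE:
            problems.append(f"{name}: {size} bytes")
    return problems


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python check_archive.py ARCHIVE_DIR FIRST_SEGMENT LAST_SEGMENT
    for problem in check_archive(sys.argv[1], sys.argv[2], sys.argv[3]):
        print(problem)
```

For instance, running it against backup/postgresql/archivedir for 0000000100000007000000D0 to 0000000100000007000000D4 at the time of the failure would have listed D1 as missing, consistent with the scp error in the log.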

> 
> 
>> Best guess is the archiving did not work as expected during:
>>
>> "(During this time the master was also down for a shorter period.)"
> 
> Around the time the master was down, the WAL segment names were CB and CC.
> Files CD to CF were written between the master coming up and the slave
> coming up.  The slave had no trouble restoring those segments when it
> started.  The problematic segments D0 and D1 were the ones that were
> "current" when the slave restarted, at which time the master was up
> consistently.
> 
> 
> Regards, Phil.


-- 
Adrian Klaver
adrian.klaver@aklaver.com

