Re: Replication failure, slave requesting old segments
From | Adrian Klaver |
---|---|
Subject | Re: Replication failure, slave requesting old segments |
Date | |
Msg-id | 444ada2d-8896-cd74-57dd-531999190182@aklaver.com |
In response to | Re: Replication failure, slave requesting old segments ("Phil Endecott" <spam_from_pgsql_lists@chezphil.org>) |
List | pgsql-general |
On 08/12/2018 12:53 PM, Phil Endecott wrote:
> Phil Endecott wrote:
>> On the master, I have:
>>
>> wal_level = replica
>> archive_mode = on
>> archive_command = 'ssh backup test ! -f backup/postgresql/archivedir/%f &&
>> scp %p backup:backup/postgresql/archivedir/%f'
>>
>> On the slave I have:
>>
>> standby_mode = 'on'
>> primary_conninfo = 'user=postgres host=master port=5432'
>> restore_command = 'scp backup:backup/postgresql/archivedir/%f %p'
>>
>> hot_standby = on
>
>> 2018-08-11 00:05:50.364 UTC [615] LOG: restored log file "0000000100000007000000D0" from archive
>> scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file or directory
>> 2018-08-11 00:05:51.325 UTC [7208] LOG: started streaming WAL from primary at 7/D0000000 on timeline 1
>> 2018-08-11 00:05:51.325 UTC [7208] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0000000100000007000000D0 has already been removed
>
>
> I am wondering if I need to set wal_keep_segments to at least 1 or 2 for
> this to work. I currently have it unset and I believe the default is 0.

Given that WAL segments are only 16 MB, I would probably bump it up to be on
the safe side, or use replication slots:

https://www.postgresql.org/docs/9.6/static/warm-standby.html
26.2.6. Replication Slots

Note, though, that replication slots do not limit how much WAL is retained,
so a long outage could result in WAL segments piling up on the master.

>
> My understanding was that when using archive_command/restore_command to copy
> WAL segments it would not be necessary to use wal_keep_segments to retain
> files in pg_xlog on the server; the slave can get everything using a
> combination of copying files using the restore_command and streaming.
> But these lines from the log:
>
> 2018-08-11 00:12:15.797 UTC [7954] LOG: redo starts at 7/D0F956C0
> 2018-08-11 00:12:16.068 UTC [7954] LOG: consistent recovery state reached at 7/D0FFF088
>
> make me think that there is an issue when the slave reaches the end of the
> copied WAL file. I speculate that the useful content of this WAL segment
> ends at FFF088, which is followed by an empty gap due to record sizes. But
> the slave tries to start streaming from this point, D0FFF088, not D1000000.
> If the master still had a copy of segment D0 then it would be able to stream
> this gap followed by the real content in the current segment D1.
>
> Does that make any sense at all?
>
>
> Regards, Phil.
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com
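
For anyone following the LSN arithmetic in the quoted logs, here is a minimal Python sketch of how a log position maps to a WAL segment file name. It assumes the default 16 MB segment size mentioned above and timeline 1 as shown in the log; the function names are made up for illustration and are not part of PostgreSQL.

```python
# Sketch only: maps an LSN such as "7/D0FFF088" to its WAL segment file name.
# Assumes the default 16 MB segment size; helper names are hypothetical.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # 0x1000000 bytes


def parse_lsn(lsn: str) -> int:
    """Convert 'X/Y' (two hex halves) into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the 24-character segment file name that contains this LSN."""
    segno = parse_lsn(lsn) // seg_size
    segs_per_xlogid = 0x100000000 // seg_size  # 256 segments per 4 GB "xlogid"
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


def segment_start(lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the LSN of the first byte of the segment containing this LSN."""
    pos = parse_lsn(lsn)
    start = pos - (pos % seg_size)
    return "%X/%X" % (start >> 32, start & 0xFFFFFFFF)


if __name__ == "__main__":
    lsn = "7/D0FFF088"  # "consistent recovery state reached" position from the log
    print(wal_file_name(1, lsn))   # segment holding that position
    print(segment_start(lsn))      # where that segment begins
```

With the values from the log, 7/D0FFF088 falls in segment 0000000100000007000000D0, whose first byte is 7/D0000000. That matches both the "started streaming WAL from primary at 7/D0000000" line and the segment the master reported as already removed, which fits the suggestion that the master needs to keep segment D0 around for streaming to start.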