Re: Cascading replication and recovery_target_timeline='latest'
From | Heikki Linnakangas |
---|---|
Subject | Re: Cascading replication and recovery_target_timeline='latest' |
Date | |
Msg-id | 50469953.1070603@iki.fi |
In reply to | Re: Cascading replication and recovery_target_timeline='latest' (Heikki Linnakangas <hlinnaka@iki.fi>) |
List | pgsql-hackers |
On 03.09.2012 17:40, Heikki Linnakangas wrote:
> On 03.09.2012 16:26, Heikki Linnakangas wrote:
>> On 03.09.2012 16:25, Fujii Masao wrote:
>>> On Tue, Sep 4, 2012 at 7:07 AM, Heikki Linnakangas<hlinnaka@iki.fi> wrote:
>>>> Hmm, I was thinking that when walsender gets the position it can send the
>>>> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current
>>>> recovery timeline. If it has changed, refuse to send the new WAL and
>>>> terminate. That would be a fairly small change, it would just close the
>>>> window between requesting walsenders to terminate and them actually
>>>> terminating.
>>>
>>> Yeah, sounds good. Could you implement the patch? If you don't have time,
>>> I will....
>>
>> I'll give it a shot..
>
> So, this is what I came up with, please review.

While testing, I bumped into another related bug: when a WAL segment is restored from the archive, we let a walsender send that whole WAL segment to a cascading standby. However, there's no guarantee that the restored WAL segment is complete.

In particular, if the timeline changes within that segment, e.g. 000000010000000000000004, that segment will be only partially full, and the WAL continues at segment 000000020000000000000004, on the next timeline.

This can also happen if you copy a partial WAL segment to the archive, for example from a crashed master server. Or if you have set up record-based WAL shipping without streaming replication, per http://www.postgresql.org/docs/devel/static/log-shipping-alternative.html#WARM-STANDBY-RECORD. That manual page says you can only deal with whole WAL files that way, but I think with standby_mode='on', that's actually no longer true.

So all in all, it seems like a shaky assumption that once you've restored a WAL file from the archive, you're free to stream it to a cascading slave.
I think it would be more robust to limit streaming to the point that has been replayed, and thus verified, in the first standby. If everyone is OK with that change in behavior, the fix is simple.

- Heikki