pg_upgrade (and recovery) pitfalls

From: Gunnar "Nick" Bluth
Subject: pg_upgrade (and recovery) pitfalls
Msg-id: 20180816154110.Horde.KE3OkcY8--6IOgbaQ37A8gt@nextcloud.pro-open.de
Replies: Re: pg_upgrade (and recovery) pitfalls
List: pgsql-general
Hello -general!

(didn't want to pester -hackers with this, as I don't know if I've been doing something terribly wrong ;-)


Consider the following scenario/setup:
- 4 DB servers in 2 DCs
  - 1 primary (in DC1)
  - 1 sync secondary (in other DC)
  - 2 async secondaries (distributed over DCs)
  - 1 server running BARMAN for backups/archiving

- Puppet takes care of the PG config, recovery.conf etc.

- ssh connections are only allowed and enabled to and from the BARMAN host/user, no ssh between the DB hosts.

- as we switch between servers pretty often, BARMAN provides a single archive for all 4 servers. Only the primary does
WAL archiving (the relevant settings are sketched below the requirements).
 

- General requirements are:
  - *always* have one sync secondary online (no exceptions)
  - best possible availability ==> shortest possible downtime
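
For reference, the relevant bits of the primary's postgresql.conf look roughly like this (host names, paths and the
BARMAN server name are illustrative, the real values come from Puppet):

wal_level = replica
archive_mode = on
# ship WALs to the BARMAN host only; the secondaries don't archive
archive_command = 'rsync -a %p barman@backuphost:/var/lib/barman/main/incoming/%f'
# always keep one sync secondary online
synchronous_standby_names = 'standby_dc2'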



The approach for major upgrades is to upgrade 2 servers and see if everything works as expected, then 
- either follow up with the remaining secondaries or
- promote one of them and carry on with the old version

Since we always have 2 spare secondaries to fall back to, pg_upgrade in "--link" mode is a viable approach, as it's
nice and fast.
 


The "naive" idea was to shutdown all instances (starting with the primary to enable final syncs), run "pg_upgrade -k"
onboth the former primary and the former sync secondary, re-link the recovery.conf on the secondary, re-enable the
"primary"IP and start both.
 
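In shell terms, roughly this on both nodes (versions and paths are illustrative):

/usr/lib/postgresql/11/bin/pg_upgrade -k \
  -b /usr/lib/postgresql/10/bin  -B /usr/lib/postgresql/11/bin \
  -d /var/lib/postgresql/10/main -D /var/lib/postgresql/11/main
# on the former sync secondary: put the recovery.conf back in place
ln -s /etc/postgresql/recovery.conf /var/lib/postgresql/11/main/recovery.conf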

D'oh! The secondary is complaining about a different cluster identifier:
"FATAL:  database system identifier differs between the primary and standby"

(From looking at the code, I could not determine straight away when and how this identifier is generated, but I guess
it is somewhere in initdb.c?)
 
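For what it's worth, pg_controldata does print the identifier; comparing the two freshly upgraded clusters shows that
they indeed ended up with different ones (the value below is made up):

$ /usr/lib/postgresql/11/bin/pg_controldata /var/lib/postgresql/11/main | grep identifier
Database system identifier:           6589304223851523071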

So, as we can't rsync (no ssh...), which would probably finish in a few seconds, a pg_basebackup is due. Which can be a
PITA when the DB size is scraping at a TB and you have a single 1GB/sec connection. Bye, bye, availability (remember the
primary requirement?).
 
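For completeness, the rebuild is a plain streaming basebackup along these lines (host, user and paths are illustrative):

pg_basebackup -h primary.dc1.example -U replication \
  -D /var/lib/postgresql/11/main -X stream -R -P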

==> So by now, we're only pg_upgrade'ing the primary and following up with a pg_basebackup to the secondary, planning for
much longer downtimes. <==
 


After finishing the pg_basebackup, re-link the recovery.conf, start. 
The recovery finds history-files from higher timelines in the archive, starts to recover those (?) and then complains
that the timeline doesn't match (don't take the numbers here too seriously, this is from a low-traffic test system, the
fork off TL 1 was at least a year ago):
 

restored log file "00000002.history" from archive
restored log file "00000003.history" from archive
restored log file "00000004.history" from archive
FATAL:  requested timeline 3 is not a child of this server's history
DETAIL:  Latest checkpoint is at 9C/36044D28 on timeline 1, but in the history of the requested timeline, the server
forked off from that timeline at 69/88000000.
 

This mess can probably be cleaned up manually (delete the 000000[234].history etc. on both the secondary and the BARMAN
archive), however to be 100% safe (or when you're inexperienced), you take another basebackup :/
 
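The manual cleanup would be something along these lines (paths and the BARMAN server name are illustrative; obviously
only if you are certain nothing references those timelines any more):

# on the secondary, in pg_wal/ resp. pg_xlog/
mv 000000[234].history /somewhere/safe/
# in the BARMAN archive (on the BARMAN host)
mv /var/lib/barman/main/wals/000000[234].history* /somewhere/safe/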

And - after moving the *.history files out of the way in the archive - the secondary finally starts and starts
receiving WALs.
 

Sidenote: this second problem happened to us as well after a colleague promoted a secondary for some R/W tests w/out
first disabling the archive_command in the morning and then re-built it using pg_basebackup in the afternoon.
 
I have to say that it took me quite a while to figure out what was going on there... sudden timeline jumps, presumably
out of nowhere?!?
 


Now, questions/wishes:
- did I do something obvious(ly) wrong?

- why does a recovery, based on a recovery.conf that points to a reachable primary (which obviously communicates its
own timeline), still look for higher timelines' history-files in the archive and try to jump onto these timelines?
This doesn't seem reasonable to me at all (the recovery.conf we deploy is sketched below the questions)...
 

- is there a way to have pg_upgrade/initdb use a particular "database system identifier" or some kind of "IV" for the
new instance, allowing for identical upgrades on primary and secondary? Apart from that number (and the system tables'
OIDs), the data directories should be identical, as far as I understand pg_upgrade's functionality?
 
  (- and where do I find that identifier?)

- is there a way to have pg_upgrade/initdb initiate the new cluster on the same (or a higher) timeline, to prevent
f***ing up the archive? If not, I'd certainly appreciate such an option! (I'm well aware that I need a new basebackup
after the upgrade anyway, but alas!)
 

- any other hints (apart from rsync or other ssh-based methods) on how I could get around the pg_basebackup would be
highly appreciated. Something that only clones the system tables and not the bulk of the (identical) data files?
 
- is my situation (mandatory sync secondary) so unusual that nobody ever thought about the above? ;-)
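
For reference, the recovery.conf that Puppet puts in place on the secondaries looks roughly like this (host names and
the BARMAN server name are illustrative):

standby_mode = 'on'
primary_conninfo = 'host=primary.dc1.example user=replication application_name=standby_dc2'
restore_command = 'ssh barman@backuphost "barman get-wal main %f" > %p'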

Cheers & best regards,

Gunnar "Nick" Bluth

