BUG #10142: Downstream standby indefinitely waits for an old WAL log in new timeline on WAL Cascading replicatio
From | skeefe@rdx.com |
---|---|
Subject | BUG #10142: Downstream standby indefinitely waits for an old WAL log in new timeline on WAL Cascading replicatio |
Date | |
Msg-id | 20140425174336.2721.61539@wrigleys.postgresql.org |
Responses | Re: BUG #10142: Downstream standby indefinitely waits for an old WAL log in new timeline on WAL Cascading replicatio |
List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference: 10142
Logged by: Sean Keefe
Email address: skeefe@rdx.com
PostgreSQL version: 9.2.8
Operating system: Redhat 6.4
Description:

The issue we are experiencing is with cascading WAL replication in Postgres 9.2.8. If the master goes down during a massive transaction and we promote the first slave, the next slave looks for a WAL log that never existed: one named for the new timeline, but at a position before the timelines split.

Below is how to recreate the issue:

1. Create M using postgresql.conf_M. Start M.
   CREATE TABLE t_test (id int4);
2. Create S1 from M using postgresql.conf_S1 and recovery.conf_S1 (I used rsync). Start S1.
3. Create S2 from M using postgresql.conf_S2 and recovery.conf_S2 (I used rsync). Start S2.
4. Insert data into the t_test table on M:
   INSERT INTO t_test SELECT * FROM generate_series(1, 250000);
5. Important: do not shut down M. If you want, you can crash M by killing its pids. I just let it run and immediately proceeded to the next step. The idea here is to promote S1 before M transmits the last WAL, which holds the COMMIT of the above INSERT.
6. Promote S1. S1 will change its timeline.
7. S2 will not recognize the new timeline of its master S1. PGSTOP S2 and then PGSTART. S2 will now change its timeline. However, as you can see in the pg_log, it will wait for a WAL that will never arrive: it looks for WALs from the previous timeline under the new timeline's file naming format. E.g. it will wait for 0000000A00000026000000F1, while that log exists under the name 0000000900000026000000F1. So it will wait forever, and if you try to connect to S2 you will see the error "FATAL: the database system is starting up".

recovery.conf for S1:

    restore_command = '/data/postgres/rep_poc/restore_command.sh %f %p %r'
    recovery_end_command = 'rm -f /data/postgres/rep_poc/trigger.cfg'
    recovery_target_timeline = 'latest'

recovery.conf for S2:

    restore_command = '/data/postgres/rep_poc/restore_command.sh %f %p %r'
    recovery_end_command = 'rm -f /data/postgres/rep_poc/trigger.cfg'
    recovery_target_timeline = 'latest'

If you need any of the other configuration files, let me know and I can send them to you.
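
For readers comparing the two file names in step 7, here is a minimal sketch (not part of the original report) that splits a WAL segment file name into its fields. It relies on the standard naming scheme, in which the 24 hex digits are three 8-digit fields and the first field is the timeline ID. The helper names `parse_wal_name` and `same_position` are hypothetical and exist only for this illustration.

```python
import string
from typing import NamedTuple


class WalSegment(NamedTuple):
    timeline: int  # first 8 hex digits: timeline ID
    log: int       # middle 8 hex digits: logical log file number
    seg: int       # last 8 hex digits: segment number within that log file


def parse_wal_name(name: str) -> WalSegment:
    """Split a 24-hex-digit WAL segment file name into its three fields."""
    if len(name) != 24 or any(c not in string.hexdigits for c in name):
        raise ValueError(f"expected 24 hex digits, got {name!r}")
    return WalSegment(int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16))


def same_position(a: str, b: str) -> bool:
    """True when two segment names differ at most in their timeline field."""
    return parse_wal_name(a)[1:] == parse_wal_name(b)[1:]


# The two names quoted in the report.
requested = parse_wal_name("0000000A00000026000000F1")  # what S2 waits for
archived = parse_wal_name("0000000900000026000000F1")   # what actually exists

print(requested.timeline, archived.timeline)  # 10 9
print(same_position("0000000A00000026000000F1",
                    "0000000900000026000000F1"))  # True
```

The two names refer to the same log and segment and differ only in the timeline field (10 versus 9), which matches the description above: S2 asks for an old segment under the new timeline's name.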