Re: A failure of standby to follow timeline switch
From | Fujii Masao |
---|---|
Subject | Re: A failure of standby to follow timeline switch |
Date | |
Msg-id | 697adab0-a3fe-e1cb-436b-3a8eaa9a2266@oss.nttdata.com |
In reply to | A failure of standby to follow timeline switch (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses | Re: A failure of standby to follow timeline switch |
List | pgsql-hackers |
On 2020/12/09 17:43, Kyotaro Horiguchi wrote:
> Hello.
>
> We found a behavioral change (which seems to be a bug) in recovery at
> PG13.
>
> The following steps might seem somewhat strange but the replication
> code deliberately copes with the case. This is a sequence seen while
> operating an HA cluster using Pacemaker.
>
> - Run initdb to create a primary.
> - Set archive_mode=on on the primary.
> - Start the primary.
>
> - Create a standby using pg_basebackup from the primary.
> - Stop the standby.
> - Stop the primary.
>
> - Put standby.signal to the primary then start it.
> - Promote the primary.
>
> - Start the standby.
>
>
> Until PG12, the primary signals end-of-timeline to the standby and
> switches to the next timeline. Since PG13, that doesn't happen and
> the standby continues to request the segment of the older
> timeline, which no longer exists.
>
> FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000003 has already been removed
>
> It is because WalSndSegmentOpen() can fail to detect a timeline switch
> on a historic timeline, due to use of a wrong variable to check
> that. It is using state->seg.ws_segno but it seems to be a thinko when
> the code around was refactored in 709d003fbd.
>
> The first patch detects the wrong behavior. The second small patch
> fixes it.

Thanks for reporting this! This looks like a bug.

When I applied the two patches to the master branch and
ran "make check-world", I got the following error.

============== creating database "contrib_regression" ==============
# Looks like you planned 37 tests but ran 36.
# Looks like your test exited with 255 just after 36.
t/001_stream_rep.pl .................. Dubious, test returned 255 (wstat 65280, 0xff00)
Failed 1/37 subtests
...
Test Summary Report
-------------------
t/001_stream_rep.pl (Wstat: 65280 Tests: 36 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan. You planned 37 tests but ran 36.
Files=21, Tests=239, 302 wallclock secs ( 0.10 usr 0.05 sys + 41.69 cusr 39.84 csys = 81.68 CPU)
Result: FAIL
make[2]: *** [check] Error 1
make[1]: *** [check-recovery-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....
t/070_dropuser.pl ......... ok

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
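
For readers following the quoted analysis, here is a minimal, standalone sketch of the kind of check the report describes: when streaming from a historic timeline, the segment that contains the switch point should be read from the next timeline. This is not PostgreSQL's code and not the proposed patch. The struct, the function names, and the assumption that the comparison should use the segment being requested (rather than the segment the reader currently has open) are all illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;  /* byte position in the WAL stream */
typedef uint64_t XLogSegNo;   /* WAL segment number */
typedef uint32_t TimeLineID;

/* Hypothetical stand-in for the sender's view of the timeline it streams. */
typedef struct SendTimelineState
{
    TimeLineID tli;          /* timeline currently being sent */
    bool       is_historic;  /* true if tli has already ended */
    TimeLineID next_tli;     /* timeline that follows tli */
    XLogRecPtr valid_upto;   /* position at which tli ends */
} SendTimelineState;

/* Map a WAL position to the number of the segment that contains it. */
XLogSegNo
segno_of(XLogRecPtr ptr, uint64_t seg_size)
{
    return ptr / seg_size;
}

/*
 * Pick the timeline whose file should be opened for requested_segno.
 * Assumption: the test must look at the segment being requested; comparing
 * a previously opened segment instead could miss the switch.
 */
TimeLineID
choose_segment_tli(const SendTimelineState *st,
                   XLogSegNo requested_segno,
                   uint64_t seg_size)
{
    if (st->is_historic &&
        requested_segno == segno_of(st->valid_upto, seg_size))
        return st->next_tli;

    return st->tli;
}
```

With 16 MB segments and a switch point inside segment 3, a request for segment 3 resolves to the next timeline, while a request for segment 2 stays on the old one.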