Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman
From | Michael Paquier |
---|---|
Subject | Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman |
Date | |
Msg-id | YlT23IvsXkGuLzFi@paquier.xyz |
In response to | Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman
Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman |
List | pgsql-hackers |
On Mon, Apr 11, 2022 at 06:48:58PM +1200, Thomas Munro wrote:
> 1. This test had some pre-existing bugs/races, which hadn't failed
> before due to scheduling, even under Valgrind. The above changes
> appear to fix those problems. To Michael for comment.

Yeah, there are two problems here. From what I can see, ensuring the
execution of archive_cleanup_command on the standby needs the
checkpoint on the primary and the restart point on the standby. So
pg_current_wal_lsn() should be called after the primary's checkpoint
and not before it, so that we are sure that the checkpoint record finds
its way to the standby. That's what Tom mentioned upthread.

The second problem is to make sure that $standby2 sees the promotion
of $standby and its history file, but we also want to recover
00000002.history from some archives to create a RECOVERYHISTORY at
recovery for the purpose of the test. Switching to a new segment as
proposed by Andres does not seem completely right to me because we are
not 100% sure of the order in which archiving is going to happen, no?
I think that the logic to create $standby2 from the initial backup of
the primary is right, because there is no 00000002.history in it, but
we also need to be sure that 00000002.history has been archived once
the promotion of $standby is done. This can be validated thanks to
the logs, actually.

>> What is that second test really testing?
>>
>> # Check the presence of temporary files specifically generated during
>> # archive recovery. To ensure the presence of the temporary history
>> # file, switch to a timeline large enough to allow a standby to recover
>> # a history file from an archive. As this requires at least two timeline
>> # switches, promote the existing standby first. Then create a second
>> # standby based on the promoted one. Finally, the second standby is
>> # promoted.
>>
>> Note "Then create a second standby based on the promoted one." - but that's
>> not actually what's happening:
>
> 2. There may also be other problems with the test but those aren't
> relevant to skink's failure, which starts on the 5th test. To Michael
> for comment.

This comes from df86e52, where we want to recover a history file that
would be created as RECOVERYHISTORY and make sure that the file gets
removed at the end of recovery. So $standby2 should choose a new
timeline different from the one chosen by $standby. Looking back
at what has been done, it seems to me that the comment is the
incorrect part:
https://www.postgresql.org/message-id/20190930080340.GO2888@paquier.xyz

All that stuff leads me to the attached. Thoughts?
--
Michael
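
Editorial sketch: the second check discussed in the message (making sure that 00000002.history has been archived once $standby is promoted, before $standby2 is started) can be illustrated with a small polling loop. This is a minimal Python illustration and not the code from the attached patch. The function names, the timeout values, and the assumption that the archive is a plain directory on disk are all hypothetical. It checks for the file itself rather than reading the server logs, which is the validation the message suggests.

```python
import os
import time


def history_file_name(timeline_id: int) -> str:
    """Build a timeline history file name: 8 uppercase hex digits + '.history'."""
    return f"{timeline_id:08X}.history"


def wait_for_archived_history(archive_dir: str, timeline_id: int,
                              timeout: float = 180.0,
                              interval: float = 0.1) -> bool:
    """Poll archive_dir (hypothetical plain directory) for the history file.

    Returns True as soon as the file exists, or False once the timeout
    has elapsed without the file showing up.
    """
    path = os.path.join(archive_dir, history_file_name(timeline_id))
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a test shaped like the one discussed here, such a wait would presumably sit between the promotion of the first standby and the start of the second one, so that the second standby can only ever fetch the history file from the archive.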