Re: backup manifests
From | Robert Haas |
---|---|
Subject | Re: backup manifests |
Date | |
Msg-id | CA+TgmoawEeE5qpFgj5Vy2zZGKzd3ZSEhGrD_JdPqPd2GB8u1Cw@mail.gmail.com |
In reply to | Re: backup manifests (Andres Freund <andres@anarazel.de>) |
Responses | Re: backup manifests |
List | pgsql-hackers |
On Mon, Mar 30, 2020 at 2:59 PM Andres Freund <andres@anarazel.de> wrote:
> I think it wouldn't be too hard to compute that information while
> taking the base backup. We know the end timeline (ThisTimeLineID), so
> we can just call readTimeLineHistory(ThisTimeLineID). Which should
> then allow for something pretty trivial along the lines of
>
> timelines = readTimeLineHistory(ThisTimeLineID);
> last_start = InvalidXLogRecPtr;
> foreach(lc, timelines)
> {
>     TimeLineHistoryEntry *he = lfirst(lc);
>
>     if (he->end < startptr)
>         continue;
>
>     //
>     manifest_emit_wal_range(Max(he->begin, startptr), he->end);
>     last_start = he->end;
> }
>
> if (last_start == InvalidXLogRecPtr)
>     start = startptr;
> else
>     start = last_start;
>
> manifest_emit_wal_range(start, endptr);

(A fleshed-out version of this sketch appears after the message below.)

I made an attempt to implement this. In the attached patch set, 0001 and
0002 are (I think) unmodified from the last version. 0003 is a
slightly-rejiggered version of your new pg_waldump option. 0004 whacks
0002 around so that the WAL ranges are included in the manifest and
pg_validatebackup tries to run pg_waldump for each WAL range (see the
invocation sketch after the message).

It appears to work in light testing, but I haven't yet (1) tested it
extensively, (2) written good regression tests for it beyond what
pg_validatebackup already had, or (3) updated the documentation. I'm
going to work on those things. I would appreciate *very timely* feedback
on anything people do or do not like about this, because I want to
commit this patch set by the end of the work week, and that isn't very
far away. I would also appreciate it if people would bear in mind the
principle that half a loaf is better than none, and that further
improvements can be made in future releases.

As part of my light testing, I tried promoting a standby that was
running pg_basebackup, and found that pg_basebackup failed like this:

pg_basebackup: error: could not get COPY data stream: ERROR:  the
standby was promoted during online backup
HINT:  This means that the backup being taken is corrupt and should not
be used. Try taking another online backup.
pg_basebackup: removing data directory "/Users/rhaas/pgslave2"

My first thought was that this error message is hard to reconcile with
this comment:

        /*
         * Send timeline history files too. Only the latest timeline history
         * file is required for recovery, and even that only if there happens
         * to be a timeline switch in the first WAL segment that contains the
         * checkpoint record, or if we're taking a base backup from a standby
         * server and the target timeline changes while the backup is taken.
         * But they are small and highly useful for debugging purposes, so
         * better include them all, always.
         */

But then it occurred to me that the comment might be contemplating a
cascading standby: maybe the original master died and this machine's
master got promoted, so this machine has to follow a timeline switch but
doesn't itself get promoted. I think I might try to set up that scenario
and see what happens, but I haven't done so as of this writing.
Regardless, it seems like a really good idea to store a list of WAL
ranges rather than a single start/end/timeline, because even if multiple
ranges are impossible today, they might become possible in the future.
Still, unless there's an easy way to set up a test scenario where
multiple WAL ranges need to be verified, it may be hard to test that
this code actually behaves properly.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
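For anyone who wants to see that loop spelled out, here is a fleshed-out
version of the sketch quoted above. This is an illustration only, not
code from the attached patches: manifest_emit_wal_range() is an imagined
helper, while readTimeLineHistory(), TimeLineHistoryEntry, and
XLogRecPtrIsInvalid() are the real backend APIs. It relies on
readTimeLineHistory() returning entries oldest-first, with the target
timeline last and its end set to InvalidXLogRecPtr:

#include "postgres.h"

#include "access/timeline.h"
#include "access/xlogdefs.h"
#include "nodes/pg_list.h"

/* Imagined helper: record one WAL range in the backup manifest. */
extern void manifest_emit_wal_range(TimeLineID tli,
                                    XLogRecPtr start, XLogRecPtr end);

/*
 * Emit one manifest WAL range per timeline crossed by the backup,
 * clipping each timeline's range to [startptr, endptr].
 */
static void
emit_wal_ranges(TimeLineID endtli, XLogRecPtr startptr, XLogRecPtr endptr)
{
    List       *timelines = readTimeLineHistory(endtli);
    ListCell   *lc;

    foreach(lc, timelines)
    {
        TimeLineHistoryEntry *he = (TimeLineHistoryEntry *) lfirst(lc);
        XLogRecPtr  tli_end;

        /* The final (current) timeline has end == InvalidXLogRecPtr. */
        tli_end = XLogRecPtrIsInvalid(he->end) ? endptr : he->end;

        /* Skip timelines that ended before the backup started. */
        if (tli_end < startptr)
            continue;

        /* Clip this timeline's range to the backup's WAL range. */
        manifest_emit_wal_range(he->tli,
                                Max(he->begin, startptr),
                                Min(tli_end, endptr));
    }
}

Treating the current timeline's invalid end pointer as endptr avoids the
separate last_start bookkeeping and the trailing call in the original
sketch.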
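On the verification side, the per-range pg_waldump check can be as
simple as building one command line per WAL range recorded in the
manifest. This too is only a sketch, not the patch's code: the
wal_range struct and verify_wal_range() are hypothetical, and checking
system()'s exit status is just one plausible way to detect failures.
The pg_waldump options used (--path, --timeline, --start, --end) do
exist:

#include "postgres_fe.h"

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical in-memory form of one manifest WAL range. */
typedef struct
{
    uint32      tli;
    uint64      start_lsn;
    uint64      end_lsn;
} wal_range;

/*
 * Run pg_waldump over one WAL range; a nonzero exit status means the
 * required WAL could not be read cleanly.
 */
static bool
verify_wal_range(const char *wal_dir, const wal_range *r)
{
    char        cmd[MAXPGPATH * 2];

    snprintf(cmd, sizeof(cmd),
             "pg_waldump --path=\"%s\" --timeline=%u "
             "--start=%X/%X --end=%X/%X",
             wal_dir, r->tli,
             (uint32) (r->start_lsn >> 32), (uint32) r->start_lsn,
             (uint32) (r->end_lsn >> 32), (uint32) r->end_lsn);

    return system(cmd) == 0;
}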
Attachments