Обсуждение: [HACKERS] Timeline ID in backup_label file
Hi all, Lately, in order to extract some information from a backup_label file in python I have found myself doing something like the following to get a timeline number (feel free to reuse that code, especially the regex pattern): pattern = re.compile('^START WAL LOCATION:.*([0-9A-F]+\/[0-9A-F]+).*\(file ([0-9A-F]+)\)') if pattern.match(backup_lable_line): current_lsn = m.group(1) current_segment = m.group(2) current_tli = str(int(current_segment[:8],16)) That's more or less similar to what read_backup_label()@xlog.c does using sscanf(), still I keep wondering why we need to go through this much complication to get the origin timeline number of a base backup, information which is IMO as important as the start LSN when working on backup methods. I would find interesting to add at the bottom of the backup_label file a new field called "START TIMELINE: %d" to put this information in a more understandable shape. Any opinions? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 26 October 2017 at 02:50, Michael Paquier <michael.paquier@gmail.com> wrote: > Hi all, > > Lately, in order to extract some information from a backup_label file > in python I have found myself doing something like the following to > get a timeline number (feel free to reuse that code, especially the > regex pattern): > pattern = re.compile('^START WAL > LOCATION:.*([0-9A-F]+\/[0-9A-F]+).*\(file ([0-9A-F]+)\)') > if pattern.match(backup_lable_line): > current_lsn = m.group(1) > current_segment = m.group(2) > current_tli = str(int(current_segment[:8], 16)) > > That's more or less similar to what read_backup_label()@xlog.c does > using sscanf(), still I keep wondering why we need to go through this > much complication to get the origin timeline number of a base backup, > information which is IMO as important as the start LSN when working on > backup methods. > > I would find interesting to add at the bottom of the backup_label file > a new field called "START TIMELINE: %d" to put this information in a > more understandable shape. Any opinions? Strong "yes" from me. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 10/25/17 10:10 PM, Craig Ringer wrote: > On 26 October 2017 at 02:50, Michael Paquier <michael.paquier@gmail.com> wrote: >> >> I would find interesting to add at the bottom of the backup_label file >> a new field called "START TIMELINE: %d" to put this information in a >> more understandable shape. Any opinions? > > Strong "yes" from me. +1 -- -David david@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Oct 26, 2017 at 5:50 AM, David Steele <david@pgmasters.net> wrote: > On 10/25/17 10:10 PM, Craig Ringer wrote: >> On 26 October 2017 at 02:50, Michael Paquier <michael.paquier@gmail.com> wrote: >>> >>> I would find interesting to add at the bottom of the backup_label file >>> a new field called "START TIMELINE: %d" to put this information in a >>> more understandable shape. Any opinions? >> >> Strong "yes" from me. > > +1 Thanks for the feedback. Attached is a patch to achieve so, I have added as well a STOP TIMELINE field in the backup history file. Note that START TIMELINE gets automatically into the backup history file. Added a CF entry as well. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
Hi Michael, Find my review below. On 10/26/17 2:03 PM, Michael Paquier wrote: > > Thanks for the feedback. Attached is a patch to achieve so, I have > added as well a STOP TIMELINE field in the backup history file. Note > that START TIMELINE gets automatically into the backup history file. > Added a CF entry as well. + TimeLineID tli1, tli2; I'm not that excited about these names but don't have any better ideas. + if (fscanf(lfp, "START TIMELINE: %u\n", &tli2) == 1) This didn't work when I tested it (I had intentionally munged the "START TIMELINE" to produce an error). The problem is the "START TIME" and "LABEL" lines which are not being read. I added a few fgets() calls and it worked. Thanks! -- -David david@pgmasters.net
On Wed, Nov 15, 2017 at 11:16 PM, David Steele <david@pgmasters.net> wrote: > Find my review below. > > On 10/26/17 2:03 PM, Michael Paquier wrote: >> >> Thanks for the feedback. Attached is a patch to achieve so, I have >> added as well a STOP TIMELINE field in the backup history file. Note >> that START TIMELINE gets automatically into the backup history file. >> Added a CF entry as well. > + TimeLineID tli1, tli2; > > I'm not that excited about these names but don't have any better ideas. One is tli_from_segname and the second is tli_from_file. Would something like that sound better? > + if (fscanf(lfp, "START TIMELINE: %u\n", &tli2) == 1) > > This didn't work when I tested it (I had intentionally munged the "START > TIMELINE" to produce an error). The problem is the "START TIME" and > "LABEL" lines which are not being read. I added a few fgets() calls and > it worked. Currently backup label files are generated as such: START WAL LOCATION: 0/2000028 (file 000000010000000000000002) CHECKPOINT LOCATION: 0/2000060 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2017-11-16 07:52:50 JST LABEL: popo START TIMELINE: 1 There could be two things that could be done here: 1) Keep read_backup_label order-dependent and look at the START TIME and LABEL fields, then kick in ereport(LOG) with the information. This makes the fields in the backup_label still order-dependent. 2) Make read_backup_label smarter by parsing it line-by-line and fill in the data wanted. This way the parsing logic and sanity checks are split into two, and Postgres is smarter at looking at backup_label files generated by any backup tool. Which one do you prefer? Getting input from backup tool maintainers is important here. 2) is more extensible if more fields are added to the backup_label for a reason or another in the future. -- Michael
On 11/15/17 6:01 PM, Michael Paquier wrote: > On Wed, Nov 15, 2017 at 11:16 PM, David Steele <david@pgmasters.net> wrote: >> Find my review below. >> >> On 10/26/17 2:03 PM, Michael Paquier wrote: >>> >>> Thanks for the feedback. Attached is a patch to achieve so, I have >>> added as well a STOP TIMELINE field in the backup history file. Note >>> that START TIMELINE gets automatically into the backup history file. >>> Added a CF entry as well. >> + TimeLineID tli1, tli2; >> >> I'm not that excited about these names but don't have any better ideas. > > One is tli_from_segname and the second is tli_from_file. Would > something like that sound better? Yes, those names are better. >> + if (fscanf(lfp, "START TIMELINE: %u\n", &tli2) == 1) >> >> This didn't work when I tested it (I had intentionally munged the "START >> TIMELINE" to produce an error). The problem is the "START TIME" and >> "LABEL" lines which are not being read. I added a few fgets() calls and >> it worked. > > Currently backup label files are generated as such: > START WAL LOCATION: 0/2000028 (file 000000010000000000000002) > CHECKPOINT LOCATION: 0/2000060 > BACKUP METHOD: streamed > BACKUP FROM: master > START TIME: 2017-11-16 07:52:50 JST > LABEL: popo > START TIMELINE: 1 > > There could be two things that could be done here: > 1) Keep read_backup_label order-dependent and look at the START TIME > and LABEL fields, then kick in ereport(LOG) with the information. This > makes the fields in the backup_label still order-dependent. > 2) Make read_backup_label smarter by parsing it line-by-line and fill > in the data wanted. This way the parsing logic and sanity checks are > split into two, and Postgres is smarter at looking at backup_label > files generated by any backup tool. > > Which one do you prefer? Getting input from backup tool maintainers is > important here. 2) is more extensible if more fields are added to the > backup_label for a reason or another in the future. For this patch at least, I think we should do #1. Getting rid of the order dependency is attractive but there may be other programs that are depending on the order. I know you are not proposing to change the order now, but it *could* be changed with #2. Also, I think DEBUG1 would be a more appropriate log level if any logging is done. Regards, -- -David david@pgmasters.net
On Thu, Nov 16, 2017 at 9:20 AM, David Steele <david@pgmasters.net> wrote: > For this patch at least, I think we should do #1. Getting rid of the order > dependency is attractive but there may be other programs that are depending > on the order. I know you are not proposing to change the order now, but it > *could* be changed with #2. read_backup_label() is a static function in the backend code. With #2 I do not imply to change the order of the elements written in the backup_label file, just to make the way they are parsed smarter. > Also, I think DEBUG1 would be a more appropriate log level if any logging is > done. OK, the label and time strings can include spaces. The label string is assumed to not be bigger than 1028 bytes, and the timestamp is assumed that it can get up to 128. So it is possible to use stuff like fscanf("%127[^\n]\n") to avoid any buffer overflow problems on the backend. If a backup_label is generated with strings longer than that, then read_backup_label would fail to parse the next entries but that's not really something to worry about IMO as those are only minor sanity checks. Having a safer coding in the backend is more important, and the new checks should not trigger any hard failures. -- Michael
Вложения
Hi Michael, On 11/15/17 10:09 PM, Michael Paquier wrote: > On Thu, Nov 16, 2017 at 9:20 AM, David Steele <david@pgmasters.net> wrote: >> For this patch at least, I think we should do #1. Getting rid of the order >> dependency is attractive but there may be other programs that are depending >> on the order. I know you are not proposing to change the order now, but it >> *could* be changed with #2. > > read_backup_label() is a static function in the backend code. With #2 > I do not imply to change the order of the elements written in the > backup_label file, just to make the way they are parsed smarter. My point is that the order *could* be changed and it wouldn't be noticed if the read function were improved as you propose. I'm not against the idea, just think we should continue to enforce the order unless we decide an interface break is OK. >> Also, I think DEBUG1 would be a more appropriate log level if any logging is >> done. > > OK, the label and time strings can include spaces. The label string is > assumed to not be bigger than 1028 bytes, and the timestamp is assumed > that it can get up to 128. So it is possible to use stuff like > fscanf("%127[^\n]\n") to avoid any buffer overflow problems on the > backend. If a backup_label is generated with strings longer than that, > then read_backup_label would fail to parse the next entries but that's > not really something to worry about IMO as those are only minor sanity > checks. Having a safer coding in the backend is more important, and > the new checks should not trigger any hard failures. I have tested and get an error as expected when I munge the backup_label file: FATAL: invalid data in file "backup_label" DETAIL: Timeline ID parsed is 2, but expected 1 Everything else looks good so I will mark it ready for committer. Regards, -- -David david@pgmasters.net
On Mon, Nov 27, 2017 at 11:06 PM, David Steele <david@pgmasters.net> wrote: > On 11/15/17 10:09 PM, Michael Paquier wrote: >> On Thu, Nov 16, 2017 at 9:20 AM, David Steele <david@pgmasters.net> wrote: >>> For this patch at least, I think we should do #1. Getting rid of the order >>> dependency is attractive but there may be other programs that are depending >>> on the order. I know you are not proposing to change the order now, but it >>> *could* be changed with #2. >> >> read_backup_label() is a static function in the backend code. With #2 >> I do not imply to change the order of the elements written in the >> backup_label file, just to make the way they are parsed smarter. > > My point is that the order *could* be changed and it wouldn't be noticed > if the read function were improved as you propose. > > I'm not against the idea, just think we should continue to enforce the > order unless we decide an interface break is OK. I still don't quite understand here. The order the items are read does not cause a backward-incompatible change. True that there is no reason to change that either now. >>> Also, I think DEBUG1 would be a more appropriate log level if any logging is >>> done. >> >> OK, the label and time strings can include spaces. The label string is >> assumed to not be bigger than 1028 bytes, and the timestamp is assumed >> that it can get up to 128. So it is possible to use stuff like >> fscanf("%127[^\n]\n") to avoid any buffer overflow problems on the >> backend. If a backup_label is generated with strings longer than that, >> then read_backup_label would fail to parse the next entries but that's >> not really something to worry about IMO as those are only minor sanity >> checks. Having a safer coding in the backend is more important, and >> the new checks should not trigger any hard failures. > > I have tested and get an error as expected when I munge the backup_label > file: > > FATAL: invalid data in file "backup_label" > DETAIL: Timeline ID parsed is 2, but expected 1 > > Everything else looks good so I will mark it ready for committer. Thanks. This maps what I saw. -- Michael
On 11/27/17 7:11 PM, Michael Paquier wrote: > On Mon, Nov 27, 2017 at 11:06 PM, David Steele <david@pgmasters.net> wrote: >> On 11/15/17 10:09 PM, Michael Paquier wrote: >>> >>> read_backup_label() is a static function in the backend code. With #2 >>> I do not imply to change the order of the elements written in the >>> backup_label file, just to make the way they are parsed smarter. >> >> My point is that the order *could* be changed and it wouldn't be noticed >> if the read function were improved as you propose. >> >> I'm not against the idea, just think we should continue to enforce the >> order unless we decide an interface break is OK. > > I still don't quite understand here. The order the items are read does > not cause a backward-incompatible change. True that there is no reason > to change that either now. Perhaps I'm the one who is misunderstanding. If you propose a patch I'll be happy to review it, though I doubt there is a lot to be gained even if it would be a better implementation. Regards, -- -David david@pgmasters.net
On Tue, Nov 28, 2017 at 9:40 AM, David Steele <david@pgmasters.net> wrote: > Perhaps I'm the one who is misunderstanding. If you propose a patch I'll be > happy to review it, though I doubt there is a lot to be gained even if it > would be a better implementation. OK. I'll keep that in mind. Thanks for the input. -- Michael
On 27 November 2017 at 14:06, David Steele <david@pgmasters.net> wrote: > I have tested and get an error as expected when I munge the backup_label > file: > > FATAL: invalid data in file "backup_label" > DETAIL: Timeline ID parsed is 2, but expected 1 > > Everything else looks good so I will mark it ready for committer. Sounds good. No tests? No docs/extended explanatory comments? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 5, 2018 at 11:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 27 November 2017 at 14:06, David Steele <david@pgmasters.net> wrote: > >> I have tested and get an error as expected when I munge the backup_label >> file: >> >> FATAL: invalid data in file "backup_label" >> DETAIL: Timeline ID parsed is 2, but expected 1 >> >> Everything else looks good so I will mark it ready for committer. > > Sounds good. Thanks for the feedback. > No tests? Do you think that's worth the cycles as a TAP test? This can be done by generating a dummy backup_label file and then start a standby to see the failure, but this looks quite costly for minimum gain. This code path is also taken tested in src/test/recovery/t/001_stream_rep.pl for example, though that's not the failure path. So if we add a DEBUG1 line after fetching correctly the timeline ID this could help by looking at tmp_check/log/001_stream_rep_standby_1.log, but this needs to set "log_min_messages = debug1" PostgresNode.pm for example so as this shows up in the logs (this could be useful if done by default actually, DEBUG1 is not too talkative). Attached is an example of patch doing so. See for the addition in PostgresNode.pm and this new bit: + ereport(DEBUG1, + (errmsg("backup timeline %u in file \"%s\"", + tli_from_file, BACKUP_LABEL_FILE))); > No docs/extended explanatory comments? There is no existing documentation about the format of the backup_file in doc/. So it seems to me that it is not the problem of this patch to fill in the hole. Regarding the comments, perhaps something better could be done, but I am not quite sure what. We don't much explain what the current fields mean, except that one can guess what they mean from their names, which is the intention behind the code. -- Michael
Вложения
On Sat, Jan 6, 2018 at 9:54 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Jan 5, 2018 at 11:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 27 November 2017 at 14:06, David Steele <david@pgmasters.net> wrote: >> >>> I have tested and get an error as expected when I munge the backup_label >>> file: >>> >>> FATAL: invalid data in file "backup_label" >>> DETAIL: Timeline ID parsed is 2, but expected 1 >>> >>> Everything else looks good so I will mark it ready for committer. >> >> Sounds good. > > Thanks for the feedback. Previous patch was incorrect, this was the same v2 as upthread. Attached is the correct v3. -- Michael
Вложения
On Sat, Jan 06, 2018 at 09:56:03AM +0900, Michael Paquier wrote: > Previous patch was incorrect, this was the same v2 as upthread. > Attached is the correct v3. For the archive's sake, this version of the patch has been pushed as 6271fce. Thanks Simon for the commit, and David for the review! -- Michael