Обсуждение: [HACKERS] Timeline ID in backup_label file

Поиск
Список
Период
Сортировка

[HACKERS] Timeline ID in backup_label file

От
Michael Paquier
Дата:
Hi all,

Lately, in order to extract some information from a backup_label file
in python I have found myself doing something like the following to
get a timeline number (feel free to reuse that code, especially the
regex pattern):
pattern = re.compile('^START WAL
LOCATION:.*([0-9A-F]+\/[0-9A-F]+).*\(file ([0-9A-F]+)\)')
if pattern.match(backup_lable_line):   current_lsn = m.group(1)   current_segment = m.group(2)   current_tli =
str(int(current_segment[:8],16))
 

That's more or less similar to what read_backup_label()@xlog.c does
using sscanf(), still I keep wondering why we need to go through this
much complication to get the origin timeline number of a base backup,
information which is IMO as important as the start LSN when working on
backup methods.

I would find interesting to add at the bottom of the backup_label file
a new field called "START TIMELINE: %d" to put this information in a
more understandable shape. Any opinions?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Timeline ID in backup_label file

От
Craig Ringer
Дата:
On 26 October 2017 at 02:50, Michael Paquier <michael.paquier@gmail.com> wrote:
> Hi all,
>
> Lately, in order to extract some information from a backup_label file
> in python I have found myself doing something like the following to
> get a timeline number (feel free to reuse that code, especially the
> regex pattern):
> pattern = re.compile('^START WAL
> LOCATION:.*([0-9A-F]+\/[0-9A-F]+).*\(file ([0-9A-F]+)\)')
> if pattern.match(backup_lable_line):
>     current_lsn = m.group(1)
>     current_segment = m.group(2)
>     current_tli = str(int(current_segment[:8], 16))
>
> That's more or less similar to what read_backup_label()@xlog.c does
> using sscanf(), still I keep wondering why we need to go through this
> much complication to get the origin timeline number of a base backup,
> information which is IMO as important as the start LSN when working on
> backup methods.
>
> I would find interesting to add at the bottom of the backup_label file
> a new field called "START TIMELINE: %d" to put this information in a
> more understandable shape. Any opinions?

Strong "yes" from me.


-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Timeline ID in backup_label file

От
David Steele
Дата:
On 10/25/17 10:10 PM, Craig Ringer wrote:
> On 26 October 2017 at 02:50, Michael Paquier <michael.paquier@gmail.com> wrote:
>>
>> I would find interesting to add at the bottom of the backup_label file
>> a new field called "START TIMELINE: %d" to put this information in a
>> more understandable shape. Any opinions?
> 
> Strong "yes" from me.

+1

-- 
-David
david@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Timeline ID in backup_label file

От
Michael Paquier
Дата:
On Thu, Oct 26, 2017 at 5:50 AM, David Steele <david@pgmasters.net> wrote:
> On 10/25/17 10:10 PM, Craig Ringer wrote:
>> On 26 October 2017 at 02:50, Michael Paquier <michael.paquier@gmail.com> wrote:
>>>
>>> I would find interesting to add at the bottom of the backup_label file
>>> a new field called "START TIMELINE: %d" to put this information in a
>>> more understandable shape. Any opinions?
>>
>> Strong "yes" from me.
>
> +1

Thanks for the feedback. Attached is a patch to achieve so, I have
added as well a STOP TIMELINE field in the backup history file. Note
that START TIMELINE gets automatically into the backup history file.
Added a CF entry as well.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Timeline ID in backup_label file

От
David Steele
Дата:
Hi Michael,

Find my review below.

On 10/26/17 2:03 PM, Michael Paquier wrote:
>
> Thanks for the feedback. Attached is a patch to achieve so, I have
> added as well a STOP TIMELINE field in the backup history file. Note
> that START TIMELINE gets automatically into the backup history file.
> Added a CF entry as well.
+    TimeLineID    tli1, tli2;

I'm not that excited about these names but don't have any better ideas.

+    if (fscanf(lfp, "START TIMELINE: %u\n", &tli2) == 1)

This didn't work when I tested it (I had intentionally munged the "START
TIMELINE" to produce an error).  The problem is the "START TIME" and
"LABEL" lines which are not being read.  I added a few fgets() calls and
it worked.

Thanks!
-- 
-David
david@pgmasters.net


Re: [HACKERS] Timeline ID in backup_label file

От
Michael Paquier
Дата:
On Wed, Nov 15, 2017 at 11:16 PM, David Steele <david@pgmasters.net> wrote:
> Find my review below.
>
> On 10/26/17 2:03 PM, Michael Paquier wrote:
>>
>> Thanks for the feedback. Attached is a patch to achieve so, I have
>> added as well a STOP TIMELINE field in the backup history file. Note
>> that START TIMELINE gets automatically into the backup history file.
>> Added a CF entry as well.
> +       TimeLineID      tli1, tli2;
>
> I'm not that excited about these names but don't have any better ideas.

One is tli_from_segname and the second is tli_from_file. Would
something like that sound better?

> +       if (fscanf(lfp, "START TIMELINE: %u\n", &tli2) == 1)
>
> This didn't work when I tested it (I had intentionally munged the "START
> TIMELINE" to produce an error).  The problem is the "START TIME" and
> "LABEL" lines which are not being read.  I added a few fgets() calls and
> it worked.

Currently backup label files are generated as such:
START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2017-11-16 07:52:50 JST
LABEL: popo
START TIMELINE: 1

There could be two things that could be done here:
1) Keep read_backup_label order-dependent and look at the START TIME
and LABEL fields, then kick in ereport(LOG) with the information. This
makes the fields in the backup_label still order-dependent.
2) Make read_backup_label smarter by parsing it line-by-line and fill
in the data wanted. This way the parsing logic and sanity checks are
split into two, and Postgres is smarter at looking at backup_label
files generated by any backup tool.

Which one do you prefer? Getting input from backup tool maintainers is
important here. 2) is more extensible if more fields are added to the
backup_label for a reason or another in the future.
-- 
Michael


Re: [HACKERS] Timeline ID in backup_label file

От
David Steele
Дата:
On 11/15/17 6:01 PM, Michael Paquier wrote:
> On Wed, Nov 15, 2017 at 11:16 PM, David Steele <david@pgmasters.net> wrote:
>> Find my review below.
>>
>> On 10/26/17 2:03 PM, Michael Paquier wrote:
>>>
>>> Thanks for the feedback. Attached is a patch to achieve so, I have
>>> added as well a STOP TIMELINE field in the backup history file. Note
>>> that START TIMELINE gets automatically into the backup history file.
>>> Added a CF entry as well.
>> +       TimeLineID      tli1, tli2;
>>
>> I'm not that excited about these names but don't have any better ideas.
> 
> One is tli_from_segname and the second is tli_from_file. Would
> something like that sound better?

Yes, those names are better.

>> +       if (fscanf(lfp, "START TIMELINE: %u\n", &tli2) == 1)
>>
>> This didn't work when I tested it (I had intentionally munged the "START
>> TIMELINE" to produce an error).  The problem is the "START TIME" and
>> "LABEL" lines which are not being read.  I added a few fgets() calls and
>> it worked.
> 
> Currently backup label files are generated as such:
> START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
> CHECKPOINT LOCATION: 0/2000060
> BACKUP METHOD: streamed
> BACKUP FROM: master
> START TIME: 2017-11-16 07:52:50 JST
> LABEL: popo
> START TIMELINE: 1
> 
> There could be two things that could be done here:
> 1) Keep read_backup_label order-dependent and look at the START TIME
> and LABEL fields, then kick in ereport(LOG) with the information. This
> makes the fields in the backup_label still order-dependent.
> 2) Make read_backup_label smarter by parsing it line-by-line and fill
> in the data wanted. This way the parsing logic and sanity checks are
> split into two, and Postgres is smarter at looking at backup_label
> files generated by any backup tool.
> 
> Which one do you prefer? Getting input from backup tool maintainers is
> important here. 2) is more extensible if more fields are added to the
> backup_label for a reason or another in the future.

For this patch at least, I think we should do #1.  Getting rid of the 
order dependency is attractive but there may be other programs that are 
depending on the order.  I know you are not proposing to change the 
order now, but it *could* be changed with #2.

Also, I think DEBUG1 would be a more appropriate log level if any 
logging is done.

Regards,
-- 
-David
david@pgmasters.net


Re: [HACKERS] Timeline ID in backup_label file

От
Michael Paquier
Дата:
On Thu, Nov 16, 2017 at 9:20 AM, David Steele <david@pgmasters.net> wrote:
> For this patch at least, I think we should do #1.  Getting rid of the order
> dependency is attractive but there may be other programs that are depending
> on the order.  I know you are not proposing to change the order now, but it
> *could* be changed with #2.

read_backup_label() is a static function in the backend code. With #2
I do not imply to change the order of the elements written in the
backup_label file, just to make the way they are parsed smarter.

> Also, I think DEBUG1 would be a more appropriate log level if any logging is
> done.

OK, the label and time strings can include spaces. The label string is
assumed to not be bigger than 1028 bytes, and the timestamp is assumed
that it can get up to 128. So it is possible to use stuff like
fscanf("%127[^\n]\n") to avoid any buffer overflow problems on the
backend. If a backup_label is generated with strings longer than that,
then read_backup_label would fail to parse the next entries but that's
not really something to worry about IMO as those are only minor sanity
checks. Having a safer coding in the backend is more important, and
the new checks should not trigger any hard failures.
-- 
Michael

Вложения

Re: [HACKERS] Timeline ID in backup_label file

От
David Steele
Дата:
Hi Michael,

On 11/15/17 10:09 PM, Michael Paquier wrote:
> On Thu, Nov 16, 2017 at 9:20 AM, David Steele <david@pgmasters.net> wrote:
>> For this patch at least, I think we should do #1.  Getting rid of the order
>> dependency is attractive but there may be other programs that are depending
>> on the order.  I know you are not proposing to change the order now, but it
>> *could* be changed with #2.
> 
> read_backup_label() is a static function in the backend code. With #2
> I do not imply to change the order of the elements written in the
> backup_label file, just to make the way they are parsed smarter.

My point is that the order *could* be changed and it wouldn't be noticed
if the read function were improved as you propose.

I'm not against the idea, just think we should continue to enforce the
order unless we decide an interface break is OK.

>> Also, I think DEBUG1 would be a more appropriate log level if any logging is
>> done.
> 
> OK, the label and time strings can include spaces. The label string is
> assumed to not be bigger than 1028 bytes, and the timestamp is assumed
> that it can get up to 128. So it is possible to use stuff like
> fscanf("%127[^\n]\n") to avoid any buffer overflow problems on the
> backend. If a backup_label is generated with strings longer than that,
> then read_backup_label would fail to parse the next entries but that's
> not really something to worry about IMO as those are only minor sanity
> checks. Having a safer coding in the backend is more important, and
> the new checks should not trigger any hard failures.

I have tested and get an error as expected when I munge the backup_label
file:

FATAL:  invalid data in file "backup_label"
DETAIL:  Timeline ID parsed is 2, but expected 1

Everything else looks good so I will mark it ready for committer.

Regards,
-- 
-David
david@pgmasters.net


Re: [HACKERS] Timeline ID in backup_label file

От
Michael Paquier
Дата:
On Mon, Nov 27, 2017 at 11:06 PM, David Steele <david@pgmasters.net> wrote:
> On 11/15/17 10:09 PM, Michael Paquier wrote:
>> On Thu, Nov 16, 2017 at 9:20 AM, David Steele <david@pgmasters.net> wrote:
>>> For this patch at least, I think we should do #1.  Getting rid of the order
>>> dependency is attractive but there may be other programs that are depending
>>> on the order.  I know you are not proposing to change the order now, but it
>>> *could* be changed with #2.
>>
>> read_backup_label() is a static function in the backend code. With #2
>> I do not imply to change the order of the elements written in the
>> backup_label file, just to make the way they are parsed smarter.
>
> My point is that the order *could* be changed and it wouldn't be noticed
> if the read function were improved as you propose.
>
> I'm not against the idea, just think we should continue to enforce the
> order unless we decide an interface break is OK.

I still don't quite understand here. The order the items are read does
not cause a backward-incompatible change. True that there is no reason
to change that either now.

>>> Also, I think DEBUG1 would be a more appropriate log level if any logging is
>>> done.
>>
>> OK, the label and time strings can include spaces. The label string is
>> assumed to not be bigger than 1028 bytes, and the timestamp is assumed
>> that it can get up to 128. So it is possible to use stuff like
>> fscanf("%127[^\n]\n") to avoid any buffer overflow problems on the
>> backend. If a backup_label is generated with strings longer than that,
>> then read_backup_label would fail to parse the next entries but that's
>> not really something to worry about IMO as those are only minor sanity
>> checks. Having a safer coding in the backend is more important, and
>> the new checks should not trigger any hard failures.
>
> I have tested and get an error as expected when I munge the backup_label
> file:
>
> FATAL:  invalid data in file "backup_label"
> DETAIL:  Timeline ID parsed is 2, but expected 1
>
> Everything else looks good so I will mark it ready for committer.

Thanks. This maps what I saw.
-- 
Michael


Re: [HACKERS] Timeline ID in backup_label file

От
David Steele
Дата:
On 11/27/17 7:11 PM, Michael Paquier wrote:
> On Mon, Nov 27, 2017 at 11:06 PM, David Steele <david@pgmasters.net> wrote:
>> On 11/15/17 10:09 PM, Michael Paquier wrote:
>>>
>>> read_backup_label() is a static function in the backend code. With #2
>>> I do not imply to change the order of the elements written in the
>>> backup_label file, just to make the way they are parsed smarter.
>>
>> My point is that the order *could* be changed and it wouldn't be noticed
>> if the read function were improved as you propose.
>>
>> I'm not against the idea, just think we should continue to enforce the
>> order unless we decide an interface break is OK.
> 
> I still don't quite understand here. The order the items are read does
> not cause a backward-incompatible change. True that there is no reason
> to change that either now.

Perhaps I'm the one who is misunderstanding. If you propose a patch I'll 
be happy to review it, though I doubt there is a lot to be gained even 
if it would be a better implementation.

Regards,
-- 
-David
david@pgmasters.net


Re: [HACKERS] Timeline ID in backup_label file

От
Michael Paquier
Дата:
On Tue, Nov 28, 2017 at 9:40 AM, David Steele <david@pgmasters.net> wrote:
> Perhaps I'm the one who is misunderstanding. If you propose a patch I'll be
> happy to review it, though I doubt there is a lot to be gained even if it
> would be a better implementation.

OK. I'll keep that in mind. Thanks for the input.
-- 
Michael


Re: [HACKERS] Timeline ID in backup_label file

От
Simon Riggs
Дата:
On 27 November 2017 at 14:06, David Steele <david@pgmasters.net> wrote:

> I have tested and get an error as expected when I munge the backup_label
> file:
>
> FATAL:  invalid data in file "backup_label"
> DETAIL:  Timeline ID parsed is 2, but expected 1
>
> Everything else looks good so I will mark it ready for committer.

Sounds good.

No tests? No docs/extended explanatory comments?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Timeline ID in backup_label file

От
Michael Paquier
Дата:
On Fri, Jan 5, 2018 at 11:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 27 November 2017 at 14:06, David Steele <david@pgmasters.net> wrote:
>
>> I have tested and get an error as expected when I munge the backup_label
>> file:
>>
>> FATAL:  invalid data in file "backup_label"
>> DETAIL:  Timeline ID parsed is 2, but expected 1
>>
>> Everything else looks good so I will mark it ready for committer.
>
> Sounds good.

Thanks for the feedback.

> No tests?

Do you think that's worth the cycles as a TAP test? This can be done
by generating a dummy backup_label file and then start a standby to
see the failure, but this looks quite costly for minimum gain. This
code path is also taken tested in
src/test/recovery/t/001_stream_rep.pl for example, though that's not
the failure path. So if we add a DEBUG1 line after fetching correctly
the timeline ID this could help by looking at
tmp_check/log/001_stream_rep_standby_1.log, but this needs to set
"log_min_messages = debug1" PostgresNode.pm for example so as this
shows up in the logs (this could be useful if done by default
actually, DEBUG1 is not too talkative). Attached is an example of
patch doing so. See for the addition in PostgresNode.pm and this new
bit:
+       ereport(DEBUG1,
+               (errmsg("backup timeline %u in file \"%s\"",
+                       tli_from_file, BACKUP_LABEL_FILE)));

> No docs/extended explanatory comments?

There is no existing documentation about the format of the backup_file
in doc/. So it seems to me that it is not the problem of this patch to
fill in the hole. Regarding the comments, perhaps something better
could be done, but I am not quite sure what. We don't much explain
what the current fields mean, except that one can guess what they mean
from their names, which is the intention behind the code.
-- 
Michael

Вложения

Re: [HACKERS] Timeline ID in backup_label file

От
Michael Paquier
Дата:
On Sat, Jan 6, 2018 at 9:54 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Jan 5, 2018 at 11:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 27 November 2017 at 14:06, David Steele <david@pgmasters.net> wrote:
>>
>>> I have tested and get an error as expected when I munge the backup_label
>>> file:
>>>
>>> FATAL:  invalid data in file "backup_label"
>>> DETAIL:  Timeline ID parsed is 2, but expected 1
>>>
>>> Everything else looks good so I will mark it ready for committer.
>>
>> Sounds good.
>
> Thanks for the feedback.

Previous patch was incorrect, this was the same v2 as upthread.
Attached is the correct v3.
-- 
Michael

Вложения

Re: [HACKERS] Timeline ID in backup_label file

От
Michael Paquier
Дата:
On Sat, Jan 06, 2018 at 09:56:03AM +0900, Michael Paquier wrote:
> Previous patch was incorrect, this was the same v2 as upthread.
> Attached is the correct v3.

For the archive's sake, this version of the patch has been pushed as
6271fce. Thanks Simon for the commit, and David for the review!
--
Michael

Вложения