Обсуждение: Add timeline to partial WAL segments

Поиск
Список
Период
Сортировка

Add timeline to partial WAL segments

От
David Steele
Дата:
Hackers,

The .partial mechanism was added in de768844 to help avoid conflicts
between a newly promoted primary and an old primary that might produce
the same WAL segment.  This works for a single promotion but can become
problematic in HA configurations where there may be several promotions
before a stable primary emerges.

Consider the following scenario:

1) A is the primary
2) B follows A as a standby
3) A is shutdown immediate
4) B is promoted and selects timeline 2
5) B archives 000000010000000100000001.partial
6) B archives 00000002.history
7) B goes away before archiving 000000020000000100000001
8) A is put into recovery
9) A is promoted and selects timeline 3
10) A can't archive 000000010000000100000001.partial because it already
exists

We recommend that archive commands not overwrite an existing segment.
Some backup tools will compare the contents and succeed if they are
equal, but in this case that will still often fail because recycled WAL
segments will have different bytes at the end on the primary and
standby.  The files may not even be logically the same because B may not
have received all WAL from A.

After some discussion with the Patroni folks, Stephen and I came up with
the idea of adding the timeline that the cluster is *promoting to* into
the .partial name to avoid these sorts of conflicts.

However, there is still a race condition here.  Since the
000000010000000100000001.partial is archived first the 00000002.history
file might not make it to the archive before B crashes.  In that case A
will pick timeline 2 and still be stuck.  However, I'm thinking it would
be easy to teach pgarch_readyXlog() to return any .history files it
finds first (in order, of course).

Another option would be to immediately archive the first WAL segment on
timeline 2 and forgo the .partial file entirely.  In this case the
archiver will archive the 00000002.history file before
000000020000000100000001 and we avoid the race condition above.  That
also means we could recover A and promote without a conflict on the
.partial.  Or we could recover A along timeline 2.

Or we could do some combination of the above.

I have attached a patch that adds the timeline to the .partial file.
This passes check-world.

I think we should consider back-patching some set of these changes since
this causes real pain in current production HA configurations.

Thoughts?

-- 
-David
david@pgmasters.net

Вложения

Re: Add timeline to partial WAL segments

От
Michael Paquier
Дата:
On Mon, Dec 10, 2018 at 10:21:23AM -0500, David Steele wrote:
> We recommend that archive commands not overwrite an existing segment.
> Some backup tools will compare the contents and succeed if they are
> equal, but in this case that will still often fail because recycled WAL
> segments will have different bytes at the end on the primary and
> standby.  The files may not even be logically the same because B may not
> have received all WAL from A.

This is not a new problem, the last, partial segment generated
post-promotion of a timeline needs to be archived.  Since the
introduction of .partial within the segment name in 9.5, we also assume
that the OP would be smart enough to rename the segment to replay up to
the end of the past timeline if need be for PITR.

> However, there is still a race condition here.  Since the
> 000000010000000100000001.partial is archived first the 00000002.history
> file might not make it to the archive before B crashes.  In that case A
> will pick timeline 2 and still be stuck.  However, I'm thinking it would
> be easy to teach pgarch_readyXlog() to return any .history files it
> finds first (in order, of course).

Still the .ready file of the partial segment would be generated before
the history file, right?  In what does that help?

> Another option would be to immediately archive the first WAL segment on
> timeline 2 and forgo the .partial file entirely.  In this case the
> archiver will archive the 00000002.history file before
> 000000020000000100000001 and we avoid the race condition above.  That
> also means we could recover A and promote without a conflict on the
> .partial.  Or we could recover A along timeline 2.

This breaks the definition of IsPartialXLogFileName() in
xlog_internal.h, and the current naming convention of using only dots as
field separators.  Another more tricky problem is that this is
inconsistent with the way pg_receivewal.c behaves for non-completed
segments, which is a reason behind using .partial for the last partial
segment on the backend side as well.  So this proposal makes things more
inconsistent.

> I have attached a patch that adds the timeline to the .partial file.
> This passes check-world.
>
> I think we should consider back-patching some set of these changes since
> this causes real pain in current production HA configurations.
>
> Thoughts?

So you basically append the new timeline ID to the segment name which
still uses the old timeline ID in the first 8 characters of its name.
Logically I find this proposal weird as the segment refers contents
which are part of the past, and the backend is not going to use the
contents of this segment when jumping to the a new timeline, but the
contents of the segment which has the same contents up to the point WAL
forked, with the name of the new timeline.

It seems to me that this is quite a change for a low-probability
problem, as this assumes that the promotion of two different servers
happen on exactly the same segment and that both would finish by
archiving the same last partial segment.

Putting aside this proposal, it would be actually nice to put in
xlog_internal.h a macro which is able to write a partial file name,
close to the same place where we check if the segment name refers to a
partial segment.

My 2c.
--
Michael

Вложения

Re: Add timeline to partial WAL segments

От
David Steele
Дата:
Hi Michael,

On 12/10/18 8:43 PM, Michael Paquier wrote:
> On Mon, Dec 10, 2018 at 10:21:23AM -0500, David Steele wrote:
>> We recommend that archive commands not overwrite an existing segment.
>> Some backup tools will compare the contents and succeed if they are
>> equal, but in this case that will still often fail because recycled WAL
>> segments will have different bytes at the end on the primary and
>> standby.  The files may not even be logically the same because B may not
>> have received all WAL from A.
>
> This is not a new problem, the last, partial segment generated
> post-promotion of a timeline needs to be archived.  Since the
> introduction of .partial within the segment name in 9.5, we also assume
> that the OP would be smart enough to rename the segment to replay up to
> the end of the past timeline if need be for PITR.

It's not a new problem, but I do think it was only partially solved.
There are easy-to-reproduce cases where more than one .partial is
generated and I think we should handle that gracefully.

>> However, there is still a race condition here.  Since the
>> 000000010000000100000001.partial is archived first the 00000002.history
>> file might not make it to the archive before B crashes.  In that case A
>> will pick timeline 2 and still be stuck.  However, I'm thinking it would
>> be easy to teach pgarch_readyXlog() to return any .history files it
>> finds first (in order, of course).
>
> Still the .ready file of the partial segment would be generated before
> the history file, right?  In what does that help?

It looks to me like the history file is written first, and then the
.partial.  But because we process WAL alphabetically the .partial ends
up being pushed first.

The idea is to stake a claim to the timeline as quickly as possible so
nobody else claims it.  The pgarch_readyXlog() reads through all the
files, so it would be easy to return the history files first, then the
.partial, then the new timeline files, regardless of what order the
.ready files were written in.

The history file is also small, so it will be faster to copy and less
subject to latency concerns.  We want to reduce the window in which
another potential primary can end up with a duplicate timeline.

>> Another option would be to immediately archive the first WAL segment on
>> timeline 2 and forgo the .partial file entirely.  In this case the
>> archiver will archive the 00000002.history file before
>> 000000020000000100000001 and we avoid the race condition above.  That
>> also means we could recover A and promote without a conflict on the
>> .partial.  Or we could recover A along timeline 2.
>
> This breaks the definition of IsPartialXLogFileName() in
> xlog_internal.h, and the current naming convention of using only dots as
> field separators.

Good point.  Interesting that missing this didn't break any tests.

> Another more tricky problem is that this is
> inconsistent with the way pg_receivewal.c behaves for non-completed
> segments, which is a reason behind using .partial for the last partial
> segment on the backend side as well.  So this proposal makes things more
> inconsistent.

I think that we might need to apply the same logic to pg_receivewal.

>> I have attached a patch that adds the timeline to the .partial file.
>> This passes check-world.
>>
>> I think we should consider back-patching some set of these changes since
>> this causes real pain in current production HA configurations.
>>
>> Thoughts?
>
> So you basically append the new timeline ID to the segment name which
> still uses the old timeline ID in the first 8 characters of its name.
> Logically I find this proposal weird as the segment refers contents
> which are part of the past, and the backend is not going to use the
> contents of this segment when jumping to the a new timeline, but the
> contents of the segment which has the same contents up to the point WAL
> forked, with the name of the new timeline.

.partial files are rarely used in general, but we decided not to throw
them away because they *might* contain valuable information.  However,
in their current form they are more of a nuisance than a help.

What we're really looking for here is a way to sensibly version .partial
files in a way so that they a) don't conflict in the repo and b) their
name indicates where they came from.

If anyone can think of a naming scheme that makes more sense, I'm all ears.

> It seems to me that this is quite a change for a low-probability
> problem, as this assumes that the promotion of two different servers
> happen on exactly the same segment and that both would finish by
> archiving the same last partial segment.

It's actually a common (i.e. daily) problem at scale, especially when
archiving to high-latency storage like S3.  Patroni is especially likely
to show the issue as it favors uptime over preserving data in its
default configuration.

Thanks,
--
-David
david@pgmasters.net


Вложения

Re: Add timeline to partial WAL segments

От
Michael Paquier
Дата:
On Tue, Dec 11, 2018 at 09:15:16AM -0500, David Steele wrote:
> It looks to me like the history file is written first, and then the
> .partial.  But because we process WAL alphabetically the .partial ends
> up being pushed first.

Yes, that's right.

> The idea is to stake a claim to the timeline as quickly as possible so
> nobody else claims it.  The pgarch_readyXlog() reads through all the
> files, so it would be easy to return the history files first, then the
> .partial, then the new timeline files, regardless of what order the
> .ready files were written in.
>
> The history file is also small, so it will be faster to copy and less
> subject to latency concerns.  We want to reduce the window in which
> another potential primary can end up with a duplicate timeline.

Your arguments here are sensible, so +1 for changing pgarch_readyXlog so
as it gives priority to history files for archiving.  It seems to me
that we should also be careful of ordering partial files and normal
segments, and perhaps consider backup history files as having the lower
level of priority.

>>> Another option would be to immediately archive the first WAL segment on
>>> timeline 2 and forgo the .partial file entirely.  In this case the
>>> archiver will archive the 00000002.history file before
>>> 000000020000000100000001 and we avoid the race condition above.  That
>>> also means we could recover A and promote without a conflict on the
>>> .partial.  Or we could recover A along timeline 2.
>>
>> This breaks the definition of IsPartialXLogFileName() in
>> xlog_internal.h, and the current naming convention of using only dots as
>> field separators.
>
> Good point.  Interesting that missing this didn't break any tests.

This means also that we may need more tests for that.  For pg_receivewal
this could just be a check at the end of the last test for example.

>> Another more tricky problem is that this is
>> inconsistent with the way pg_receivewal.c behaves for non-completed
>> segments, which is a reason behind using .partial for the last partial
>> segment on the backend side as well.  So this proposal makes things more
>> inconsistent.
>
> I think that we might need to apply the same logic to pg_receivewal.

I really don't think that it is a good idea to link a future timeline
within a segment which includes in its name a direct reference to its
current timeline.  Also  I don't buy much the argument that those
segments are a nuisance as well all the time.  They may be for some
tools, however not for others depending on the archiving strategy
(distributed locations for example), and if they are a problem for your
deployments, why not just discarding them within the archive command and
be done with them?
--
Michael

Вложения

Re: Add timeline to partial WAL segments

От
Robert Haas
Дата:
On Wed, Dec 12, 2018 at 11:29 AM Michael Paquier <michael@paquier.xyz> wrote:
> I really don't think that it is a good idea to link a future timeline
> within a segment which includes in its name a direct reference to its
> current timeline.  Also  I don't buy much the argument that those
> segments are a nuisance as well all the time.  They may be for some
> tools, however not for others depending on the archiving strategy
> (distributed locations for example), and if they are a problem for your
> deployments, why not just discarding them within the archive command and
> be done with them?

-1.  Writing an archive_command already requires a PhD in
PostgreSQL-ology.  The very last thing we should do is invent even
more ways for an archive command to be subtly wrong.

I have a feeling the already-known ways to do it wrong are not too
well documented, which is another problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Add timeline to partial WAL segments

От
David Steele
Дата:
On 12/11/18 11:06 PM, Robert Haas wrote:
> On Wed, Dec 12, 2018 at 11:29 AM Michael Paquier <michael@paquier.xyz> wrote:
>> I really don't think that it is a good idea to link a future timeline
>> within a segment which includes in its name a direct reference to its
>> current timeline.  Also  I don't buy much the argument that those
>> segments are a nuisance as well all the time.  They may be for some
>> tools, however not for others depending on the archiving strategy
>> (distributed locations for example), and if they are a problem for your
>> deployments, why not just discarding them within the archive command and
>> be done with them?
> 
> -1.  Writing an archive_command already requires a PhD in
> PostgreSQL-ology.  The very last thing we should do is invent even
> more ways for an archive command to be subtly wrong.

The point here is to make archive commands simpler.  As it is, the
various backup tools are going to need to find ways to deal with the
problem, and each solution will be different.

The goal is to come up with a solution that works and that all archive
commands can use, rather than each one rolling their own solution.

-- 
-David
david@pgmasters.net


Re: Add timeline to partial WAL segments

От
Robert Haas
Дата:
On Wed, Dec 12, 2018 at 1:13 PM David Steele <david@pgmasters.net> wrote:
> On 12/11/18 11:06 PM, Robert Haas wrote:
> > On Wed, Dec 12, 2018 at 11:29 AM Michael Paquier <michael@paquier.xyz> wrote:
> >> I really don't think that it is a good idea to link a future timeline
> >> within a segment which includes in its name a direct reference to its
> >> current timeline.  Also  I don't buy much the argument that those
> >> segments are a nuisance as well all the time.  They may be for some
> >> tools, however not for others depending on the archiving strategy
> >> (distributed locations for example), and if they are a problem for your
> >> deployments, why not just discarding them within the archive command and
> >> be done with them?
> >
> > -1.  Writing an archive_command already requires a PhD in
> > PostgreSQL-ology.  The very last thing we should do is invent even
> > more ways for an archive command to be subtly wrong.
>
> The point here is to make archive commands simpler.  As it is, the
> various backup tools are going to need to find ways to deal with the
> problem, and each solution will be different.
>
> The goal is to come up with a solution that works and that all archive
> commands can use, rather than each one rolling their own solution.

I understand.  I'm more or less agreeing with you.  Actually, I'm not
really sure whether your particular proposal is the best way of
handling this, but I disagree with Michael's suggestion that we should
just throw responsibility back on archive_command.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Add timeline to partial WAL segments

От
David Steele
Дата:
On 12/11/18 11:14 PM, Robert Haas wrote:
> On Wed, Dec 12, 2018 at 1:13 PM David Steele <david@pgmasters.net> wrote:
>> On 12/11/18 11:06 PM, Robert Haas wrote:
>>> On Wed, Dec 12, 2018 at 11:29 AM Michael Paquier <michael@paquier.xyz> wrote:
>>>> I really don't think that it is a good idea to link a future timeline
>>>> within a segment which includes in its name a direct reference to its
>>>> current timeline.  Also  I don't buy much the argument that those
>>>> segments are a nuisance as well all the time.  They may be for some
>>>> tools, however not for others depending on the archiving strategy
>>>> (distributed locations for example), and if they are a problem for your
>>>> deployments, why not just discarding them within the archive command and
>>>> be done with them?
>>>
>>> -1.  Writing an archive_command already requires a PhD in
>>> PostgreSQL-ology.  The very last thing we should do is invent even
>>> more ways for an archive command to be subtly wrong.
>>
>> The point here is to make archive commands simpler.  As it is, the
>> various backup tools are going to need to find ways to deal with the
>> problem, and each solution will be different.
>>
>> The goal is to come up with a solution that works and that all archive
>> commands can use, rather than each one rolling their own solution.
> 
> I understand.  I'm more or less agreeing with you.  Actually, I'm not
> really sure whether your particular proposal is the best way of
> handling this, but I disagree with Michael's suggestion that we should
> just throw responsibility back on archive_command.

I apologize, I misread that.  Your comments make a lot more sense now
that I read them in the correct context!

And yeah, I'm not sure that I'm totally sold on this idea either, but I
wanted to get a conversation going.

-- 
-David
david@pgmasters.net


Re: Add timeline to partial WAL segments

От
Michael Paquier
Дата:
On Tue, Dec 11, 2018 at 11:27:30PM -0500, David Steele wrote:
> And yeah, I'm not sure that I'm totally sold on this idea either, but I
> wanted to get a conversation going.

Another idea which I think we could live with is to embed within the
segment file name the LSN switch point, in a format consistent with
backup history files.  And as a bonus it could be used for sanity
checks.

(I am completely sold to the idea of prioritizing file types in the
archiver.)
--
Michael

Вложения

Re: Add timeline to partial WAL segments

От
David Steele
Дата:
On 12/12/18 12:14 AM, Michael Paquier wrote:
> On Tue, Dec 11, 2018 at 11:27:30PM -0500, David Steele wrote:
>> And yeah, I'm not sure that I'm totally sold on this idea either, but I
>> wanted to get a conversation going.
> 
> Another idea which I think we could live with is to embed within the
> segment file name the LSN switch point, in a format consistent with
> backup history files.  And as a bonus it could be used for sanity
> checks.

The LSN switch point is often the same even when servers are going to
different timelines.  If the LSN is different enough then the problem
solves itself since the .partial will be on an entirely different segment.

But, we could at least use the . notation and end up with something like
000000010000000100000001.00000002.partial or perhaps
000000010000000100000001.T00000002.partial?  Maybe
000000010000000100000001.00000002.tpartial?

I can't decide whether the .partial files generated by pg_receivewal are
a problem or not.  It seems to me that only the original cluster is
going to be able to stream this file -- once the segment is renamed to
.partial it won't be visible to pg_receivexlog.  So, .partial without a
timeline extension just means partial from the original timeline.

There's another difference.  The .partial generated by pg_receivewal is
an actively-worked-on file whereas the .partial generated by a promotion
is probably headed for oblivion.  I haven't see a single case where one
was used in an actual recovery (which doesn't mean it hasn't happened,
of course).

> (I am completely sold to the idea of prioritizing file types in the
> archiver.)

OK, I'll work up a patch for that, then.  It doesn't solve the .partial
problem, but it does reduce the likelihood that two servers will end up
on the same timeline.

-- 
-David
david@pgmasters.net


Re: Add timeline to partial WAL segments

От
Michael Paquier
Дата:
On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote:
> The LSN switch point is often the same even when servers are going to
> different timelines.  If the LSN is different enough then the problem
> solves itself since the .partial will be on an entirely different
> segment.

That would mean that WAL forked exactly at the same record.  You have
likely seen more cases where than can happen in real life than I do.

> But, we could at least use the . notation and end up with something like
> 000000010000000100000001.00000002.partial or perhaps
> 000000010000000100000001.T00000002.partial?  Maybe
> 000000010000000100000001.00000002.tpartial?
>
> I can't decide whether the .partial files generated by pg_receivewal are
> a problem or not.  It seems to me that only the original cluster is
> going to be able to stream this file -- once the segment is renamed to
> .partial it won't be visible to pg_receivexlog.  So, .partial without a
> timeline extension just means partial from the original timeline.

Still this does not actually solve the problem when two servers are
trying to use the same timeline?  You would get conflicts with the
history file as well in this case but the partial segment gets archived
first..  It seems to me that it is kind of difficult to come with a
totally bullet-proof solution.  Adding the timeline is appealing to use
if the history files can be added on time, the switchpoint LSN is also
appealing as per the likelihood of WAL forking at a different point on a
record basis.  Perhaps another solution could be to add both, but that
looks like an overkill.

> There's another difference.  The .partial generated by pg_receivewal is
> an actively-worked-on file whereas the .partial generated by a promotion
> is probably headed for oblivion.  I haven't see a single case where one
> was used in an actual recovery (which doesn't mean it hasn't happened,
> of course).

There are many people implementing their own backup solutions, it is
hard to say that none of those solutions are actually able to copy the
last partial file to replay up to the end of a wanted timeline for a
PITR.

>> (I am completely sold to the idea of prioritizing file types in the
>> archiver.)
>
> OK, I'll work up a patch for that, then.  It doesn't solve the .partial
> problem, but it does reduce the likelihood that two servers will end up
> on the same timeline.

Thanks a lot, David!
--
Michael

Вложения

Re: Add timeline to partial WAL segments

От
David Steele
Дата:
On 12/12/18 7:17 PM, Michael Paquier wrote:
> On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote:

>> But, we could at least use the . notation and end up with something like
>> 000000010000000100000001.00000002.partial or perhaps
>> 000000010000000100000001.T00000002.partial?  Maybe
>> 000000010000000100000001.00000002.tpartial?
>>
>> I can't decide whether the .partial files generated by pg_receivewal are
>> a problem or not.  It seems to me that only the original cluster is
>> going to be able to stream this file -- once the segment is renamed to
>> .partial it won't be visible to pg_receivexlog.  So, .partial without a
>> timeline extension just means partial from the original timeline.
>
> Still this does not actually solve the problem when two servers are
> trying to use the same timeline?  You would get conflicts with the
> history file as well in this case but the partial segment gets archived
> first..  It seems to me that it is kind of difficult to come with a
> totally bullet-proof solution.

What we would like to do here is reduce the window in which the server
will make a bad choice by pushing timeline files first, then allow the
server to archive by using a more unique name for partial WAL.

It is understood that servers will need to be rebuilt in HA
environments, but we would like to reduce the frequency.  More
importantly we want to preserve the integrity of the WAL archive since
it serves as the ground truth for all the clusters.  Reducing duplicates
spares users decisions about what to delete to get their cluster
archiving again.

>> There's another difference.  The .partial generated by pg_receivewal is
>> an actively-worked-on file whereas the .partial generated by a promotion
>> is probably headed for oblivion.  I haven't see a single case where one
>> was used in an actual recovery (which doesn't mean it hasn't happened,
>> of course).
>
> There are many people implementing their own backup solutions, it is
> hard to say that none of those solutions are actually able to copy the
> last partial file to replay up to the end of a wanted timeline for a
> PITR.

Indeed, we've thought about adding it to pgBackRest.  But we haven't,
and to my knowledge nobody else has either.  It will come though, and a
better way of naming these files will help make them useful.

Regards,
--
-David
david@pgmasters.net


Вложения

Re: Add timeline to partial WAL segments

От
David Steele
Дата:
On 12/13/18 8:35 AM, David Steele wrote:
> On 12/12/18 7:17 PM, Michael Paquier wrote:
>> On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote:
>
>>> But, we could at least use the . notation and end up with something like
>>> 000000010000000100000001.00000002.partial or perhaps
>>> 000000010000000100000001.T00000002.partial?  Maybe
>>> 000000010000000100000001.00000002.tpartial?
>>>

I updated the patch to use 000000010000000100000001.00000002.partial
since this is more consistent with what we do in other cases.

--
-David
david@pgmasters.net

Вложения

Re: Add timeline to partial WAL segments

От
Robert Haas
Дата:
On Thu, Dec 13, 2018 at 12:17 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote:
> > The LSN switch point is often the same even when servers are going to
> > different timelines.  If the LSN is different enough then the problem
> > solves itself since the .partial will be on an entirely different
> > segment.
>
> That would mean that WAL forked exactly at the same record.  You have
> likely seen more cases where than can happen in real life than I do.

Suppose that the original master fails during an idle period, and we
promote a slave.  But we accidentally promote a slave that can't serve
as the new master, like because it's in a datacenter with an
unreliable network connection or one which is about to be engulfed in
lava.  So, we go to promote a different slave, and because we never
got around to reconfiguring the standbys to follow the previous
promotion, kaboom.

Or, suppose we do PITR to recover from some user error, but then
somebody screws up the contents of the recovered cluster and we have
to do it over again. Naturally we'll recover to the same point.

The new TLI is the only thing that is guaranteed to be unique with
each new promotion, and I would guess that it is therefore the right
thing to use to disambiguate them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Add timeline to partial WAL segments

От
David Steele
Дата:
On 12/14/18 3:26 PM, Robert Haas wrote:
> On Thu, Dec 13, 2018 at 12:17 AM Michael Paquier <michael@paquier.xyz> wrote:
>> On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote:
>>> The LSN switch point is often the same even when servers are going to
>>> different timelines.  If the LSN is different enough then the problem
>>> solves itself since the .partial will be on an entirely different
>>> segment.
>>
>> That would mean that WAL forked exactly at the same record.  You have
>> likely seen more cases where than can happen in real life than I do.
> 
> Suppose that the original master fails during an idle period, and we
> promote a slave.  But we accidentally promote a slave that can't serve
> as the new master, like because it's in a datacenter with an
> unreliable network connection or one which is about to be engulfed in
> lava.  

Much more common than people think.

> So, we go to promote a different slave, and because we never
> got around to reconfiguring the standbys to follow the previous
> promotion, kaboom.

Exactly.

> Or, suppose we do PITR to recover from some user error, but then
> somebody screws up the contents of the recovered cluster and we have
> to do it over again. Naturally we'll recover to the same point.
> 
> The new TLI is the only thing that is guaranteed to be unique with
> each new promotion, and I would guess that it is therefore the right
> thing to use to disambiguate them.

This is the conclusion we came to after a few months of diagnosing and
working on this problem.

The question in my mind: is it safe to back-patch?

-- 
-David
david@pgmasters.net


Re: Add timeline to partial WAL segments

От
Michael Paquier
Дата:
On Fri, Dec 14, 2018 at 06:05:18PM -0500, David Steele wrote:
> On 12/14/18 3:26 PM, Robert Haas wrote:
>> The new TLI is the only thing that is guaranteed to be unique with
>> each new promotion, and I would guess that it is therefore the right
>> thing to use to disambiguate them.
>
> This is the conclusion we came to after a few months of diagnosing and
> working on this problem.

Hm.  Okay.  From that point of view I think that we should also make
pg_receivewal able to generate the same kind of segment format when it
detects a timeline switch for the last partial segment of the previous
timeline.  Switching to a new timeline makes the streaming finish, so we
could use that to close properly the WAL segment with the new name.  At
the same time it would be nice to have a macro able to generate a
partial segment name and also check after it.

> The question in my mind: is it safe to back-patch?

That's unfortunately a visibility change, meaning that any existing
backup solution able to handle .partial segments would get broken in a
minor release.  From that point of view this should go only on HEAD.
The other patch to reorder the priority of the .ready files could go
down without any problem though.
--
Michael

Вложения

Re: Add timeline to partial WAL segments

От
David Steele
Дата:
On 12/15/18 1:56 AM, Michael Paquier wrote:
> On Fri, Dec 14, 2018 at 06:05:18PM -0500, David Steele wrote:
>> On 12/14/18 3:26 PM, Robert Haas wrote:
>>> The new TLI is the only thing that is guaranteed to be unique with
>>> each new promotion, and I would guess that it is therefore the right
>>> thing to use to disambiguate them.
>>
>> This is the conclusion we came to after a few months of diagnosing and
>> working on this problem.
> 
> Hm.  Okay.  From that point of view I think that we should also make
> pg_receivewal able to generate the same kind of segment format when it
> detects a timeline switch for the last partial segment of the previous
> timeline.  Switching to a new timeline makes the streaming finish, so we
> could use that to close properly the WAL segment with the new name.  At
> the same time it would be nice to have a macro able to generate a
> partial segment name and also check after it.

Or perhaps just always add the timeline to the .partial?  That way it 
doesn't need to be renamed later.  Also, there would be a consistent 
name, rather than sometimes .partial, sometimes .<timelime>.partial.

>> The question in my mind: is it safe to back-patch?
> 
> That's unfortunately a visibility change, meaning that any existing
> backup solution able to handle .partial segments would get broken in a
> minor release.  From that point of view this should go only on HEAD.

That means every archive command needs to deal with this issue on its 
own.  It's definitely not a trivial thing to do.

It's obviously strong to call this a bug, but there's very clearly an 
issue here.  I wonder if there is anything else we can do that would work?

> The other patch to reorder the priority of the .ready files could go
> down without any problem though.

Glad to hear it.

-- 
-David
david@pgmasters.net


Re: Add timeline to partial WAL segments

От
Robert Haas
Дата:
On Fri, Dec 14, 2018 at 6:05 PM David Steele <david@pgmasters.net> wrote:
> The question in my mind: is it safe to back-patch?

I cannot imagine it being a good idea to stick a behavioral change
like this into a minor release.  Yeah, it lets people get out from
under this problem a lot sooner, but it potentially breaks backup
scripts even for people who were not suffering in the first place.  I
don't think solving this problem for the people who have it is worth
inflicting that kind of breakage on everybody.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Add timeline to partial WAL segments

От
Michael Paquier
Дата:
On Thu, Dec 20, 2018 at 02:13:01PM +0200, David Steele wrote:
> Or perhaps just always add the timeline to the .partial?  That way it
> doesn't need to be renamed later.  Also, there would be a consistent name,
> rather than sometimes .partial, sometimes .<timelime>.partial.

Hm.  A renaming still needs to happen afterwards anyway, no?  When a
segment is created with a given name pg_receivewal cannot know if the
segment it is working on will be the last partial segment of a given
timeline.  And what would be changed in the segment name is the addition
of the new TLI, not the previous one.

> That means every archive command needs to deal with this issue on its own.
> It's definitely not a trivial thing to do.
>
> It's obviously strong to call this a bug, but there's very clearly an issue
> here.  I wonder if there is anything else we can do that would work?

Well, that's a visibility change, and I can see that Robert and I are on
the same page here.  It is annoying for people to do a minor release
with a RPM replacement and then potentially have to change things in
urgency.  Nothing good comes out of such situations.
--
Michael

Вложения

Re: Add timeline to partial WAL segments

От
David Steele
Дата:
On 12/20/18 10:56 PM, Robert Haas wrote:
> On Fri, Dec 14, 2018 at 6:05 PM David Steele <david@pgmasters.net> wrote:
>> The question in my mind: is it safe to back-patch?
> 
> I cannot imagine it being a good idea to stick a behavioral change
> like this into a minor release.  Yeah, it lets people get out from
> under this problem a lot sooner, but it potentially breaks backup
> scripts even for people who were not suffering in the first place.  I
> don't think solving this problem for the people who have it is worth
> inflicting that kind of breakage on everybody.

Fair enough -- figured I would make an argument and see what people thought.

It's easy enough to solve on my end, but I'll wait and see what naming 
scheme we come up with.

-- 
-David
david@pgmasters.net


Re: Add timeline to partial WAL segments

От
David Steele
Дата:
On 12/21/18 2:10 AM, Michael Paquier wrote:
> On Thu, Dec 20, 2018 at 02:13:01PM +0200, David Steele wrote:
>> Or perhaps just always add the timeline to the .partial?  That way it
>> doesn't need to be renamed later.  Also, there would be a consistent name,
>> rather than sometimes .partial, sometimes .<timelime>.partial.
> 
> Hm.  A renaming still needs to happen afterwards anyway, no?  When a
> segment is created with a given name pg_receivewal cannot know if the
> segment it is working on will be the last partial segment of a given
> timeline.  And what would be changed in the segment name is the addition
> of the new TLI, not the previous one.

I was thinking the file would only be renamed on successful completion. 
Suppose we are on timeline 1 pg_receivewal would be writing to:

000000010000000100000001.00000001.partial

If the WAL segment is never completed (server crashes, etc.) it would 
keep that name.

If a cluster is promoted it would archive:

000000010000000100000001.00000002.partial

And then pg_receivewal would start writing on:

000000020000000100000001.00000002.partial

When that segment successfully completes it would be renamed by 
pg_receivewal to 000000020000000100000001.

In short, the *initial* name of the WAL file is set to what it should be 
if it doesn't complete so we don't need to run around and try to rename 
files on failure.  Only on success do we need to rename.

Sound plausible?

-- 
-David
david@pgmasters.net


Re: Add timeline to partial WAL segments

От
Michael Paquier
Дата:
On Mon, Dec 31, 2018 at 01:07:34PM +0200, David Steele wrote:
> In short, the *initial* name of the WAL file is set to what it should be if
> it doesn't complete so we don't need to run around and try to rename files
> on failure.  Only on success do we need to rename.
>
> Sound plausible?

It does.
--
Michael

Вложения

Re: Add timeline to partial WAL segments

От
David Steele
Дата:
On 1/5/19 3:31 AM, Michael Paquier wrote:
> On Mon, Dec 31, 2018 at 01:07:34PM +0200, David Steele wrote:
>> In short, the *initial* name of the WAL file is set to what it should be if
>> it doesn't complete so we don't need to run around and try to rename files
>> on failure.  Only on success do we need to rename.
>>
>> Sound plausible?
> 
> It does.

Excellent.  This now requires work on my part so I have marked the patch 
"Waiting on Author".

Thanks,
-- 
-David
david@pgmasters.net


Re: Add timeline to partial WAL segments

От
Michael Paquier
Дата:
On Sat, Jan 05, 2019 at 08:32:55AM +0200, David Steele wrote:
> Excellent.  This now requires work on my part so I have marked the patch
> "Waiting on Author".

As nothing has happened here, I have marked the patch as RwF.
--
Michael

Вложения