Thread: backup tools ought to ensure created backups are durable

backup tools ought to ensure created backups are durable

From
Andres Freund
Date:
Hi,

As pointed out in
http://www.postgresql.org/message-id/20160327232509.v5wgac5vskusedin@awork2.anarazel.de
our backup tools (i.e. pg_basebackup, pg_dump[all]) currently don't
make any effort to ensure their output is durable.

I think for backup tools of possibly critical data, that's pretty much
unacceptable.

There are cases where we can't ensure durability (e.g. pg_dump | gzip >
file), but it's out of our hands in that case.

Greetings,

Andres Freund
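
Editor's sketch of what "durable" usually means for a file written by a backup tool: fsync the file itself, then fsync its parent directory so the directory entry also survives a crash. This is a minimal illustration under stated assumptions, not the patch discussed in the thread; the helper names `fsync_path` and `make_file_durable` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open "path", fsync it, and close it again.
 * Directories are opened read-only; plain files read-write, because some
 * platforms refuse to fsync a descriptor that is not writable.
 * Returns 0 on success, -1 on failure.
 */
int
fsync_path(const char *path, int is_dir)
{
    int fd = open(path, is_dir ? O_RDONLY : O_RDWR);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    if (close(fd) != 0 && rc == 0)
        rc = -1;
    return rc;
}

/*
 * Hypothetical helper: flush a finished output file, then flush the
 * directory that contains it.  Returns 0 on success, -1 on failure.
 */
int
make_file_durable(const char *path)
{
    char *copy;
    int rc;

    if (fsync_path(path, 0) != 0)
        return -1;

    /* dirname() may modify its argument, so work on a private copy. */
    copy = strdup(path);
    if (copy == NULL)
        return -1;
    rc = fsync_path(dirname(copy), 1);
    free(copy);
    return rc;
}
```

A tool would call something like `make_file_durable("backup/base.tar")` once, after the last write and close of that file and before exiting.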



Re: backup tools ought to ensure created backups are durable

From
Michael Paquier
Date:
On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund <andres@anarazel.de> wrote:
> As pointed out in
> http://www.postgresql.org/message-id/20160327232509.v5wgac5vskusedin@awork2.anarazel.de
> our backup tools (i.e. pg_basebackup, pg_dump[all]) currently don't
> make any effort to ensure their output is durable.
>
> I think for backup tools of possibly critical data, that's pretty much
> unacceptable.

Definitely agreed. Once a backup/dump has been taken and those
utilities exit, we had better ensure that the output is durably on disk.
For pg_basebackup and pg_dump, as everything except pg_dump/plain
requires a target directory for the location of the output result, we
really can make things better.
-- 
Michael



Re: backup tools ought to ensure created backups are durable

From
Magnus Hagander
Date:
On Mon, Mar 28, 2016 at 3:11 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund <andres@anarazel.de> wrote:
> > As pointed out in
> > http://www.postgresql.org/message-id/20160327232509.v5wgac5vskusedin@awork2.anarazel.de
> > our backup tools (i.e. pg_basebackup, pg_dump[all]) currently don't
> > make any effort to ensure their output is durable.
> >
> > I think for backup tools of possibly critical data, that's pretty much
> > unacceptable.
>
> Definitely agreed. Once a backup/dump has been taken and those
> utilities exit, we had better ensure that the output is durably on disk.
> For pg_basebackup and pg_dump, as everything except pg_dump/plain
> requires a target directory for the location of the output result, we
> really can make things better.


Definitely agreed on fixing it. But I don't think your summary is right.

pg_basebackup in tar mode can be sent to stdout and does not require a directory. The same goes for pg_dump in any mode except directory. So we can't just drive it off the mode; some more detailed checks are required.


Re: backup tools ought to ensure created backups are durable

From
Andres Freund
Date:
On 2016-03-28 11:35:57 +0200, Magnus Hagander wrote:
> On Mon, Mar 28, 2016 at 3:11 AM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>
> > On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund <andres@anarazel.de> wrote:
> > > As pointed out in
> > > http://www.postgresql.org/message-id/20160327232509.v5wgac5vskusedin@awork2.anarazel.de
> > > our backup tools (i.e. pg_basebackup, pg_dump[all]) currently don't
> > > make any effort to ensure their output is durable.
> > >
> > > I think for backup tools of possibly critical data, that's pretty much
> > > unacceptable.
> >
> > Definitely agreed. Once a backup/dump has been taken and those
> > utilities exit, we had better ensure that the output is durably on disk.
> > For pg_basebackup and pg_dump, as everything except pg_dump/plain
> > requires a target directory for the location of the output result, we
> > really can make things better.
>
> Definitely agreed on fixing it. But I don't think your summary is right.
>
> pg_basebackup in tar mode can be sent to stdout and does not require a
> directory. The same goes for pg_dump in any mode except directory. So we
> can't just drive it off the mode; some more detailed checks are required.

if (!isatty(fileno(stdout))) ought to do the trick, afaics? And maybe add a
warning somewhere in the docs about the tools not fsyncing if you pipe
their output data somewhere?

Andres
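
Editor's sketch of the check proposed above, assuming POSIX `isatty()` and `fstat()`. The function names are hypothetical. The second function is an assumption that goes beyond the thread: a pipe is also "not a tty", so a tool that wants to fsync only real files might test for a regular file instead.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * The check as proposed in the thread: treat any descriptor that is not
 * a terminal as a candidate for fsync.
 */
bool
fd_is_not_tty(int fd)
{
    return isatty(fd) == 0;
}

/*
 * Stricter variant (an assumption, not from the thread): only regular
 * files qualify, so pipes, sockets and devices are skipped.
 */
bool
fd_is_regular_file(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return false;
    return S_ISREG(st.st_mode);
}
```

For standard output, either function would be called with `fileno(stdout)` (or `STDOUT_FILENO`). With `pg_dump | gzip > file`, both report that stdout is not a terminal, but only the second one reports that there is no regular file to fsync.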



Re: backup tools ought to ensure created backups are durable

From
Magnus Hagander
Date:


On Mon, Mar 28, 2016 at 3:12 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-03-28 11:35:57 +0200, Magnus Hagander wrote:
> > On Mon, Mar 28, 2016 at 3:11 AM, Michael Paquier <michael.paquier@gmail.com>
> > wrote:
> >
> > > On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund <andres@anarazel.de> wrote:
> > > > As pointed out in
> > > > http://www.postgresql.org/message-id/20160327232509.v5wgac5vskusedin@awork2.anarazel.de
> > > > our backup tools (i.e. pg_basebackup, pg_dump[all]) currently don't
> > > > make any effort to ensure their output is durable.
> > > >
> > > > I think for backup tools of possibly critical data, that's pretty much
> > > > unacceptable.
> > >
> > > Definitely agreed. Once a backup/dump has been taken and those
> > > utilities exit, we had better ensure that the output is durably on disk.
> > > For pg_basebackup and pg_dump, as everything except pg_dump/plain
> > > requires a target directory for the location of the output result, we
> > > really can make things better.
> >
> > Definitely agreed on fixing it. But I don't think your summary is right.
> >
> > pg_basebackup in tar mode can be sent to stdout and does not require a
> > directory. The same goes for pg_dump in any mode except directory. So we
> > can't just drive it off the mode; some more detailed checks are required.
>
> if (!isatty(fileno(stdout))) ought to do the trick, afaics? And maybe add a
> warning somewhere in the docs about the tools not fsyncing if you pipe
> their output data somewhere?

That should work yeah. And given that we already use that check in other places, it seems it should be perfectly safe. And as long as we only do a WARNING and not abort if the fsync fails, we should be OK if people intentionally store their backups on an fs that doesn't speak fsync (if that exists), in which case I don't really think we even need a switch to turn it off. 


Re: backup tools ought to ensure created backups are durable

From
Jim Nasby
Date:
On 3/28/16 11:03 AM, Magnus Hagander wrote:
>
> That should work yeah. And given that we already use that check in other
> places, it seems it should be perfectly safe. And as long as we only do
> a WARNING and not abort if the fsync fails, we should be OK if people
> intentionally store their backups on an fs that doesn't speak fsync (if
> that exists), in which case I don't really think we even need a switch
> to turn it off.

I'd even go so far as spitting out a warning any time we can't fsync 
(maybe that's what you're suggesting?)
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: backup tools ought to ensure created backups are durable

From
Magnus Hagander
Date:


On Tue, Mar 29, 2016 at 8:46 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 3/28/16 11:03 AM, Magnus Hagander wrote:
>
> > That should work yeah. And given that we already use that check in other
> > places, it seems it should be perfectly safe. And as long as we only do
> > a WARNING and not abort if the fsync fails, we should be OK if people
> > intentionally store their backups on an fs that doesn't speak fsync (if
> > that exists), in which case I don't really think we even need a switch
> > to turn it off.
>
> I'd even go so far as spitting out a warning any time we can't fsync
> (maybe that's what you're suggesting?)

That is pretty much what I was suggesting, yes.

Though we might want to consolidate them in for example pg_basebackup -Fp and pg_dump -Fd into something like "failed to fsync <n> files". 
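
Editor's sketch of the consolidated warning suggested here: fsync every file of a multi-file backup, count the failures, and print a single summary line at the end. The function name and the exact message format are assumptions made for illustration; they are not taken from pg_basebackup or pg_dump.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to fsync each listed file and emit at most one
 * warning summarizing how many attempts failed.  Returns the failure count.
 */
int
fsync_files_consolidated(const char **paths, size_t npaths)
{
    int failed = 0;
    size_t i;

    for (i = 0; i < npaths; i++)
    {
        int fd = open(paths[i], O_RDWR);

        if (fd < 0)
        {
            failed++;
            continue;
        }
        if (fsync(fd) != 0)
            failed++;
        close(fd);
    }

    /* One line for the whole run instead of one line per file. */
    if (failed > 0)
        fprintf(stderr, "WARNING: failed to fsync %d files\n", failed);
    return failed;
}
```

The trade-off is that a summary keeps the output short for directory-format backups with many files, at the cost of not naming the files that failed.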



Re: backup tools ought to ensure created backups are durable

From
Andres Freund
Date:
On 2016-03-29 10:06:20 +0200, Magnus Hagander wrote:
> On Tue, Mar 29, 2016 at 8:46 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> 
> > On 3/28/16 11:03 AM, Magnus Hagander wrote:
> >
> >>
> >> That should work yeah. And given that we already use that check in other
> >> places, it seems it should be perfectly safe. And as long as we only do
> >> a WARNING and not abort if the fsync fails, we should be OK if people
> >> intentionally store their backups on an fs that doesn't speak fsync (if
> >> that exists), in which case I don't really think we even need a switch
> >> to turn it off.
> >>
> >
> > I'd even go so far as spitting out a warning any time we can't fsync
> > (maybe that's what you're suggesting?)
> 
> 
> That is pretty much what I was suggesting, yes.
> 
> Though we might want to consolidate them in for example pg_basebackup -Fp
> and pg_dump -Fd into something like "failed to fsync <n> files".

I'd just not output anything if ENOTSUPP or similar is returned, and not
bother with something as complex as collecting errors.
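
Editor's sketch of this simpler policy: stay silent when fsync reports that the target does not support synchronization, warn (without aborting) on any other failure. Which errno values count as "not supported" is an assumption here: the sketch uses the userspace constant `ENOTSUP` plus `EINVAL`, which Linux documents for descriptors such as pipes that cannot be synced. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumed set of "this target cannot be fsynced" error codes. */
bool
fsync_unsupported(int err)
{
    return err == EINVAL || err == ENOTSUP;
}

/*
 * Hypothetical helper: fsync "fd".  Unsupported targets are ignored
 * silently; other failures produce a warning but never abort.
 * Returns 0 if the data was synced or syncing is unsupported, -1 otherwise.
 */
int
fsync_or_warn(int fd, const char *label)
{
    if (fsync(fd) == 0)
        return 0;
    if (fsync_unsupported(errno))
        return 0;
    fprintf(stderr, "WARNING: could not fsync \"%s\": %s\n",
            label, strerror(errno));
    return -1;
}
```

Compared with collecting errors, this keeps no state between calls, which is the simplification argued for above.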



Re: backup tools ought to ensure created backups are durable

From
Magnus Hagander
Date:


On Tue, Mar 29, 2016 at 10:12 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-03-29 10:06:20 +0200, Magnus Hagander wrote:
> > On Tue, Mar 29, 2016 at 8:46 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> >
> > > On 3/28/16 11:03 AM, Magnus Hagander wrote:
> > >
> > >>
> > >> That should work yeah. And given that we already use that check in other
> > >> places, it seems it should be perfectly safe. And as long as we only do
> > >> a WARNING and not abort if the fsync fails, we should be OK if people
> > >> intentionally store their backups on an fs that doesn't speak fsync (if
> > >> that exists), in which case I don't really think we even need a switch
> > >> to turn it off.
> > >>
> > >
> > > I'd even go so far as spitting out a warning any time we can't fsync
> > > (maybe that's what you're suggesting?)
> >
> >
> > That is pretty much what I was suggesting, yes.
> >
> > Though we might want to consolidate them in for example pg_basebackup -Fp
> > and pg_dump -Fd into something like "failed to fsync <n> files".
>
> I'd just not output anything if ENOTSUPP or similar is returned, and not
> bother with something as complex as collecting errors.

That'll work too, I guess. It won't necessarily make people aware of the problem, but in the unlikely event they use an fs like that, they should be aware of it already.
