Thread: Offline enabling/disabling of data checksums


Offline enabling/disabling of data checksums

From:
Michael Banck
Date:
Hi,

the attached patch adds offline enabling/disabling of checksums to
pg_verify_checksums. It is based on independent work both Michael
(Paquier) and me did earlier this year and takes changes from both, see
https://github.com/credativ/pg_checksums and
https://github.com/michaelpq/pg_plugins/tree/master/pg_checksums

It adds a (now mandatory) --action parameter that takes either verify,
enable or disable as its argument.

This is basically meant as a stop-gap measure in case online activation
of checksums won't make it for v12, but maybe it is independently
useful?

Things I have not done so far:

1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
only verify checksums.

2. Rename the scan_* functions (Michael renamed them to operate_file and
operate_directory), but I am not sure it is worth it.

3. Once that patch is in, there would be a way to disable checksums, so
there'd be a case for also changing the initdb default to enabled, but that
requires further discussion (and maybe another round of benchmarks).
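For what it's worth, the core of the offline enable action is conceptually just a loop over each relation file: read every block, recompute its checksum, and write the block back. Here is a rough sketch of that idea in Python, for illustration only: `checksum_page` below is a stand-in and NOT PostgreSQL's actual pg_checksum_page algorithm, the header offset is an assumption, and the real tool is written in C.

```python
import os

BLCKSZ = 8192          # default PostgreSQL block size
PD_CHECKSUM_OFFSET = 8 # assumed: pd_checksum follows the 8-byte pd_lsn

def checksum_page(page: bytes, blkno: int) -> int:
    """Stand-in for pg_checksum_page(); NOT the real algorithm."""
    csum = blkno
    for i, b in enumerate(page):
        # skip the pd_checksum field itself, as the real algorithm does
        if PD_CHECKSUM_OFFSET <= i < PD_CHECKSUM_OFFSET + 2:
            continue
        csum = (csum * 31 + b) & 0xFFFF
    return csum or 1  # 0 is reserved for "no checksum"

def enable_checksums(path: str) -> None:
    """Rewrite each non-empty block of a relation file with its checksum set."""
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        for blkno in range(size // BLCKSZ):
            f.seek(blkno * BLCKSZ)
            page = bytearray(f.read(BLCKSZ))
            if not any(page):
                continue  # all-zero pages carry no checksum
            csum = checksum_page(bytes(page), blkno)
            page[PD_CHECKSUM_OFFSET:PD_CHECKSUM_OFFSET + 2] = csum.to_bytes(2, "little")
            f.seek(blkno * BLCKSZ)
            f.write(page)
```

Verify would do the same scan but only compare the stored field against the recomputed value, without writing.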


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachments

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Fri, Dec 21, 2018 at 09:16:16PM +0100, Michael Banck wrote:
> It adds an (now mandatory) --action parameter that takes either verify,
> enable or disable as argument.

There are two discussion points which deserve attention here:
1) Do we want to rename pg_verify_checksums to something else, like
pg_checksums?  I would very much like a simple renaming of the tool,
which should be the first step taken.
2) Which kind of interface do we want to use?  When I did my own
flavor of pg_checksums, I used an --action switch able to use the
following values:
- enable
- disable
- verify
The switch cannot be specified twice (perhaps we could enforce the
last value, as other binaries in the tree do; I am not sure whether
that is appropriate here).  A second type of interface is to use one
switch per action.  For both interfaces, if no action is specified
then the tool fails.  Vote is open.
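To make the trade-off concrete, the --action style with the "cannot be specified twice" rule could be sketched like this (a hypothetical parser written in Python purely to illustrate the semantics; the actual tool is C):

```python
def parse_action(argv):
    """Parse a --action=verify|enable|disable style interface.

    Rejects a repeated --action instead of letting the last value win,
    and fails when no action is specified at all.
    """
    action = None
    for arg in argv:
        if arg.startswith("--action="):
            value = arg.split("=", 1)[1]
            if value not in ("verify", "enable", "disable"):
                raise SystemExit(f"invalid action: {value}")
            if action is not None:
                raise SystemExit("--action specified more than once")
            action = value
    if action is None:
        raise SystemExit("no action specified")
    return action
```

The one-switch-per-action alternative would instead accept --verify/--enable/--disable and reject combinations of them.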

> This is basically meant as a stop-gap measure in case online activation
> of checksums won't make it for v12, but maybe it is independently
> useful?

I think that this is independently useful; I have used this as part of an
upgrade workflow where the user is ready to accept some extra one-time
offline time so that checksums get enabled.

> Things I have not done so far:
>
> 1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
> only verify checksums.

Check.  That sounds right to me.

> 2. Rename the scan_* functions (Michael renamed them to operate_file and
> operate_directory but I am not sure it is worth it.

The renaming makes sense, as scan implies only reading while enabling
checksums causes a write.

> 3. Once that patch is in, there would be a way to disable checksums so
> there'd be a case to also change the initdb default to enabled, but that
> required further discussion (and maybe another round of benchmarks).

Perhaps, that's unrelated to this thread though.  I am not sure that
all users would be ready to pay the extra cost of checksums enabled by
default.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Michael Banck
Date:
Hi,

I have added it to the commitfest now:

https://commitfest.postgresql.org/21/1944/

On Sat, Dec 22, 2018 at 08:28:34AM +0900, Michael Paquier wrote:
> On Fri, Dec 21, 2018 at 09:16:16PM +0100, Michael Banck wrote:
> > It adds an (now mandatory) --action parameter that takes either verify,
> > enable or disable as argument.
> 
> There are two discussion points which deserve attention here:
> 1) Do we want to rename pg_verify_checksums to something else, like
> pg_checksums.  I like a lot if we would do a simple renaming of the
> tool, which should be the first step taken. 

I am for it, but don't mind whether it's before or afterwards, your
call.

> 2) Which kind of interface do we want to use?  When I did my own
> flavor of pg_checksums, I used an --action switch able to use the
> following values:
> - enable
> - disable
> - verify
> The switch cannot be specified twice (perhaps we could enforce the
> last value as other binaries do in the tree, not sure if that's
> adapted here).  A second type of interface is to use one switch per
> action.  For both interfaces if no action is specify then the tool
> fails.  Vote is open.  

Even though my fork has the separate switches, I like the --action one.
On the other hand, it is a bit more typing, as you always have to spell
out the action (is there precedent for also accepting incomplete option
arguments like 'v', 'e', 'd'?).
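For illustration, the kind of prefix acceptance asked about would amount to an unambiguous-prefix match along these lines (a hypothetical sketch, not part of the patch):

```python
def match_action(prefix):
    """Resolve an unambiguous prefix of an action name, e.g. 'v' -> 'verify'.

    Returns None when the prefix is ambiguous or matches nothing.
    """
    actions = ("verify", "enable", "disable")
    matches = [a for a in actions if a.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None
```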

> > This is basically meant as a stop-gap measure in case online activation
> > of checksums won't make it for v12, but maybe it is independently
> > useful?
> 
> I think that this is independently useful, I got this stuff part of an
> upgrade workflow where the user is ready to accept some extra one-time
> offline time so as checksums are enabled.

OK; we have also used that with clients - if the instance is smaller than
a few dozen GB, enabling checksums during a routine minor upgrade restart
does not delay things much.
 
> > 2. Rename the scan_* functions (Michael renamed them to operate_file and
> > operate_directory but I am not sure it is worth it.
> 
> The renaming makes sense, as scan implies only reading while enabling
> checksums causes a write.

Ok, will do in the next version.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater


Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Sat, Dec 22, 2018 at 02:42:55PM +0100, Michael Banck wrote:
> On Sat, Dec 22, 2018 at 08:28:34AM +0900, Michael Paquier wrote:
>> There are two discussion points which deserve attention here:
>> 1) Do we want to rename pg_verify_checksums to something else, like
>> pg_checksums.  I like a lot if we would do a simple renaming of the
>> tool, which should be the first step taken.
>
> I am for it, but don't mind whether it's before or afterwards, your
> call.

Doing the renaming afterwards would be a bit weird logically, as we would
end up with a point in time in the tree where pg_verify_checksums is
able to do something other than just verifying checksums.

> Even though my fork has the separate switches, I like the --action one.
> On the other hand, it is a bit more typing as you always have to spell
> out the action (is there precendent of accepting also incomplete option
> arguments like 'v', 'e', 'd'?).

Yes, there is a bit of that in psql, for example for formats.  I am not
sure that we should take this road for a checksumming tool though.  If a
new option were added which takes the same first letter, then we would
have compatibility issues.  That's unlikely to happen, but it still
feels uneasy.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Robert Haas
Date:
On Fri, Dec 21, 2018 at 6:28 PM Michael Paquier <michael@paquier.xyz> wrote:
> 2) Which kind of interface do we want to use?  When I did my own
> flavor of pg_checksums, I used an --action switch able to use the
> following values:
> - enable
> - disable
> - verify
> The switch cannot be specified twice (perhaps we could enforce the
> last value as other binaries do in the tree, not sure if that's
> adapted here).  A second type of interface is to use one switch per
> action.  For both interfaces if no action is specify then the tool
> fails.  Vote is open.

I vote for separate switches.  Using the same switch with an argument
seems like it adds typing for no real gain.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Offline enabling/disabling of data checksums

From:
Fabien COELHO
Date:
Hallo Michael,

> It adds an (now mandatory) --action parameter that takes either verify,
> enable or disable as argument.

I'd rather have explicit switches for verify, enable & disable, and verify 
would be the default if none is provided.

> This is basically meant as a stop-gap measure in case online activation
> of checksums won't make it for v12, but maybe it is independently
> useful?

I would say yes.

> 1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
> only verify checksums.

I'd agree to rename the tool as "pg_checksums".

> 2. Rename the scan_* functions (Michael renamed them to operate_file and
> operate_directory but I am not sure it is worth it.

Hmmm. The file is indeed scanned, and "operate" is rather fuzzy.

> 3. Once that patch is in, there would be a way to disable checksums so
> there'd be a case to also change the initdb default to enabled, but that
> required further discussion (and maybe another round of benchmarks).

My 0.02€ is that data safety should come first, thus checksums should be
enabled by default.

About the patch: applies, compiles, "make check" ok.

There is no documentation.

In "scan_file", I would open RW only for enable, but keep RO for verify.

Also, the full page is rewritten... would it make sense to only overwrite 
the checksum part itself?
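For concreteness: pd_checksum is a small field near the start of the page header (by my reading, the 2-byte field right after the 8-byte pd_lsn, but treat these offsets as assumptions), so overwriting only it could look roughly like this Python sketch:

```python
import os

BLCKSZ = 8192           # default PostgreSQL block size
PD_CHECKSUM_OFFSET = 8  # assumed: pd_checksum follows the 8-byte pd_lsn

def write_checksum_only(fd, blkno, csum):
    """Overwrite only the 2-byte pd_checksum field of block blkno,
    instead of rewriting the whole 8 kB page."""
    os.pwrite(fd, csum.to_bytes(2, "little"),
              blkno * BLCKSZ + PD_CHECKSUM_OFFSET)
```

Whether that actually wins anything is unclear, since the kernel still reads and writes whole filesystem blocks underneath.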

It seems that the control file is unlinked and then rewritten. If the
rewriting fails, or the command is interrupted, the user has a problem.

Could the control file be simply opened RW? Otherwise, I would suggest
renaming the old one (e.g. adding .tmp), writing the new one, then
unlinking the old one, so that recovering the old state in case of a
problem is possible.
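A closely related variant of this idea is to write the new control file under a temporary name, flush it, and atomically rename it over the old one, so an interrupted run leaves either the old or the new file fully intact. A minimal Python sketch (illustration only; a real implementation in C would also need to fsync the containing directory):

```python
import os

def replace_control_file(path, new_contents):
    """Replace path with new_contents via a temporary file and an
    atomic rename, so a crash mid-way never leaves a torn file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(new_contents)
        f.flush()
        os.fsync(f.fileno())   # data must reach disk before the rename
    os.replace(tmp_path, path) # atomic on POSIX filesystems
```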

For enable/disable, while the command is running, it should mark the 
cluster as opened to prevent an unwanted database start. I do not see 
where this is done.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Wed, Dec 26, 2018 at 07:43:17PM +0100, Fabien COELHO wrote:
>> It adds an (now mandatory) --action parameter that takes either verify,
>> enable or disable as argument.
>
> I'd rather have explicit switches for verify, enable & disable, and verify
> would be the default if none is provided.

Okay, noted for the separate switches.  But I don't agree with the
point of assuming that --verify should be enforced if no switches are
defined.  That feels like a trap for newcomers to this tool.

> For enable/disable, while the command is running, it should mark the cluster
> as opened to prevent an unwanted database start. I do not see where this is
> done.

You have pretty much the same class of problems if you attempt to
start a cluster on which pg_rewind or the existing pg_verify_checksums
is run after these have scanned the control file to make sure that
they work on a cleanly-stopped instance.  In short, this is a deal
between code simplicity and trying to have the tool outsmart users in
the way users use the tool.  I tend to prefer keeping the code simple
and not worry about cases where the users mess up with their
application, as there are many more things which could go wrong.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Fabien COELHO
Date:
Bonjour Michaël,

>>> It adds an (now mandatory) --action parameter that takes either verify,
>>> enable or disable as argument.
>>
>> I'd rather have explicit switches for verify, enable & disable, and verify
>> would be the default if none is provided.
>
> Okay, noted for the separate switches.  But I don't agree with the
> point of assuming that --verify should be enforced if no switches are
> defined.  That feels like a trap for newcomers of this tool..

Hmmm. It does something safe and useful, especially if it also works
online (patch pending), and the initial tool only does checking. However,
I'd be okay with no default.

>> For enable/disable, while the command is running, it should mark the cluster
>> as opened to prevent an unwanted database start. I do not see where this is
>> done.
>
> You have pretty much the same class of problems if you attempt to
> start a cluster on which pg_rewind or the existing pg_verify_checksums
> is run after these have scanned the control file to make sure that
> they work on a cleanly-stopped instance.  In short, this is a deal
> between code simplicity and trying to have the tool outsmart users in
> the way users use the tool.  I tend to prefer keeping the code simple
> and not worry about cases where the users mess up with their
> application, as there are many more things which could go wrong.

Hmmm. I do not buy the comparison.

A verify that fails is not a big problem, you can run it again. If 
pg_rewind fails, you can probably run it again as well, the source is 
probably still consistent even if it has changed, and too bad for the 
target side, but it was scheduled to be overwritten anyway.

However, a tool which overwrites files behind the back of a running server
is a recipe for data loss, so I think it should take much more care, i.e.
set the server into some specific safe state.

About code simplicity: probably there is, or there should be, a 
change-the-state function somewhere, because quite a few tools could use 
it?

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From:
Magnus Hagander
Date:


On Thu, Dec 27, 2018 at 2:15 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Dec 26, 2018 at 07:43:17PM +0100, Fabien COELHO wrote:
> >> It adds an (now mandatory) --action parameter that takes either verify,
> >> enable or disable as argument.
> >
> > I'd rather have explicit switches for verify, enable & disable, and verify
> > would be the default if none is provided.
>
> Okay, noted for the separate switches.  But I don't agree with the
> point of assuming that --verify should be enforced if no switches are
> defined.  That feels like a trap for newcomers of this tool..

Defaulting to the choice that makes no actual changes to the data surely is the safe choice, and not a trap :)

That said, this would probably be our first tool where you switch it between readonly and rewrite mode with just a switch, wouldn't it? All other tools are either read-only or read/write at the *tool* level, not the switch level.

That in itself would be an argument for making it a separate tool. But not a very strong one, I think; I prefer the single-tool-renamed approach as well.

There's plenty enough precedent for the "separate switches and a default behaviour if none is specified" in other tools though, and I don't think that's generally considered a trap.

So count me in the camp for separate switches and default to verify. If one didn't mean that, it's only a quick Ctrl-C away with no damage done.


> > For enable/disable, while the command is running, it should mark the cluster
> > as opened to prevent an unwanted database start. I do not see where this is
> > done.
>
> You have pretty much the same class of problems if you attempt to
> start a cluster on which pg_rewind or the existing pg_verify_checksums
> is run after these have scanned the control file to make sure that
> they work on a cleanly-stopped instance.  In short, this is a deal
> between code simplicity and trying to have the tool outsmart users in
> the way users use the tool.  I tend to prefer keeping the code simple
> and not worry about cases where the users mess up with their
> application, as there are many more things which could go wrong.

I think it comes down to what the outcome is. If we're going to end up with a corrupt database (e.g. one where checksums aren't set everywhere but they are marked as such in pg_control) then it's not acceptable. If the only outcome is that the tool gives an error, and it is fine when re-run, then it's a different story.

--

Re: Offline enabling/disabling of data checksums

From:
Magnus Hagander
Date:


On Sat, Dec 22, 2018 at 12:28 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Dec 21, 2018 at 09:16:16PM +0100, Michael Banck wrote:
>
> I think that this is independently useful, I got this stuff part of an
> upgrade workflow where the user is ready to accept some extra one-time
> offline time so as checksums are enabled.

Very much so, IMHO.


> > Things I have not done so far:
> >
> > 1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
> > only verify checksums.
>
> Check.  That sounds right to me.

Should we double-check with packagers that this won't cause a problem? Though the fact that it's done in a major release should make it perfectly fine I think -- and it's a smaller change than when we did all those xlog->wal changes...


> > 3. Once that patch is in, there would be a way to disable checksums so
> > there'd be a case to also change the initdb default to enabled, but that
> > required further discussion (and maybe another round of benchmarks).
>
> Perhaps, that's unrelated to this thread though.  I am not sure that
> all users would be ready to pay the extra cost of checksums enabled by
> default.

I'd be a strong +1 for changing the default once we have a painless way to turn them off.

It remains super-cheap to turn them off (stop the database, run one command, start it again). So those people that aren't willing to pay the overhead of checksums can very cheaply get away from it.

It's a lot more expensive to turn them on once your database has grown to some size (definitely in offline mode, but also in an online mode when we get that one in).

Plus, the majority of people *should* want them on :) We don't run with say synchronous_commit=off by default either to make it easier on those that don't want to pay the overhead of full data safety :P (I know it's not a direct match, but you get the idea)

--

Re: Offline enabling/disabling of data checksums

From:
Fabien COELHO
Date:
>> For enable/disable, while the command is running, it should mark the 
>> cluster as opened to prevent an unwanted database start. I do not see 
>> where this is done.
>>
>> You have pretty much the same class of problems if you attempt to
>> start a cluster on which pg_rewind or the existing pg_verify_checksums
>> is run after these have scanned the control file to make sure that
>> they work on a cleanly-stopped instance. [...]
>
> I think it comes down to what the outcome is. If we're going to end up with
> a corrupt database (e.g. one where checksums aren't set everywhere but they
> are marked as such in pg_control) then it's not acceptable. If the only
> outcome is the tool gives an error that's not an error and if re-run it's
> fine, then it's a different story.

ISTM that such an outcome is indeed a risk, as a starting postgres could
update already-checksummed pages without putting a checksum on them. It
could be even worse, although with a (very) low probability, with updates
overwritten in a race condition between the processes. In any case, no
error would be reported until much later, with invalid checksums,
inconsistent data, or undetected loss of committed data.

-- 
Fabien.


Re: Offline enabling/disabling of data checksums

From:
Tomas Vondra
Date:
On 12/27/18 11:43 AM, Magnus Hagander wrote:
> 
> 
> On Sat, Dec 22, 2018 at 12:28 AM Michael Paquier <michael@paquier.xyz> wrote:
> 
>     On Fri, Dec 21, 2018 at 09:16:16PM +0100, Michael Banck wrote:
> 
>     I think that this is independently useful, I got this stuff part of an
>     upgrade workflow where the user is ready to accept some extra one-time
>     offline time so as checksums are enabled.
> 
> 
> Very much so, IMHO.
> 
> 
>     > Things I have not done so far:
>     >
>     > 1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
>     > only verify checksums.
> 
>     Check.  That sounds right to me.
> 
> 
> Should we double-check with packagers that this won't cause a problem?
> Though the fact that it's done in a major release should make it
> perfectly fine I think -- and it's a smaller change than when we did all
> those xlog->wal changes...
> 

I think it makes little sense to not rename the tool now. I'm pretty
sure we'd end up doing that sooner or later anyway, and we'll just live
with a misnamed tool until then.

> 
>     > 3. Once that patch is in, there would be a way to disable checksums so
>     > there'd be a case to also change the initdb default to enabled, but that
>     > required further discussion (and maybe another round of benchmarks).
> 
>     Perhaps, that's unrelated to this thread though.  I am not sure that
>     all users would be ready to pay the extra cost of checksums enabled by
>     default.
> 
> 
> I'd be a strong +1 for changing the default once we have a painless way
> to turn them off.
> 
> It remains super-cheap to turn them off (stop database, one command,
> turn them on). So those people that aren't willing to pay the overhead
> of checksums, can very cheaply get away from it.
> 
> It's a lot more expensive to turn them on once your database has grown
> to some size (definitely in offline mode, but also in an online mode
> when we get that one in).
> 
> Plus, the majority of people *should* want them on :) We don't run with
> say synchronous_commit=off by default either to make it easier on those
> that don't want to pay the overhead of full data safety :P (I know it's
> not a direct match, but you get the idea)
> 

I don't know, TBH. I agree that making the on/off change cheaper moves
us closer to 'on' by default, because people may disable it if needed. But
it's not the whole story.

If we enable checksums by default, 99% of users will have them enabled.
That means more people will actually observe data corruption cases that
went unnoticed so far. What shall we do with that? We don't have very
good answers to that (tooling, docs), and I'd say "disable checksums" is
not a particularly amazing response in this case :-(

FWIW I don't know what to do about that. We certainly can't prevent the
data corruption, but maybe we could help with fixing it (although that's
bound to be low-level work).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Offline enabling/disabling of data checksums

From:
Tomas Vondra
Date:

On 12/27/18 11:39 AM, Magnus Hagander wrote:
> 
> 
> On Thu, Dec 27, 2018 at 2:15 AM Michael Paquier <michael@paquier.xyz> wrote:
> 
>     On Wed, Dec 26, 2018 at 07:43:17PM +0100, Fabien COELHO wrote:
>     >> It adds an (now mandatory) --action parameter that takes either verify,
>     >> enable or disable as argument.
>     >
>     > I'd rather have explicit switches for verify, enable & disable, and verify
>     > would be the default if none is provided.
> 
>     Okay, noted for the separate switches.  But I don't agree with the
>     point of assuming that --verify should be enforced if no switches are
>     defined.  That feels like a trap for newcomers of this tool..
> 
> 
> Defaulting to the choice that makes no actual changes to the data surely
> is the safe choice,a nd not a trap :)
> 
> That said, this would probably be our first tool where you switch it
> between readonly and rewrite mode with just a switch, woudn't it? All
> other tools are either read-only or read/write at the *tool* level, not
> the switch level.
> 

Eh? Isn't pg_rewind "modify by default" with a --dry-run switch to run in
a read-only mode? So I'm not sure what you mean by "tool level" here.

FWIW I'd prefer sticking to the same approach for this tool too, i.e.
have a "dry-run" switch that makes it read-only. IMHO that's a pretty
common pattern.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Offline enabling/disabling of data checksums

From:
Magnus Hagander
Date:


On Thu, Dec 27, 2018 at 3:54 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


> On 12/27/18 11:39 AM, Magnus Hagander wrote:
> > On Thu, Dec 27, 2018 at 2:15 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> >     On Wed, Dec 26, 2018 at 07:43:17PM +0100, Fabien COELHO wrote:
> >     >> It adds an (now mandatory) --action parameter that takes either verify,
> >     >> enable or disable as argument.
> >     >
> >     > I'd rather have explicit switches for verify, enable & disable, and verify
> >     > would be the default if none is provided.
> >
> >     Okay, noted for the separate switches.  But I don't agree with the
> >     point of assuming that --verify should be enforced if no switches are
> >     defined.  That feels like a trap for newcomers of this tool..
> >
> > Defaulting to the choice that makes no actual changes to the data surely
> > is the safe choice, and not a trap :)
> >
> > That said, this would probably be our first tool where you switch it
> > between readonly and rewrite mode with just a switch, wouldn't it? All
> > other tools are either read-only or read/write at the *tool* level, not
> > the switch level.
>
> Eh? Isn't pg_rewind "modify by default" with --dry-run switch to run in
> a read-only mode? So I'm not sure what you mean by "tool level" here.
>
> FWIW I'd prefer sticking to the same approach for this tool too, i.e.
> have a "dry-run" switch that makes it read-only. IMHO that's pretty
> common pattern.

That's a different thing.

pg_rewind in dry-run mode does the same thing, except it doesn't actually do it, it just pretends.

Verifying checksums is not the same as "turn on checksums except don't actually do it" or "turn off checksums except don't actually do it". 

--

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Thu, Dec 27, 2018 at 03:46:48PM +0100, Tomas Vondra wrote:
> On 12/27/18 11:43 AM, Magnus Hagander wrote:
>> Should we double-check with packagers that this won't cause a problem?
>> Though the fact that it's done in a major release should make it
>> perfectly fine I think -- and it's a smaller change than when we did all
>> those xlog->wal changes...
>>
>
> I think it makes little sense to not rename the tool now. I'm pretty
> sure we'd end up doing that sooner or later anyway, and we'll just live
> with a misnamed tool until then.

Do you think that a thread Would on -packagers be more adapted then?

> I don't know, TBH. I agree making the on/off change cheaper makes moves
> us closer to 'on' by default, because they may disable it if needed. But
> it's not the whole story.
>
> If we enable checksums by default, 99% users will have them enabled.
> That means more people will actually observe data corruption cases that
> went unnoticed so far. What shall we do with that? We don't have very
> good answers to that (tooling, docs) and I'd say "disable checksums" is
> not a particularly amazing response in this case :-(

Enabling data checksums by default is still a couple of steps ahead,
without a way to control them better.

> FWIW I don't know what to do about that. We certainly can't prevent the
> data corruption, but maybe we could help with fixing it (although that's
> bound to be low-level work).

Yes, data checksums are extremely useful to tell people when the
problem is *not* from Postgres, which can be really hard in a large
organization.  Knowing about the corrupted page is also useful as you
can look at its contents and look at its bytes before it gets zero'ed
to spot patterns which can help other teams in charge of a lower level
of the application layer.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Tomas Vondra
Date:

On 12/28/18 12:25 AM, Michael Paquier wrote:
> On Thu, Dec 27, 2018 at 03:46:48PM +0100, Tomas Vondra wrote:
>> On 12/27/18 11:43 AM, Magnus Hagander wrote:
>>> Should we double-check with packagers that this won't cause a problem?
>>> Though the fact that it's done in a major release should make it
>>> perfectly fine I think -- and it's a smaller change than when we did all
>>> those xlog->wal changes...
>>>
>>
>> I think it makes little sense to not rename the tool now. I'm pretty
>> sure we'd end up doing that sooner or later anyway, and we'll just live
>> with a misnamed tool until then.
> 
> Do you think that a thread Would on -packagers be more adapted then?
> 

I'm sorry, but I'm not sure I understand the question. Of course, asking
over at -packagers won't hurt, but my guess is the response will be it's
not a big deal from the packaging perspective.

>> I don't know, TBH. I agree making the on/off change cheaper makes moves
>> us closer to 'on' by default, because they may disable it if needed. But
>> it's not the whole story.
>>
>> If we enable checksums by default, 99% users will have them enabled.
>> That means more people will actually observe data corruption cases that
>> went unnoticed so far. What shall we do with that? We don't have very
>> good answers to that (tooling, docs) and I'd say "disable checksums" is
>> not a particularly amazing response in this case :-(
> 
> Enabling data checksums by default is still a couple of steps ahead,
> without a way to control them better..
> 

What do you mean by "control" here? Dealing with checksum failures, or
some additional capabilities?

>> FWIW I don't know what to do about that. We certainly can't prevent the
>> data corruption, but maybe we could help with fixing it (although that's
>> bound to be low-level work).
> 
> Yes, data checksums are extremely useful to tell people when the
> problem is *not* from Postgres, which can be really hard in a large
> organization.  Knowing about the corrupted page is also useful as you
> can look at its contents and look at its bytes before it gets zero'ed
> to spot patterns which can help other teams in charge of a lower level
> of the application layer.

I'm not sure data checksums are particularly great evidence. For example,
with the recent fsync issues, we might have ended up with partial writes
(and thus invalid checksums). The OS might even have told us about the
failure, but we gracefully ignored it. So I'm afraid data checksums
are not particularly great proof that it's not our fault.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Fri, Dec 28, 2018 at 01:14:05AM +0100, Tomas Vondra wrote:
> I'm sorry, but I'm not sure I understand the question. Of course, asking
> over at -packagers won't hurt, but my guess is the response will be it's
> not a big deal from the packaging perspective.

(The previous email had an extra "Would"...  Sorry.)
Let's ask those folks then.

> What do you mean by "control" here? Dealing with checksum failures, or
> some additional capabilities?

What I am referring to here is the possibility to enable, disable and
check checksums for an online cluster.  I am not sure what kind of
tooling able to do surgery at page level would make sense.  Once a
checksum is corrupted a user knows about a problem, which mainly needs
a human lookup.

> I'm not sure data checksums are particularly great evidence. For example
> with the recent fsync issues, we might have ended with partial writes
> (and thus invalid checksums). The OS migh have even told us about the
> failure, but we've gracefully ignored it. So I'm afraid data checksums
> are not a particularly great proof it's not our fault.

Sure, they are not a solution to all problems.  Still they give hints
before the problem spreads, and sometimes by looking at one corrupted
page by yourself one can see if the data fetched from disk comes from
Postgres or not (say inspecting the page header with pageinspect,
etc.).
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Magnus Hagander
Date:


On Fri, Dec 28, 2018 at 1:14 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 12/28/18 12:25 AM, Michael Paquier wrote:
> On Thu, Dec 27, 2018 at 03:46:48PM +0100, Tomas Vondra wrote:
>> On 12/27/18 11:43 AM, Magnus Hagander wrote:
>>> Should we double-check with packagers that this won't cause a problem?
>>> Though the fact that it's done in a major release should make it
>>> perfectly fine I think -- and it's a smaller change than when we did all
>>> those xlog->wal changes...
>>>
>>
>> I think it makes little sense to not rename the tool now. I'm pretty
>> sure we'd end up doing that sooner or later anyway, and we'll just live
>> with a misnamed tool until then.
>
> Do you think that a thread Would on -packagers be more adapted then?
>

I'm sorry, but I'm not sure I understand the question. Of course, asking
over at -packagers won't hurt, but my guess is the response will be it's
not a big deal from the packaging perspective.

I think a heads-up in the way of "planning to change it, now's the time to yell" is the reasonable thing.


>> I don't know, TBH. I agree making the on/off change cheaper moves
>> us closer to 'on' by default, because they may disable it if needed. But
>> it's not the whole story.
>>
>> If we enable checksums by default, 99% users will have them enabled.
>> That means more people will actually observe data corruption cases that
>> went unnoticed so far. What shall we do with that? We don't have very
>> good answers to that (tooling, docs) and I'd say "disable checksums" is
>> not a particularly amazing response in this case :-(
>
> Enabling data checksums by default is still a couple of steps ahead,
> without a way to control them better..
>

What do you mean by "control" here? Dealing with checksum failures, or
some additional capabilities?

>> FWIW I don't know what to do about that. We certainly can't prevent the
>> data corruption, but maybe we could help with fixing it (although that's
>> bound to be low-level work).
>
> Yes, data checksums are extremely useful to tell people when the
> problem is *not* from Postgres, which can be really hard in a large
> organization.  Knowing about the corrupted page is also useful as you
> can look at its contents and look at its bytes before it gets zero'ed
> to spot patterns which can help other teams in charge of a lower level
> of the application layer.

I'm not sure data checksums are particularly great evidence. For example
with the recent fsync issues, we might have ended with partial writes
(and thus invalid checksums). The OS might have even told us about the
failure, but we've gracefully ignored it. So I'm afraid data checksums
are not a particularly great proof it's not our fault.

They are great evidence that your data is corrupt. You *want* to know that your data is corrupt. Even if our best recommendation is "go restore your backups", you still want to know. Otherwise you are sitting around on data that's corrupt and you don't know about it.

There are certainly many things we can do to improve the experience. But not telling people their data is corrupt when it is, isn't one of them. 

--

Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date:
>>> [...]
>>
>> I'm not sure data checksums are particularly great evidence. For example
>> with the recent fsync issues, we might have ended with partial writes
>> (and thus invalid checksums). The OS might have even told us about the
>> failure, but we've gracefully ignored it. So I'm afraid data checksums
>> are not a particularly great proof it's not our fault.
>
> They are great evidence that your data is corrupt. You *want* to know
> that your data is corrupt. Even if our best recommendation is "go restore
> your backups", you still want to know. Otherwise you are sitting around on
> data that's corrupt and you don't know about it.
>
> There are certainly many things we can do to improve the experience. But
> not telling people their data is corrupt when it is, isn't one of them.

Yep, anyone should want to know if their database is corrupt, compared to 
ignoring the fact.

One reason not to enable it could be if the implementation is not trusted, 
i.e. if false positive (corrupt page detected while the data are okay and 
there was only an issue with computing or storing the checksum) can occur.

There is also the performance impact. I did some quick-and-dirty pgbench 
simple update single thread performance tests to compare with vs without 
checksum. Enabling checksums on these tests seems to induce a 1.4% 
performance penalty, although I'm moderately confident about it given the 
standard deviation. At least it is an indication, and it seems to me that 
it is consistent with other figures previously reported on the list.

-- 
Fabien.


Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Fri, Dec 28, 2018 at 10:12:24AM +0100, Magnus Hagander wrote:
> I think a heads-up in the way of "planning to change it, now's the time to
> yell" is the reasonable thing.

And done.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
Hi,

On Fri, Dec 28, 2018 at 10:12 +0100, Magnus Hagander wrote:
> On Fri, Dec 28, 2018 at 1:14 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> > On 12/28/18 12:25 AM, Michael Paquier wrote:
> > > On Thu, Dec 27, 2018 at 03:46:48PM +0100, Tomas Vondra wrote:
> > >> On 12/27/18 11:43 AM, Magnus Hagander wrote:
> > >>> Should we double-check with packagers that this won't cause a problem?
> > >>> Though the fact that it's done in a major release should make it
> > >>> perfectly fine I think -- and it's a smaller change than when we did all
> > >>> those xlog->wal changes...
> > >>>
> > >>
> > >> I think it makes little sense to not rename the tool now. I'm pretty
> > >> sure we'd end up doing that sooner or later anyway, and we'll just live
> > >> with a misnamed tool until then.
> > > 
> > > Do you think that a thread Would on -packagers be more adapted then?
> > 
> > I'm sorry, but I'm not sure I understand the question. Of course, asking
> > over at -packagers won't hurt, but my guess is the response will be it's
> > not a big deal from the packaging perspective.
> 
> I think a heads-up in the way of "planning to change it, now's the
> time to yell" is the reasonable thing.

Renaming applications shouldn't be a problem unless they have to be
moved from one binary package to another. I assume all packagers ship
all client/server binaries in one package, respectively (and not e.g. a
dedicated postgresql-11-pg_test_fsync package), this should only be a
matter of updating package metadata.

In any case, it should be identical to the xlog->wal rename.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz


Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Sat, Dec 29, 2018 at 11:55:43AM +0100, Michael Banck wrote:
> Renaming applications shouldn't be a problem unless they have to be
> moved from one binary package to another. I assume all packagers ship
> all client/server binaries in one package, respectively (and not e.g. a
> dedicated postgresql-11-pg_test_fsync package), this should only be a
> matter of updating package metadata.
>
> In any case, it should be identical to the xlog->wal rename.

I have poked -packagers on the matter and I am seeing no complaints, so
let's move forward with this stuff.  From the consensus I am seeing on
the thread, we have been discussing the following points:
1) Rename pg_verify_checksums to pg_checksums.
2) Have separate switches for each action, aka --verify, --enable and
--disable, or a unified --action switch which can take different
values.
3) Do we want to imply --verify by default if no switch is specified?

About 2), folks who have expressed an opinion are:
- Multiple switches: Robert, Fabien, Magnus
- Single --action switch: Michael B, Michael P

About 3), aka --verify implied if no action is specified:
- In favor: Fabien C, Magnus
- Against: Michael P

If I missed what someone said, please feel free to complete with your
votes here.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
Hi,

On Tue, Jan 01, 2019 at 11:38 +0900, Michael Paquier wrote:
> On Sat, Dec 29, 2018 at 11:55:43AM +0100, Michael Banck wrote:
> > Renaming applications shouldn't be a problem unless they have to be
> > moved from one binary package to another. I assume all packagers ship
> > all client/server binaries in one package, respectively (and not e.g. a
> > dedicated postgresql-11-pg_test_fsync package), this should only be a
> > matter of updating package metadata.
> > 
> > In any case, it should be identical to the xlog->wal rename.
> 
> I have poked -packagers on the matter and I am seeing no complaints, so
> let's move forward with this stuff.  From the consensus I am seeing on
> the thread, we have been discussing the following points:
> 1) Rename pg_verify_checksums to pg_checksums.
> 2) Have separate switches for each action, aka --verify, --enable and
> --disable, or a unified --action switch which can take different
> values.
> 3) Do we want to imply --verify by default if no switch is specified?
> 
> About 2), folks who have expressed an opinion are:
> - Multiple switches: Robert, Fabien, Magnus
> - Single --action switch: Michael B, Michael P

I implemented the multiple switches thing in my branch first anyway and
don't mind a lot either way; I think the consensus goes towards multiple
switches.

> About 3), aka --verify implied if no action is specified:
> - In favor: Fabien C, Magnus
> - Against: Michael P

I think I'm in favor as well.

I wonder whether we (or packagers) could then just ship a
pg_verify_checksums -> pg_checksums symlink for compatibility if we/they
want, as the behaviour would stay the same?


Michael



Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Tue, Jan 01, 2019 at 11:42:49AM +0100, Michael Banck wrote:
> On Tue, Jan 01, 2019 at 11:38 +0900, Michael Paquier wrote:
>> About 3), aka --verify implied if no action is specified:
>> - In favor: Fabien C, Magnus
>> - Against: Michael P
>
> I think I'm in favor as well.

Okay, it looks to be the direction to take then.

> I wonder whether we (or packagers) could then just ship a
> pg_verify_checksums -> pg_checksums symlink for compatibility if we/they
> want, as the behaviour would stay the same?

In the v10 dev cycle this part has been discarded for the switch from
pg_xlogdump to pg_waldump.  I don't think it's worth bothering with
in the build this time either.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On 12/27/18 11:43 AM, Magnus Hagander wrote:
> > Plus, the majority of people *should* want them on :) We don't run with
> > say synchronous_commit=off by default either to make it easier on those
> > that don't want to pay the overhead of full data safety :P (I know it's
> > not a direct match, but you get the idea)

+1 to having them on by default, we should have done that a long time
ago.

> I don't know, TBH. I agree making the on/off change cheaper moves
> us closer to 'on' by default, because they may disable it if needed. But
> it's not the whole story.
>
> If we enable checksums by default, 99% users will have them enabled.

Yes, and they'll then be able to catch data corruption much earlier.
Today, 99% of our users don't have them enabled and have no clue if
their data has been corrupted on disk, or not.  That's not good.

> That means more people will actually observe data corruption cases that
> went unnoticed so far. What shall we do with that? We don't have very
> good answers to that (tooling, docs) and I'd say "disable checksums" is
> not a particularly amazing response in this case :-(

Now that we've got a number of tools available which will check the
checksums in a running system and throw up warnings when found
(pg_basebackup, pgBackRest and I think other backup tools,
pg_checksums...), users will see corruption and have the option to
restore from a backup before those backups expire out and they're left
with a corrupt database and backups which also have that corruption.

This ongoing call for specific tooling to do "something" about checksums
is certainly good, but it's not right to say that we don't have existing
documentation- we do, quite a bit of it, and it's all under the heading
of "Backup and Recovery".

> FWIW I don't know what to do about that. We certainly can't prevent the
> data corruption, but maybe we could help with fixing it (although that's
> bound to be low-level work).

There's been some effort to try and automagically correct corrupted
pages but it's certainly not something I'm ready to trust beyond a
"well, this is what it might have been" review.  The answer today is to
find a backup which isn't corrupt and restore from it on a known-good
system.  If adding explicit documentation to that effect would reduce
your level of concern when it comes to enabling checksums by default,
then I'm happy to do that.

Thanks!

Stephen

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
Hi,

On Thu, Dec 27, 2018 at 12:26 +0100, Fabien COELHO wrote:
> > > For enable/disable, while the command is running, it should mark the 
> > > cluster as opened to prevent an unwanted database start. I do not see 
> > > where this is done.
> > > 
> > > You have pretty much the same class of problems if you attempt to
> > > start a cluster on which pg_rewind or the existing pg_verify_checksums
> > > is run after these have scanned the control file to make sure that
> > > they work on a cleanly-stopped instance. [...]
> > 
> > I think it comes down to what the outcome is. If we're going to end up with
> > a corrupt database (e.g. one where checksums aren't set everywhere but they
> > are marked as such in pg_control) then it's not acceptable. If the only
> > outcome is the tool gives an error that's not an error and if re-run it's
> > fine, then it's a different story.
> 
> ISTM that such an outcome is indeed a risk, as a starting postgres could 
> update already checksummed pages without putting a checksum. It could be 
> even worse, although with a (very) low probability, with updates 
> overwritten in a race condition between the processes. In any case, no 
> error would be reported until much later, with invalid checksums, 
> inconsistent data, or undetected loss of committed data.

One difference between pg_rewind and pg_checksums is that the latter
potentially runs for a longer time (or rather a non-trivial amount of
time, compared to pg_rewind), so the margin of error of another DBA
saying "oh, that DB is down, let me start it again" might be much
higher. 

The question is how to reliably do this in an acceptable way? Just
faking a postmaster.pid sounds pretty hackish to me, do you have any
suggestions here?

The alternative would be to document that it needs to be made sure that
the database is not started up during enabling of checksums, yielding to
pilot error.


Michael



Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
Hi,

On Wed, Dec 26, 2018 at 19:43 +0100, Fabien COELHO wrote:
> > It adds an (now mandatory) --action parameter that takes either verify,
> > enable or disable as argument.
> 
> I'd rather have explicit switches for verify, enable & disable, and verify 
> would be the default if none is provided.

I changed that to the switches -c/--verify (-c for check as -v is taken,
should it be --check as well? I personally like verify better), 
-d/--disable and -e/--enable.

> About the patch: applies, compiles, "make check" ok.
> 
> There is no documentation.

Yeah, I'll write that once the CLI is settled.

> In "scan_file", I would open RW only for enable, but keep RO for verify.

OK, I've changed that.

> Also, the full page is rewritten... would it make sense to only overwrite 
> the checksum part itself?

So just writing the page header? I find that a bit scary and don't
expect much speedup as the OS would write the whole block anyway I
guess? I haven't touched that yet.

> It seems that the control file is unlinked and then rewritten. If the 
> rewritting fails, or the command is interrupted, the user has a problem.
> 
> Could the control file be simply opened RW? Else, I would suggest to 
> rename (eg add .tmp), write the new one, then unlink the old one, so that 
> recovering the old state in case of problem is possible.

I have mostly taken the pg_rewind code here; if there was a function
that allowed for safe offline changes of the control file, I'd be happy
to use it, but I don't think it should be this patch's job to invent that.

In any case, I have removed the unlink() now (not sure where that came
from), and changed it to open(O_WRONLY) same as in Michael's code and
pg_rewind.    
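For illustration, here is a minimal standalone sketch of the write-temp-then-rename approach Fabien suggested; the function name, the fixed-size path buffer and the error handling are made up for this example and are not the patch's actual code:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write buf to "<path>.tmp", fsync it, then atomically rename it over
 * path, so an interrupted run can never leave a truncated file behind.
 */
static int
write_file_atomically(const char *path, const char *buf, size_t len)
{
	char	tmppath[1024];
	int		fd;

	snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);

	fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
	{
		close(fd);
		unlink(tmppath);
		return -1;
	}
	close(fd);

	/* rename() is atomic on POSIX: readers see the old or the new file */
	if (rename(tmppath, path) != 0)
	{
		unlink(tmppath);
		return -1;
	}
	return 0;
}
```

On a crash before the rename, the old file is untouched and only a stray `.tmp` file remains, which a subsequent run can simply overwrite.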

V2 attached.


Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date:


> One difference between pg_rewind and pg_checksums is that the latter
> potentially runs for a longer time (or rather a non-trivial amount of
> time, compared to pg_rewind), so the margin of error of another DBA
> saying "oh, that DB is down, let me start it again" might be much
> higher.
>
> The question is how to reliably do this in an acceptable way? Just
> faking a postmaster.pid sounds pretty hackish to me, do you have any
> suggestions here?

Adding a new state to ControlFileData which would prevent it from 
starting?

-- 
Fabien.


Re: Offline enabling/disabling of data checksums

From
Bernd Helmle
Date:
On Tue, Jan 08, 2019 at 15:09 +0100, Fabien COELHO wrote:
> > The question is how to reliably do this in an acceptable way? Just
> > faking a postmaster.pid sounds pretty hackish to me, do you have
> > any
> > suggestions here?
> 
> Adding a new state to ControlFileData which would prevent it from 
> starting?

But then you have to make sure the control flag gets cleared in any
case pg_verify_checksums crashes somehow or gets SIGKILL'ed ...

Setting the checksum flag is done after having finished all blocks, so
there is no problem. But we need to set this new flag before and reset
it afterwards, so in between strange things can happen (as the various
calls to exit() within error handling illustrate). 

Bernd.




Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
On Tue, Jan 08, 2019 at 15:39 +0100, Bernd Helmle wrote:
> On Tue, Jan 08, 2019 at 15:09 +0100, Fabien COELHO wrote:
> > > The question is how to reliably do this in an acceptable way? Just
> > > faking a postmaster.pid sounds pretty hackish to me, do you have
> > > any
> > > suggestions here?
> > 
> > Adding a new state to ControlFileData which would prevent it from 
> > starting?
> 
> But then you have to make sure the control flag gets cleared in any
> case pg_verify_checksums crashes somehow or gets SIGKILL'ed ...
> 
> Setting the checksum flag is done after having finished all blocks, so
> there is no problem. But we need to set this new flag before and reset
> it afterwards, so in between strange things can happen (as the various
> calls to exit() within error handling illustrate). 

It seems writing a note like "pg_checksums is running" into the
postmaster.pid would work, and would give a hopefully useful hint to
somebody trying to start Postgres while pg_checksums is running:

postgres@kohn:~$ echo  "pg_checksums running with pid 1231, cluster disabled" > data/postmaster.pid 
postgres@kohn:~$ pg_ctl -D data -l logfile start
pg_ctl: invalid data in PID file "data/postmaster.pid"
postgres@kohn:~$ echo $?
1
postgres@kohn:~$ 

If the DBA then just simply deletes postmaster.pid and starts over, well
then I call pilot error; though we could in theory change pg_ctl (or
whatever checks postmaster.pid) to emit an even more useful error
message if it encounters a "cluster is locked" keyword in it.

Not sure whether everybody likes that (or is future-proof for that
matter), but I like it better than adding a new field to the control
file, for the reasons Bernd outlined above.
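As a sketch of how such a note could be created without racing against a concurrently starting postmaster or a second pg_checksums, O_CREAT | O_EXCL makes the creation atomic. The function name and the message text below are illustrative only, not the actual patch:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Create pidpath exclusively; fail if it already exists, i.e. if the
 * server is running or another tool already holds the "lock".
 */
static int
create_checksums_note(const char *pidpath)
{
	char	msg[128];
	int		fd, len;

	fd = open(pidpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;				/* lost the race, or server is up */

	len = snprintf(msg, sizeof(msg),
				   "pg_checksums running with pid %d, cluster disabled\n",
				   (int) getpid());
	if (write(fd, msg, len) != len)
	{
		close(fd);
		unlink(pidpath);
		return -1;
	}
	close(fd);
	return 0;
}
```

The tool would then unlink the file on normal exit; a crash leaves it behind, which is exactly the pilot-error scenario discussed above.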


Michael



Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date:
>> Adding a new state to ControlFileData which would prevent it from
>> starting?
>
> But then you have to make sure the control flag gets cleared in any
> case pg_verify_checksums crashes somehow or gets SIGKILL'ed ...

The usual approach is a restart with some --force option?

> Setting the checksum flag is done after having finished all blocks, so
> there is no problem.

There is also a problem if the db is started while the checksum is being 
enabled.

> But we need to set this new flag before and reset it afterwards, so in 
> between strange things can happen (as the various calls to exit() within 
> error handling illustrate).

Sure, there is some need for a backup plan if it fails and the control 
file is left in a wrong state.

-- 
Fabien.


Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date:
>> Setting the checksum flag is done after having finished all blocks, so
>> there is no problem. But we need to set this new flag before and reset
>> it afterwards, so in between strange things can happen (as the various
>> calls to exit() within error handling illustrate).
>
> It seems writing a note like "pg_checksums is running" into the
> postmaster.pid would work, and would give a hopefully useful hint to
> somebody trying to start Postgres while pg_checksums is running:
>
> postgres@kohn:~$ echo  "pg_checksums running with pid 1231, cluster disabled" > data/postmaster.pid 
> postgres@kohn:~$ pg_ctl -D data -l logfile start
> pg_ctl: invalid data in PID file "data/postmaster.pid"
> postgres@kohn:~$ echo $?
> 1
> postgres@kohn:~$ 

Looks OK, but I'm unsure how portable it is, though. What if the server is 
started with "postmaster" directly?

> If the DBA then just simply deletes postmaster.pid and starts over, well
> then I call pilot error; though we could in theory change pg_ctl (or
> whatever checks postmaster.pid) to emit an even more useful error
> message if it encounters a "cluster is locked" keyword in it.
>
> Not sure whether everybody likes that (or is future-proof for that
> matter), but I like it better than adding a new field to the control
> file, for the reasons Bernd outlined above.

ISTM that the point of the control file is exactly to tell what the current 
status of the cluster is, so it is where this information really belongs?

AFAICS all commands take care of the status in some way to avoid 
accidents.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From
Bernd Helmle
Date:
On Tue, Jan 08, 2019 at 16:17 +0100, Fabien COELHO wrote:
> > > Adding a new state to ControlFileData which would prevent it from
> > > starting?
> > 
> > But then you have to make sure the control flag gets cleared in any
> > case pg_verify_checksums crashes somehow or gets SIGKILL'ed ...
> 
> The usual approach is a restart with some --force option?
> 
> > Setting the checksum flag is done after having finished all blocks,
> > so
> > there is no problem.
> 
> There is also a problem if the db is started while the checksum is
> being 
> enabled.

What I mean is that interrupting pg_verify_checksums won't leave
pg_control in a state where starting the cluster won't work without any
further interaction.

Bernd.




Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date:
>>> But then you have to make sure the control flag gets cleared in any
>>> case pg_verify_checksums crashes somehow or gets SIGKILL'ed ...
>>
>> The usual approach is a restart with some --force option?
>>
>>> Setting the checksum flag is done after having finished all blocks, so 
>>> there is no problem.
>>
>> There is also a problem if the db is started while the checksum is 
>> being enabled.
>
> What I mean is that interrupting pg_verify_checksums won't leave 
> pg_control in a state where starting the cluster won't work without any 
> further interaction.

Yep, I understood that, and agree that a way out is needed, hence the 
--force option suggestion.

-- 
Fabien.


Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Tue, Jan 08, 2019 at 01:03:25PM +0100, Michael Banck wrote:
> I changed that to the switches -c/--verify (-c for check as -v is taken,
> should it be --check as well? I personally like verify better), 
> -d/--disable and -e/--enable.

Indeed we could use --check; "pg_checksums --check" looks repetitive,
but it makes the short option more consistent with the rest.

+   printf(_("  -A, --action   action to take on the cluster, can be set as\n"));
+   printf(_("                 \"verify\", \"enable\" and \"disable\"\n"));
Not reflected yet in the --help portion.

>> Also, the full page is rewritten... would it make sense to only overwrite
>> the checksum part itself?
>
> So just writing the page header? I find that a bit scary and don't
> expect much speedup as the OS would write the whole block anyway I
> guess? I haven't touched that yet.

The OS would write blocks of 4kB out of the 8kB as that's the usual
page size, no?  So this could save a lot of I/O.

> I have mostly taken the pg_rewind code here; if there was a function
> that allowed for safe offline changes of the control file, I'd be happy
> to use it but I don't think it should be this patch to invent that.
>
> In any case, I have removed the unlink() now (not sure where that came
> from), and changed it to open(O_WRONLY) same as in Michael's code and
> pg_rewind.

My own stuff in pg_checksums.c does not have an unlink(), anyway...  I
think that there is room for improvement for both pg_rewind and
pg_checksums here.  What about refactoring updateControlFile() and
moving it to controldata_utils.c?  This centralizes the CRC check,
static assertions, file open and writes into a single place.  The
backend has a similar flavor with UpdateControlFile.  By combining
both we need some extra "ifdef FRONTEND" for BasicOpenFile and the
wait events which generates some noise, still both share a lot.  The
backend also includes a fsync() for the control file which happens
when the file is written, but for pg_checksums and pg_rewind we just
do it in one go at the end, so we would need an extra flag to decide
if fsync should happen or not.  pg_rewind has partially the right
interface by passing ControlFileData contents as an argument.
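A rough standalone sketch of the shape such a shared routine could take; MiniControlFile and the FNV-style hash below are stand-ins for ControlFileData and pg_crc32c, and the real interface that ends up in controldata_utils.c may well look different:

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

typedef struct MiniControlFile
{
	unsigned int	state;
	unsigned int	data_checksum_version;
	unsigned int	crc;		/* last field: covers everything above */
} MiniControlFile;

/* FNV-1a hash, standing in for pg_crc32c */
static unsigned int
toy_crc(const unsigned char *buf, size_t len)
{
	unsigned int	h = 2166136261u;

	while (len-- > 0)
		h = (h ^ *buf++) * 16777619u;
	return h;
}

/*
 * Recompute the checksum over the fields preceding it and rewrite the
 * whole control file.  The do_sync flag lets pg_rewind/pg_checksums
 * skip the per-write fsync and flush everything in one go at the end.
 */
static int
update_controlfile(const char *path, MiniControlFile *cf, int do_sync)
{
	int		fd;

	cf->crc = toy_crc((const unsigned char *) cf,
					  offsetof(MiniControlFile, crc));

	fd = open(path, O_WRONLY | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	if (write(fd, cf, sizeof(*cf)) != (ssize_t) sizeof(*cf) ||
		(do_sync && fsync(fd) != 0))
	{
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}
```

The backend and frontend callers would then differ only in how the file is opened and in wait-event bookkeeping, hidden behind "#ifdef FRONTEND".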

> V2 attached.

+/* Filename components */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
This may look strange, but these are needed because pg_checksums
calls some of the sync-related routines which are defined in fd.c.
Amen.

+   if (fsync(fd) != 0)
+   {
+       fprintf(stderr, _("%s: fsync error: %s\n"), progname, strerror(errno));
+       exit(1);
+   }
No need for that as fsync_pgdata() gets called at the end.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date:
> I changed that to the switches -c/--verify (-c for check as -v is taken,
> should it be --check as well? I personally like verify better), 
> -d/--disable and -e/--enable.

I agree that checking the checksum sounds repetitive, but I think that for 
consistency --check should be provided.

About the patch: applies, compiles, global & local "make check" are ok.

There is still no documentation.

I think that there is a consensus about renaming the command.

The --help string documents --action which does not exists anymore.

The code in "updateControlFile" seems to allow creating the file 
(O_CREAT). I do not think that it should; it should only apply to an 
existing file.

ISTM that some generalized version of this function should be in 
"src/common/controldata_utils.c" instead of duplicating it from command to 
command (as suggested by Michaël as well).

In "scan_file" verbose output, ISTM that the checksum is more computed 
than enabled on the file. It is really enabled at the cluster level in the 
end.

Maybe there could be only one open call with a ?: for RO vs RW.
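What that could look like, as a tiny sketch; the ChecksumAction enum mirrors the PG_ACTION_* names mentioned in the thread, but the helper itself is made up for illustration:

```c
#include <fcntl.h>
#include <unistd.h>

typedef enum
{
	PG_ACTION_CHECK,
	PG_ACTION_ENABLE,
	PG_ACTION_DISABLE
} ChecksumAction;

/* Single open() call: read-write only when pages will be rewritten. */
static int
open_for_action(const char *fn, ChecksumAction action)
{
	return open(fn, (action == PG_ACTION_ENABLE) ? O_RDWR : O_RDONLY, 0);
}
```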

Non root check: as files are only manipulated RW, ISTM that there is no 
reason why the ownership would be changed, so I do not think that this 
constraint is useful.

There is kind of a copy-paste between enabling and disabling; I'd consider 
skipping the scan when not necessary and merging both branches.

>> Also, the full page is rewritten... would it make sense to only overwrite
>> the checksum part itself?
>
> So just writing the page header? I find that a bit scary and don't
> expect much speedup as the OS would write the whole block anyway I
> guess? I haven't touched that yet.

Possibly the OS would write its block size, which is not necessarily the 
same as the postgres page size?

>> It seems that the control file is unlinked and then rewritten. If the
>> rewritting fails, or the command is interrupted, the user has a problem.
>>
>> Could the control file be simply opened RW? Else, I would suggest to
>> rename (eg add .tmp), write the new one, then unlink the old one, so that
>> recovering the old state in case of problem is possible.
>
> I have mostly taken the pg_rewind code here; if there was a function
> that allowed for safe offline changes of the control file, I'd be happy
> to use it, but I don't think it should be this patch's job to invent that.

It is reinventing it somehow by duplicating the stuff anyway. I'd suggest 
a separate preparatory patch to do the cleanup.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From:
Andres Freund
Date:
Hi,

On 2019-01-09 07:07:17 +0100, Fabien COELHO wrote:
> There is still no documentation.

Michael, are you planning to address this?  It'd also be useful to state
when you just don't agree with things / don't plan to address them.

Given the docs piece hasn't been addressed, and seems uncontroversial,
I'm marking this patch as returned with feedback. Please resubmit once
ready.


> > > Also, the full page is rewritten... would it make sense to only overwrite
> > > the checksum part itself?
> > 
> > So just writing the page header? I find that a bit scary and don't
> > expect much speedup as the OS would write the whole block anyway I
> > guess? I haven't touched that yet.
> 
> Possibly the OS would write its block size, which is not necessarily the same
> as the postgres page size?

I think it'd be a bad idea to write more granular. Very commonly that'll
turn a write operation into a read-modify-write (although caching will
often prevent that from being a problem here), and it'll be bad for
flash translation layers.

Greetings,

Andres Freund


Re: Offline enabling/disabling of data checksums

From:
Michael Banck
Date:
Hi,

sorry for letting this slack.

First off, thanks for the review!

On Wednesday, 2019-01-09 at 07:07 +0100, Fabien COELHO wrote:
> > I changed that to the switches -c/--verify (-c for check as -v is taken,
> > should it be --check as well? I personally like verify better), 
> > -d/--disable and -e/--enable.
> 
> I agree that checking the checksum sounds repetitive, but I think that for 
> consistency --check should be provided.

Ok then. The enum is currently called PG_ACTION_VERIFY, I changed that
to PG_ACTION_CHECK as well.

> About the patch: applies, compiles, global & local "make check" are ok.
> 
> There is still no documentation.

I've added that now, though I did that blindly and have not checked the
output yet.

> I think that there is a consensus about renaming the command.

I think so as well, but doing that right now will make the patch
difficult to review, so I'd prefer to leave it to the committer to do
that. 

I can submit a patch with the directory/file rename if that is
preferred. 

> The --help string documents --action which does not exist anymore.

Fixed that.

> The code in "updateControlFile" seems to allow creating the file 
> (O_CREAT). I do not think that it should; it should only apply to an 
> existing file.

Removed that.

> ISTM that some generalized version of this function should be in 
> "src/common/controldata_utils.c" instead of duplicating it from command to 
> command (as suggested by Michaël as well).

Haven't done that yet.

> In "scan_file" verbose output, ISTM that the checksum is more computed 
> than enabled on the file. It is really enabled at the cluster level in the 
> end.

It's certainly not just computed but also written. It's true that it
will be only meaningful if the control file is updated accordingly at
the end, but I don't think that message is very incorrect, so left it
as-is for now.

> Maybe there could be only one open call with a ?: for RO vs RW.

Done that.

> Non root check: as files are only manipulated RW, ISTM that there is no 
> reason why the ownership would be changed, so I do not think that this 
> constraint is useful.

Now that we no longer unlink() pg_control, I believe you are right and I
have removed it.

> There is kind of a copy paste for enabling/disabling, I'd consider 
> skipping the scan when not necessary and merge both branches.

Done so.

> > > Also, the full page is rewritten... would it make sense to only overwrite
> > > the checksum part itself?
> > 
> > So just writing the page header? I find that a bit scary and don't
> > expect much speedup as the OS would write the whole block anyway I
> > guess? I haven't touched that yet.
> 
> Possibly the OS would write its block size, which is not necessarily the 
> same as the postgres page size?

I haven't changed that yet, I think Andres was also of the opinion that
this is not necessary?

> > > It seems that the control file is unlinked and then rewritten. If the
> > > rewriting fails, or the command is interrupted, the user has a problem.
> > > 
> > > Could the control file be simply opened RW?

I've done that now.

New patch attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachments

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Sun, Feb 17, 2019 at 07:31:38PM +0100, Michael Banck wrote:
> New patch attached.

- * src/bin/pg_verify_checksums/pg_verify_checksums.c
+ * src/bin/pg_checksums/pg_checksums.c
That's lacking a rename, or this comment is incorrect.

+#if PG_VERSION_NUM >= 100000
+   StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
+                    "pg_control is too large for atomic disk writes");
+#endif
This is compiled with only one version of the control file data, so
you don't need that.

Any reason why we don't refactor updateControlFile() into
controldata_utils.c?  This duplicates the code, at the exception of
some details.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Michael Banck
Date:
Hi,

On Tuesday, 2019-02-19 at 14:02 +0900, Michael Paquier wrote:
> On Sun, Feb 17, 2019 at 07:31:38PM +0100, Michael Banck wrote:
> > New patch attached.
> 
> - * src/bin/pg_verify_checksums/pg_verify_checksums.c
> + * src/bin/pg_checksums/pg_checksums.c
> That's lacking a rename, or this comment is incorrect.

Right, I started the rename, but then backed off pending further
discussion whether I should submit that or whether the committer will
just do it.

I've backed those 4 in-file renames out for now.

> +#if PG_VERSION_NUM >= 100000
> +   StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
> +                    "pg_control is too large for atomic disk writes");
> +#endif
> This is compiled with only one version of the control file data, so
> you don't need that.

Oops, yeah.

> Any reason why we don't refactor updateControlFile() into
> controldata_utils.c?  This duplicates the code, at the exception of
> some details.

Ok, I've done that now, and migrated pg_rewind as well, do you know of
any other programs that might benefit here?

This could/should probably be committed separately beforehand.

New patch attached.


Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Fabien COELHO
Date:
Hallo Michael,

>> - * src/bin/pg_verify_checksums/pg_verify_checksums.c
>> + * src/bin/pg_checksums/pg_checksums.c
>> That's lacking a rename, or this comment is incorrect.
>
> Right, I started the rename, but then backed off pending further 
> discussion whether I should submit that or whether the committer will 
> just do it.

ISTM that there is an all-clear for renaming.

The renaming implies quite a few changes (eg in the documentation, 
makefiles, tests) which warrants a review, so it should be a patch. Also, 
ISTM that the renaming only make sense when adding the enable/disable 
feature, so I'd say that it belongs to this patch. Opinions?

About v4:

Patch applies cleanly, compiles, global & local "make check" are ok.

Doc: "enable, disable or verifies" -> "checks, enables or disables"?
Spurious space: "> ." -> ">.".

As checksums are now checked, the doc could use "check" instead of 
"verify", especially if there is a rename and the "verify" word 
disappears.

I'd be less terse when documenting actions, eg: "Disable checksums" -> 
"Disable checksums on cluster."

Doc should state that checking is the default action, eg "Check checksums 
on cluster. This is the default action."

Help string could say that -c is the default action. There is a spurious 
help line remaining from the previous "--action" implementation.

open: I'm positively unsure about ?: priority over |, and probably not the 
only one, so I'd add parentheses around the former.
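
To make the precedence concern concrete: in C, bitwise `|` binds tighter than the conditional operator `?:`, so an unparenthesized mix silently drops a flag from one branch. The flag values below are invented for the demo and are not the real open() flags.

```c
#include <assert.h>

/* Made-up flag values, for illustration only */
#define MY_RDONLY	0x0000
#define MY_RDWR		0x0002
#define MY_BINARY	0x8000

/*
 * '|' has higher precedence than '?:', so this parses as
 *     rw ? MY_RDWR : (MY_RDONLY | MY_BINARY)
 * and the read-write branch silently loses MY_BINARY.
 */
static int
flags_unparenthesized(int rw)
{
	return rw ? MY_RDWR : MY_RDONLY | MY_BINARY;
}

/* With parentheses, both branches get MY_BINARY. */
static int
flags_parenthesized(int rw)
{
	return (rw ? MY_RDWR : MY_RDONLY) | MY_BINARY;
}
```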

I'm at odds with the "postmaster.pid" check, which would not prevent an 
issue if a cluster is started with "postmaster". I still think that the 
enabling-in-progress should be stored in the cluster state.

ISTM that the cluster read/update cycle should lock somehow the control 
file being modified. However other commands do not seem to do something 
about it.

I do not think that enabling if already enabled or disabling if already 
disabled should exit(1); I think it is a no-op and should simply exit(0).

About tests: I'd run a check on a disabled cluster to check that the 
command fails because disabled.

-- 
Fabien.


Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Wed, Feb 27, 2019 at 07:59:31AM +0100, Fabien COELHO wrote:
> The renaming implies quite a few changes (eg in the documentation,
> makefiles, tests) which warrants a review, so it should be a patch. Also,
> ISTM that the renaming only make sense when adding the enable/disable
> feature, so I'd say that it belongs to this patch. Opinions?

I would think that the rename should happen first, but it is possible
to make git diffs less noisy as well for files copied, so merging
things is technically doable.

> About tests: I'd run a check on a disabled cluster to check that the command
> fails because disabled.

While I look at that...  If you could split the refactoring into a
separate, first, patch as well..
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Wed, Feb 27, 2019 at 07:59:31AM +0100, Fabien COELHO wrote:
> Hallo Michael,

Okay, let's move on with these patches!

> The renaming implies quite a few changes (eg in the documentation,
> makefiles, tests) which warrants a review, so it should be a patch. Also,
> ISTM that the renaming only make sense when adding the enable/disable
> feature, so I'd say that it belongs to this patch. Opinions?

I have worked on the last v4 sent by Michael B, finishing with the
attached after review and addressed the last points raised by Fabien.
The thing is that I have been rather unhappy with a couple of things
in what was proposed, so I have finished by modifying quite a couple
of areas.  The patch set is now split as I think is suited for
commit, with the refactoring and renaming being separated from the
actual feature:
- 0001 if a patch to refactor the routine for the control file
update.  I have made it backend-aware, and we ought to be careful with
error handling, use of fds and such, something that v4 was not very
careful about.
- 0002 renames pg_verify_checksums to pg_checksums with a
straight-forward switch.  Docs as well as all references to
pg_verify_checksums are updated.
- 0003 adds the new options --check, --enable and --disable, with
--check being the default as discussed.
- 0004 adds a -N/--no-sync which I think is nice for consistency with
other tools.  That's also useful for the tests, and was not discussed
until now on this thread.

> Patch applies cleanly, compiles, global & local "make check" are ok.
>
> Doc: "enable, disable or verifies" -> "checks, enables or disables"?
> Spurious space: "> ." -> ">.".
>
> As checksums are now checked, the doc could use "check" instead of "verify",
> especially if there is a rename and the "verify" word disappears.

That makes sense.  I have fixed these, and simplified the docs a bit
to have a more simple page.

> I'd be less terse when documenting actions, eg: "Disable checksums" ->
> "Disable checksums on cluster."

The former is fine in my opinion.

> Doc should state that checking is the default action, eg "Check checksums on
> cluster. This is the default action."

Check.

> Help string could say that -c is the default action. There is a spurious
> help line remaining from the previous "--action" implementation.

Yeah, we should.  Added.

> open: I'm positively unsure about ?: priority over |, and probably not the
> only one, so I'd add parentheses around the former.

Yes, I agree that the current code is hard to decrypt.  So reworked
with a separate variable in scan_file, and added extra parentheses in
the part which updates the control file.

> I'm at odds with the "postmaster.pid" check, which would not prevent an
> issue if a cluster is started with "postmaster". I still think that the
> enabling-in-progress should be stored in the cluster state.
>
> ISTM that the cluster read/update cycle should lock somehow the control file
> being modified. However other commands do not seem to do something about it.

I am still not on board for adding more complexity in this area, at
least not for this stuff and for the purpose of this thread, because
this can happen at various degrees for various configurations for ages
and not only for checksums.  Also, I think that we still have to see
users complain about that.  Here are some scenarios where this can
happen:
- A base backup partially written.  pg_basebackup limits this risk, but
it could still be possible to end up with a partially-written data
folder.  And base backups have been around for many years.
- pg_rewind, which has been in the tree since 9.5 and actually
available on GitHub since 9.3.

> I do not think that enabling if already enabled or disabling if already
> disabled should exit(1); I think it is a no-op and should simply exit(0).

We already issue exit(1) when attempting to verify checksums on a
cluster where they are disabled.  So I agree with Michael B's point of
issuing an error in such cases.  I also think that this makes handling
for operators easier.

> About tests: I'd run a check on a disabled cluster to check that the command
> fails because disabled.

Makes sense.  Added.  We need a test also for the case of successive
runs with --enable.

Here are also some notes from my side.
- There was no need to complicate the synopsis of the docs.
- usage() still included references to --action, and the indentation was a
bit too long at the top.
- There were no tests for disabling checksums, so I have added some.
- We should check that the combination of --enable and -r fails.
- Tests use only long options, that's better for readability.
- Improved comments in tests.
- Better to check for "data_checksum_version > 0" if --enable is
used.  That's more portable long-term if more checksum versions are
added.
- The check on postmaster.pid is actually not necessary as we already
know that the cluster has been shutdown cleanly per the state of the
control file.  I agree that there could be a small race condition
here, and we could discuss that in a different thread if need be as
such things could be improved for other frontend tools as well.  For
now I am taking the most simple approach.
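
The "cleanly shut down" gate described in the last bullet can be sketched as below. The enum names mirror the DBState values in catalog/pg_control.h but are redefined here with assumed values, purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mirrors DBState from catalog/pg_control.h (values assumed here).
 */
typedef enum
{
	DEMO_DB_STARTUP = 0,
	DEMO_DB_SHUTDOWNED,
	DEMO_DB_SHUTDOWNED_IN_RECOVERY,
	DEMO_DB_SHUTDOWNING,
	DEMO_DB_IN_CRASH_RECOVERY,
	DEMO_DB_IN_ARCHIVE_RECOVERY,
	DEMO_DB_IN_PRODUCTION
} DemoDBState;

/*
 * The tool proceeds only when the control file records a clean
 * shutdown; any other state means the cluster may be running or
 * crashed, and pages could change underneath us.
 */
static bool
cluster_cleanly_shut_down(DemoDBState state)
{
	return state == DEMO_DB_SHUTDOWNED ||
		state == DEMO_DB_SHUTDOWNED_IN_RECOVERY;
}
```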

(Still need to indent the patches before commit, but that's a nit.)
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Michael Banck
Date:
Hi Michael,

On Monday, 2019-03-11 at 13:53 +0900, Michael Paquier wrote:
> On Wed, Feb 27, 2019 at 07:59:31AM +0100, Fabien COELHO wrote:
> > Hallo Michael,
> 
> Okay, let's move on with these patches!

Wow cool. I was going to go back to these and split them up similar to
how you did it now that the online verification patch seems to be
done/stuck for v12, but great that you beat me to it.

I had a quick look over the patch and your changes and it LGTM.


Michael



Re: Offline enabling/disabling of data checksums

From:
Michael Banck
Date:
Hi,

On Monday, 2019-03-11 at 11:11 +0100, Michael Banck wrote:
> I had a quick look over the patch and your changes and it LGTM.

One thing: you (Michael) should be co-author for patch #3 as I took some
of your code from https://github.com/michaelpq/pg_plugins/tree/master/pg_checksums


Michael



Re: Offline enabling/disabling of data checksums

From:
Sergei Kornilov
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

Hello

I reviewed the latest patchset. I have one big question: is pg_checksums safe for cross-version operations, even with
the update_controlfile call? Currently I am able to enable checksums in a pg11 cluster with pg_checksums compiled on
HEAD. Is this expected? I didn't notice any version-specific check in the code.

And few small notes:

> <command>pg_checksums</command> checks, enables or disables data checksums

maybe better is <application>, not <command>?

> +    printf(_("%s enables, disables or verifies data checksums in a PostgreSQL\n"), progname);
> +    printf(_("database cluster.\n\n"));

I doubt this is good line formatting for translation purposes.

> +    printf(_("  -c, --check            check data checksums.  This is the default\n"));
> +    printf(_("                         mode if nothing is specified.\n"));

same. For example pg_basebackup uses different multiline style:

>     printf(_("  -r, --max-rate=RATE    maximum transfer rate to transfer data directory\n"
>             "                         (in kB/s, or use suffix \"k\" or \"M\")\n"));

> +command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
> +          "fails when relfilnodes are requested and action is --disable");

action is "--enable" here ;-)

> if (badblocks > 0)
>    return 1;

Small question: why return 1 instead of exit(1)?

> <refentry id="pgchecksums">
>  <indexterm zone="pgchecksums">

How about use "app-pgchecksums" similar to other applications?

regards, Sergei

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Mon, Mar 11, 2019 at 02:11:11PM +0000, Sergei Kornilov wrote:
> I review latest patchset.

Thanks, I have committed the refactoring of src/common/ as a first
step.

> I have one big question: Is pg_checksums
> safe for cross-version operations? Even with update_controlfile
> call? Currently i am able to enable checksums in pg11 cluster with
> pg_checksums compiled on HEAD. Is this expected? I didn't notice any
> version-specific check in code.

This depends on the version of the control file, and it happens that
we don't check for it, so that's a good catch from your side.  Not
doing the check is a bad idea, as ControlFileData must be compatible
between the binary and the data being read.  I am attaching a fresh 0001
which should be back-patched down to v11 as a bug fix.  An advantage
of that, similar to pg_rewind, is that if the control file
version does not change within a major version, then the tool can still
be used.  And the data folder layout is unlikely to change..
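
The guard this report led to amounts to a version comparison along the following lines. This is only a sketch: the constant value is made up (the real one is PG_CONTROL_VERSION from catalog/pg_control.h), and the real check also reports both version numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Made-up value; the real constant is PG_CONTROL_VERSION */
#define DEMO_CONTROL_VERSION 1100

/*
 * Refuse to touch a data folder whose control file version differs
 * from the one the binary was compiled against, since the
 * ControlFileData layout may differ between major versions.
 */
static bool
control_version_ok(unsigned int file_version)
{
	if (file_version != DEMO_CONTROL_VERSION)
	{
		fprintf(stderr,
				"cluster is not compatible with this version of the tool\n");
		return false;
	}
	return true;
}
```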

>> <command>pg_checksums</command> checks, enables or disables data checksums
>
> maybe better is <application>, not <command>?

Fixed, as part of the renaming patch.

>> +    printf(_("%s enables, disables or verifies data checksums in a PostgreSQL\n"), progname);
>> +    printf(_("database cluster.\n\n"));
>
> I doubt this is good line formatting for translation purposes.
>
>> +    printf(_("  -c, --check            check data checksums.  This is the default\n"));
>> +    printf(_("                         mode if nothing is specified.\n"));
>
> same. For example pg_basebackup uses different multiline style:

Oh, good points.  I forgot about that point of view.

>> +command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
>> +          "fails when relfilnodes are requested and action is --disable");
>
> action is "--enable" here ;-)

s/relfilnodes/relfilenodes/ while on it.

>> if (badblocks > 0)
>>    return 1;
>
> Small question: why return 1 instead of exit(1)?

OK, let's fix that on the way as part of the renaming.

>> <refentry id="pgchecksums">
>>  <indexterm zone="pgchecksums">
>
> How about use "app-pgchecksums" similar to other applications?

Yes, I was wondering about that one when doing the renaming, but did
not bother much for consistency.  Anyway switched, you are right.

Attached is an updated patch set, minus the refactoring for
src/common/.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Mon, Mar 11, 2019 at 11:19:50AM +0100, Michael Banck wrote:
> One thing: you (Michael) should be co-author for patch #3 as I took some
> of your code from https://github.com/michaelpq/pg_plugins/tree/master/pg
> _checksums

OK, thanks for the notice.  I was not sure as we actually developed
the same fork.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Sergei Kornilov
Date:
Hi

> Not doing the check is a bad idea as ControlFileData should be compatible
> between the binary and the data read. I am attaching a fresh 0001
> which should be back-patched down to v11 as a bug fix.

Looks fine. Do we need to add a few words to the documentation?

>>>  if (badblocks > 0)
>>>         return 1;
>>
>>  Small question: why return 1 instead of exit(1)?
>
> OK, let's fix that on the way as part of the renaming.

Was not changed?..

I have no new notes after reading the updated patchset.

regards, Sergei


Re: Offline enabling/disabling of data checksums

From:
Michael Banck
Date:
Hi,


On Monday, 2019-03-11 at 14:11 +0000, Sergei Kornilov wrote:
> > if (badblocks > 0)
> >     return 1;
> 
> Small question: why return 1 instead of exit(1)?

I have a feeling it is project policy to return 0 from main(), and
exit(1) if a program aborts with an error.

In the above case, the program finishes more-or-less as intended (no
abort), but due to errors found on the way, does not return with 0.

I don't mind either way and probably exit(1) makes more sense, but I
wanted to explain why it is like that.


Michael



Re: Offline enabling/disabling of data checksums

From:
Sergei Kornilov
Date:
Hello

Thank you for explaining. I thought so.

PS: I am not sure for now about the patch status in the CF app. I did not change the status.

regards, Sergei


Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Tue, Mar 12, 2019 at 11:13:46AM +0100, Michael Banck wrote:
> I have a feeling it is project policy to return 0 from main(), and
> exit(1) if a program aborts with an error.

Yes, it does not matter much in practice, but other tools just don't
do that.  Note that changing it can be actually annoying for a
backpatch if we don't have the --enable/--disable part, because git is
actually smart enough to detect the file renaming across branches as
far as I tried, but as we are refactoring this code anyway for
--enable and --disable let's just do it, that's cleaner.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Fabien COELHO
Date:
Bonjour Michaël,

Here is a partial review:

> - 0001 if a patch to refactor the routine for the control file
> update.  I have made it backend-aware, and we ought to be careful with
> error handling, use of fds and such, something that v4 was not very
> careful about.

This refactoring patch is ok for me: applies, compiles, check is ok.

However, Am I right in thinking that the change should propagate to other 
tools which manipulate the control file, eg pg_resetwal, postmaster… So 
that there would be only one shared API to update the control file?

> - 0002 renames pg_verify_checksums to pg_checksums with a
> straight-forward switch.  Docs as well as all references to
> pg_verify_checksums are updated.

Looks ok to me. Applies, compiles, checks are ok. Doc build is ok.

I'm wondering whether there should be something done so that the 
inter-release documentation navigation works? Should the section keep the 
former name? Is it managed by hand somewhere else? Maybe it would require 
keeping the refsect1 id, or duplicating it, not sure.

In "doc/src/sgml/ref/allfiles.sgml" there seems to be a habit to align on 
the SYSTEM keyword, which is not followed by the patch.

> - 0003 adds the new options --check, --enable and --disable, with
> --check being the default as discussed.

Looks like the patch I already reviewed, but I'll look at it in details 
later.

"If enabling or disabling checksums, the exit status is nonzero if the 
operation failed."

However:

  +       if (ControlFile->data_checksum_version == 0 &&
  +               action == PG_ACTION_DISABLE)
  +       {
  +               fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
  +               exit(1);
  +       }

This seems contradictory to me: you want to disable checksums, and they are 
already disabled, so nothing is needed. How does that qualify as a 
"failed" operation?

Further review will come later.

> - 0004 adds a -N/--no-sync which I think is nice for consistency with 
> other tools.  That's also useful for the tests, and was not discussed 
> until now on this thread.

Indeed. I do not immediately see the use case where no syncing would be a 
good idea. I can see why it would be a bad idea. So I'm not sure of the 
concept.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Tue, Mar 12, 2019 at 10:08:19PM +0100, Fabien COELHO wrote:
> This refactoring patch is ok for me: applies, compiles, check is ok.
> However, Am I right in thinking that the change should propagate to other
> tools which manipulate the control file, eg pg_resetwal, postmaster… So that
> there would be only one shared API to update the control file?

Yes, that would be nice; for now I have focused on this tool.  For
pg_resetwal, yes, we could do it easily.  Would you like to send a patch?

> I'm wondering whether there should be something done so that the
> inter-release documentation navigation works? Should the section keep the
> former name? Is it managed by hand somewhere else? Maybe it would require to
> keep the refsect1 id, or to duplicate it, not sure.

When it came to the renaming of pg_receivexlog to pg_receivewal, we
did not actually do anything in the core code, and let the magic
happen on pgsql-www.  I have also pinged pgsql-packagers about the
renaming and it is not really an issue on their side.  So I have
committed the renaming to pg_checksums as well.  So now remains only
the new options.

> In "doc/src/sgml/ref/allfiles.sgml" there seems to be a habit to align on
> the SYSTEM keyword, which is not followed by the patch.

Sure.  I sent an updated patch to actually fix that, and also address
a couple of other side things I noticed on the way like the top
refentry in the docs or the header format at the top of
pg_checksums.c as we are on tweaking the area.

> This seems contradictory to me: you want to disable checksums, and they are
> already disabled, so nothing is needed. How does that qualify as a
> "failed" operation?

If the operation is automated, then a proper reaction can be done if
multiple attempts are done.  Of course, I am fine to tune things one
way or the other depending on the opinion of the crowd here.  From the
opinions gathered, I can see that (Michael * 2) prefer failing with
exit(1), while (Fabien * 1) would like to just do exit(0).

> Further review will come later.

Thanks, Fabien!

> Indeed. I do not immediately see the use case where no syncing would be a
> good idea. I can see why it would be a bad idea. So I'm not sure of the
> concept.

To leverage the buildfarm effort I think this one is worth it.  Otherwise
we would end up fsyncing the data folder a couple of times, which would
make the small-ish buildfarm machines suffer more than they need to.

I am going to send a rebased patch set of the remaining things at the
top of the discussion as well.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Tue, Mar 12, 2019 at 09:44:03PM +0900, Michael Paquier wrote:
> Yes, it does not matter much in practice, but other tools just don't
> do that.  Note that changing it can be actually annoying for a
> backpatch if we don't have the --enable/--disable part, because git is
> actually smart enough to detect the file renaming across branches as
> far as I tried, but as we are refactoring this code anyway for
> --enable and --disable let's just do it, that's cleaner.

Okay, please find attached a rebased patch set.  I have committed 0001
which adds version checks for the control file, and the renaming
part 0002.  What remains now is the addition of the new options, and
--no-sync.  The "return 1" stuff has been switched to exit(1) while on
it, and is part of 0003.

Now the set of patches is:
- 0001, add --enable and --disable.  I have tweaked a bit the patch so
as "action" is replaced by "mode" which is more consistent with other
tools like pg_ctl.  pg_indent was also complaining about one of the
new enum structures.
- 0002, add --no-sync.

Thanks,
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

От
Fabien COELHO
Дата:
Bonjour Michaël,

> Yes, that would be nice; for now I have focused on this tool.  For
> pg_resetwal, yes, we could do it easily.  Would you like to send a patch?

I probably can do that before next Monday. I'll prioritize reviewing the 
latest instance of this patch, though.

>> This seem contradictory to me: you want to disable checksum, and they are
>> already disabled, so nothing is needed. How does that qualifies as a
>> "failed" operation?
>
> If the operation is automated, then a proper reaction can be done if
> multiple attempts are done.  Of course, I am fine to tune things one
> way or the other depending on the opinion of the crowd here.  From the
> opinions gathered, I can see that (Michael * 2) prefer failing with
> exit(1), while (Fabien * 1) would like to just do exit(0).

Yep, that sums it up:-).

>> Indeed. I do not immediately see the use case where no syncing would be a
>> good idea. I can see why it would be a bad idea. So I'm not sure of the
>> concept.
>
> To leverage the buildfarm effort I think this one is worth it.  Or we
> finish to fsync the data folder a couple of times, which would make
> the small-ish buildfarm machines suffer more than they need.

Ok for the particular use-case, provided that the documentation is very 
clear about the risks, which is the case, so fine with me wrt the 
feature.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Wed, Mar 13, 2019 at 07:18:32AM +0100, Fabien COELHO wrote:
> I probably can do that before next Monday. I'll prioritize reviewing the
> latest instance of this patch, though.

Thanks.  The core code of the feature has not really changed with the
last reviews, except for the tweaks in the variable names and I think
that it's in a rather committable shape.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date:
Michaël-san,

> Now the set of patches is:
> - 0001, add --enable and --disable.  I have tweaked a bit the patch so
> as "action" is replaced by "mode" which is more consistent with other
> tools like pg_ctl.  pg_indent was also complaining about one of the
> new enum structures.

Patch applies cleanly, compiles, various make check ok, doc build ok.

I'm still at odds with the exit(1) behavior when there is nothing to do.

If this behavior is kept, I think that the documentation needs to be 
improved because "failed" does not describe a no-op-was-needed to me.

"""
If enabling or disabling checksums, the exit status is nonzero if the 
operation failed.
"""

Maybe: "... if the operation failed or the requested setting is already 
active." would at least describe clearly the implemented behavior.

  +       printf(_("  -c, --check            check data checksums\n"));
  +       printf(_("                         This is the default mode if nothing is specified.\n"));

I'm not sure of the punctuation logic on the help line: the first sentence 
does not end with a ".". I could not find an instance of this style in 
other help on pg commands. I'd suggest "check data checksums (default)" 
would work around and be more in line with other commands help.

I see a significant locking issue, which I discussed on other threads 
without convincing anyone. I could do the following things:

I slowed down pg_checksums by adding a 0.1s sleep when scanning a new 
file, then started a "pg_checksums --enable" on a stopped cluster, then 
started the cluster while the enabling was in progress, then connected and 
updated data. Hmmm. Then I stopped while the slow enabling was still in 
progress. Then I could also run a fast pg_checksums --enable in parallel, 
overtaking the first one... then when the fast one finished, I started the 
cluster again. When the slow one finished, it overwrote the control file, 
I had a running cluster with a control file which did not say so, so I 
could disable the checksums. Hmmm again. Ok, I could not generate an 
inconsistent state, because on stopping the cluster the control file is 
overwritten with the initial state from the point of view of the 
postmaster, but it does not look good.

I do not think it is a good thing that two commands can write to the data 
directory at the same time, really.

About fsync-ing: ISTM that it is possible that the control file is written 
to disk while data are still not written, so a failure in between would 
leave the cluster with an inconsistent state. I think that it should fsync 
the data *then* update the control file and fsync again on that one.
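The ordering Fabien suggests can be sketched with a small standalone helper. This is a mock, not pg_checksums code: write_and_fsync, finalize, and the NULL-callback convention are all hypothetical names introduced here for illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write 'len' bytes to 'path' and fsync the file before returning;
 * returns 0 on success, -1 on any failure. */
static int
write_and_fsync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Finish an --enable run in the safe order: (1) flush all rewritten
 * data files, (2) only then rewrite the control file, (3) flush it.
 * A crash before step 2 leaves the old control file in place, which
 * is always a consistent state.  'flush_data_dir' stands in for a
 * recursive data-directory sync; NULL skips it (test convenience). */
static int
finalize(int (*flush_data_dir) (void),
         const char *controlfile_path, const char *newcontents, size_t len)
{
    if (flush_data_dir != NULL && flush_data_dir() != 0)
        return -1;              /* step 1 failed: control file untouched */
    return write_and_fsync(controlfile_path, newcontents, len); /* 2+3 */
}
```

The point of the sketch is only the ordering: the control file is the last thing made durable, so it can never claim "checksums enabled" while data blocks are still unsynced.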

> - 0002, add --no-sync.

Patch applies cleanly, compiles, various make checks are ok, doc build ok.

Fine with me.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote:
> I'm not sure of the punctuation logic on the help line: the first sentence
> does not end with a ".". I could not find an instance of this style in other
> help on pg commands. I'd suggest "check data checksums (default)" would work
> around and be more in line with other commands help.

Good idea, let's do that.

> I slowed down pg_checksums by adding a 0.1s sleep when scanning a new file,
> then started a "pg_checksums --enable" on a stopped cluster, then started
> the cluster while the enabling was in progress, then connected and updated
> data.

Well, yes, don't do that.  You can get into the same class of problems
while running pg_rewind, pg_basebackup or even pg_resetwal once the
initial control file check is done for each one of these tools.

> I do not think it is a good thing that two commands can write to the data
> directory at the same time, really.

We don't prevent a pg_resetwal and a pg_basebackup from running in
parallel either.  That would be...  Interesting.

> About fsync-ing: ISTM that it is possible that the control file is written
> to disk while data are still not written, so a failure in between would
> leave the cluster with an inconsistent state. I think that it should fsync
> the data *then* update the control file and fsync again on that one.

If --disable is used, the control file gets updated at the end without
doing anything else.  If the host crashes, it could be possible that
the control file has checksums enabled or disabled.  If the state is
disabled, then well it succeeded.  If the state is enabled, then the
control file is still correct, because all the other blocks still have
checksums set.

if --enable is used, we fsync the whole data directory after writing
all the blocks and updating the control file at the end.  The case you
are referring to here is in fsync_pgdata(), not pg_checksums actually,
because you could reach the same state after a simple initdb.  It
could be possible to reach a state where the control file has
checksums enabled and some blocks are not correctly synced, still you
would notice rather quickly if the server is in an incorrect state at
the follow-up startup.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date:
>> I do not think it is a good thing that two commands can write to the data
>> directory at the same time, really.
>
> We don't prevent a pg_resetwal and a pg_basebackup from running in
> parallel either.  That would be...  Interesting.

Yep, I'm trying again to suggest that this kind of thing should be 
prevented. It seems that I'm pretty unconvincing.

>> About fsync-ing: ISTM that it is possible that the control file is written
>> to disk while data are still not written, so a failure in between would
>> leave the cluster with an inconsistent state. I think that it should fsync
>> the data *then* update the control file and fsync again on that one.
>
> if --enable is used, we fsync the whole data directory after writing
> all the blocks and updating the control file at the end. [...]
> It could be possible to reach a state where the control file has 
> checksums enabled and some blocks are not correctly synced, still you 
> would notice rather quickly if the server is in an incorrect state at 
> the follow-up startup.

Yep. That is the issue I think is preventable by fsyncing updated data 
*then* writing & syncing the control file, and that should be done by
pg_checksums.

-- 
Fabien.


Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Wed, Mar 13, 2019 at 10:44:03AM +0100, Fabien COELHO wrote:
> Yep. That is the issue I think is preventable by fsyncing updated data
> *then* writing & syncing the control file, and that should be done by
> pg_checksums.

Well, pg_rewind works similarly: the control file gets updated and then
the whole data directory gets flushed.  In my opinion, the take here
is that we log something after the sync of the whole data folder is
done, so that in the event of a crash an operator can make sure that
everything has happened.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
On Wednesday, 13.03.2019, 18:31 +0900, Michael Paquier wrote:
> On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote:
> > I'm not sure of the punctuation logic on the help line: the first sentence
> > does not end with a ".". I could not find an instance of this style in other
> > help on pg commands. I'd suggest "check data checksums (default)" would work
> > around and be more in line with other commands help.
> 
> Good idea, let's do that.
> 
> > I slowed down pg_checksums by adding a 0.1s sleep when scanning a new file,
> > then started a "pg_checksums --enable" on a stopped cluster, then started
> > the cluster while the enabling was in progress, then connected and updated
> > data.
> 
> Well, yes, don't do that.  You can get into the same class of problems
> while running pg_rewind, pg_basebackup or even pg_resetwal once the
> initial control file check is done for each one of these tools.
> 
> > I do not think it is a good thing that two commands can write to the data
> > directory at the same time, really.
> 
> We don't prevent a pg_resetwal and a pg_basebackup from running in
> parallel either.  That would be...  Interesting.

But does pg_basebackup actually change the primary's data directory? I
don't think so, so that does not seem to be a problem.

pg_rewind and pg_resetwal are (TTBOMK) pretty quick operations, while
pg_checksums can potentially run for hours, so I see the point of taking
extra care here.

On the other hand, two pg_checksums running in parallel also seem not
much of a problem as the cluster is offline anyway.

What is much more of a footgun is one DBA starting pg_checksums --enable 
on a 1TB cluster, then going for lunch, and then the other DBA wondering
why the DB is down and starting the instance again.

We read the control file on pg_checksums' startup, so once pg_checksums
finishes it'll write the old checkpoint LSN into pg_control (along with
the updated checksum version). This is pilot error, but I think we
should try to guard against it.

I propose we re-read the control file for the enable case after we
finished operating on all files and (i) check the instance is still
offline and (ii) update the checksums version from there. That should be
a small but worthwhile change that could be done anyway.
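A minimal sketch of that guard, using mock structures rather than the real pg_control layout (MockControlFile, the enum values, and the version constant are stand-ins introduced here, not PostgreSQL's actual definitions):

```c
#include <stdbool.h>

/* Mock of the two control-file fields the guard cares about. */
typedef enum
{
    DB_SHUTDOWNED,              /* clean shutdown */
    DB_IN_PRODUCTION            /* server was started behind our back */
} MockDBState;

typedef struct MockControlFile
{
    MockDBState state;
    unsigned    data_checksum_version;
} MockControlFile;

#define PG_DATA_CHECKSUM_VERSION 1  /* stand-in for the real constant */

/*
 * Called after all blocks have been rewritten, on a freshly re-read
 * control file.  Flips the checksum version only if the cluster is
 * still shut down cleanly; returns false (changing nothing) otherwise.
 */
static bool
finalize_checksum_enable(MockControlFile *reread)
{
    if (reread->state != DB_SHUTDOWNED)
        return false;
    reread->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
    return true;
}
```

Re-reading just before the final write means the checkpoint LSN and state written out are current, instead of the stale copy taken at startup.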

Another option would be to add a new feature which reliably blocks an
instance from starting up due to maintenance - either a new control file
field, some message in postmaster.pid (like "pg_checksums maintenance in
progress") that would prevent pg_ctl or postgres/postmaster from
starting up like 'FATAL:  bogus data in lock file "postmaster.pid":
"pg_checksums in progress' or some other trigger file.

> > About fsync-ing: ISTM that it is possible that the control file is written
> > to disk while data are still not written, so a failure in between would
> > leave the cluster with an inconsistent state. I think that it should fsync
> > the data *then* update the control file and fsync again on that one.
> 
> if --enable is used, we fsync the whole data directory after writing
> all the blocks and updating the control file at the end.  The case you
> are referring to here is in fsync_pgdata(), not pg_checksums actually,
> because you could reach the same state after a simple initdb.  

But in the initdb case you don't have any valuable data in the instance
yet.

> It
> could be possible to reach a state where the control file has
> checksums enabled and some blocks are not correctly synced, still you
> would notice rather quickly if the server is in an incorrect state at
> the follow-up startup.

Would you?  I think I'm with Fabien on this one and it seems worthwhile
to run fsync_pgdata() before and after update_controlfile() - the second
one should be really quick anyway. 

Also, I suggest to maybe add a notice in verbose mode that we are
syncing the data directory - otherwise the user might wonder what's
going on at 100% done, though I haven't seen a large delay in my tests
so far.


Michael



Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date:
Hello,

>> Yep. That is the issue I think is preventable by fsyncing updated data
>> *then* writing & syncing the control file, and that should be done by
>> pg_checksums.
>
> Well, pg_rewind works similarly: control file gets updated and then
> the whole data directory gets flushed.

So it is basically prone to the same potential issue?

> In my opinion, the take here is that we log something after the sync of 
> the whole data folder is done, so as in the event of a crash an operator 
> can make sure that everything has happened.

I do not understand. I'm basically only suggesting to reorder 3 lines and 
add an fsync so that this potential problem goes away, see attached PoC 
(which does not compile because pg_fsync is backend-only; it does work 
with plain fsync on Linux, but I'm unsure of the portability, so probably 
pg_fsync should be moved to src/port or something).

-- 
Fabien.
Attachments

Re: Offline enabling/disabling of data checksums

From
Magnus Hagander
Date:
On Wed, Mar 13, 2019 at 11:41 AM Michael Banck <michael.banck@credativ.de> wrote:
On Wednesday, 13.03.2019, 18:31 +0900, Michael Paquier wrote:
> On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote:
> > I'm not sure of the punctuation logic on the help line: the first sentence
> > does not end with a ".". I could not find an instance of this style in other
> > help on pg commands. I'd suggest "check data checksums (default)" would work
> > around and be more in line with other commands help.
>
> Good idea, let's do that.
>
> > I slowed down pg_checksums by adding a 0.1s sleep when scanning a new file,
> > then started a "pg_checksums --enable" on a stopped cluster, then started
> > the cluster while the enabling was in progress, then connected and updated
> > data.
>
> Well, yes, don't do that.  You can get into the same class of problems
> while running pg_rewind, pg_basebackup or even pg_resetwal once the
> initial control file check is done for each one of these tools.
>
> > I do not think it is a good thing that two commands can write to the data
> > directory at the same time, really.
>
> We don't prevent a pg_resetwal and a pg_basebackup from running in
> parallel either.  That would be...  Interesting.

But does pg_basebackup actually change the primary's data directory? I
don't think so, so that does not seem to be a problem.

pg_rewind and pg_resetwal are (TTBOMK) pretty quick operations, while
pg_checksums can potentially run for hours, so I see the point of taking
extra care here.

On the other hand, two pg_checksums running in parallel also seem not
much of a problem as the cluster is offline anyway.

What is much more of a footgun is one DBA starting pg_checksums --enable
on a 1TB cluster, then going for lunch, and then the other DBA wondering
why the DB is down and starting the instance again.

We read the control file on pg_checksums' startup, so once pg_checksums
finishes it'll write the old checkpoint LSN into pg_control (along with
the updated checksum version). This is pilot error, but I think we
should try to guard against it.

I propose we re-read the control file for the enable case after we
finished operating on all files and (i) check the instance is still
offline and (ii) update the checksums version from there. That should be
a small but worthwhile change that could be done anyway.

In (i) you need to also check that it's not offline *again*. Somebody could start *and* stop the database while pg_checksums is running. But checking the time field should hopefully be enough for that?
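Magnus's refinement, comparing a timestamp as well as the state so that a start-and-stop cycle is also caught, could look roughly like this (mock snapshot struct, not the real pg_control fields):

```c
#include <stdbool.h>

/* Mock snapshot of the control-file fields taken once at startup and
 * again after all blocks have been rewritten. */
typedef struct MockControlSnapshot
{
    int  state;             /* e.g. a DB_SHUTDOWNED-like state code */
    long checkpoint_time;   /* time of the latest checkpoint */
} MockControlSnapshot;

/*
 * True only if nothing happened in between: same clean-shutdown state
 * AND no new checkpoint.  A start-and-stop cycle leaves the final
 * state looking clean again, but it moves the checkpoint time, so
 * comparing the time field catches that case too.
 */
static bool
cluster_untouched(const MockControlSnapshot *before,
                  const MockControlSnapshot *after)
{
    return after->state == before->state &&
           after->checkpoint_time == before->checkpoint_time;
}
```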


Another option would be to add a new feature which reliably blocks an
instance from starting up due to maintenance - either a new control file
field, some message in postmaster.pid (like "pg_checksums maintenance in
progress") that would prevent pg_ctl or postgres/postmaster from
starting up like 'FATAL:  bogus data in lock file "postmaster.pid":
"pg_checksums in progress' or some other trigger file.

Instead of overloading yet another thing onto postmaster.pid, it might be better to just have a separate file which, if it exists, blocks startup with a message defined as the content of that file?
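A sketch of that idea as a standalone helper; the function name, the lock-file convention, and the fallback message are all made up here, and a real implementation would live in the postmaster's startup path:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * If the given maintenance lock file exists, copy its first line (the
 * reason to show the user) into 'reason' and return true; the caller
 * would then refuse startup with that message.  Returns false if the
 * file does not exist, i.e. startup may proceed.
 */
static bool
maintenance_lock_present(const char *lockpath, char *reason, size_t reasonlen)
{
    FILE *f = fopen(lockpath, "r");

    if (f == NULL)
        return false;           /* no lock file, start normally */
    if (fgets(reason, (int) reasonlen, f) == NULL)
        snprintf(reason, reasonlen, "maintenance in progress");
    fclose(f);
    reason[strcspn(reason, "\n")] = '\0';   /* strip trailing newline */
    return true;
}
```

A nice property of a content-is-the-message file is that any tool (pg_checksums, pg_rewind, ...) can claim the cluster with its own explanatory text, and overriding is as simple as deleting the file.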


> It
> could be possible to reach a state where the control file has
> checksums enabled and some blocks are not correctly synced, still you
> would notice rather quickly if the server is in an incorrect state at
> the follow-up startup.

Would you?  I think I'm with Fabien on this one and it seems worthwhile
to run fsync_pgdata() before and after update_controlfile() - the second
one should be really quick anyway. 

Also, I suggest to maybe add a notice in verbose mode that we are
syncing the data directory - otherwise the user might wonder what's
going on at 100% done, though I haven't seen a large delay in my tests
so far.

Seems like a good idea -- there certainly could be a substantial delay there depending on data size and underlying storage.
 
--

Re: Offline enabling/disabling of data checksums

From
Sergei Kornilov
Date:
Hi

One new question from me: how about replication?
Case: primary+replica, we shut down primary and enable checksum, and "started streaming WAL from primary" without any issue. I have master with checksums, but replica without.
 
Or cluster with checksums, then disable checksums on primary, but standby think we have checksums.

Also, do we support ./configure --with-blocksize=(not equal 8)? make check on HEAD fails for me. If we support this, I think we need to recheck BLCKSZ between the compiled pg_checksums and the one used in PGDATA.
 

regards, Sergei


Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date:
Hello Michael,

> I propose we re-read the control file for the enable case after we
> finished operating on all files and (i) check the instance is still
> offline and (ii) update the checksums version from there. That should be
> a small but worthwhile change that could be done anyway.

That looks like a simple but mostly effective guard.

> Another option would be to add a new feature which reliably blocks an
> instance from starting up due to maintenance - either a new control file
> field, some message in postmaster.pid (like "pg_checksums maintenance in
> progress") that would prevent pg_ctl or postgres/postmaster from
> starting up like 'FATAL:  bogus data in lock file "postmaster.pid":
> "pg_checksums in progress' or some other trigger file.

I think that a clear cluster-locking can-be-overridden-if-needed 
shared-between-commands mechanism would be a good thing (tm), although it 
requires some work.

My initial suggestion was to update the control file with an appropriate 
state, eg some general "admin command in progress", but I understood that 
it is rejected, and for another of your patches it seems that the 
"postmaster.pid" file is the right approach. Fine with me, the point is 
that it should be effective and consistent across all relevant commands.

A good point about the "postmaster.pid" trick, when it does not contain 
the postmaster pid, is that overriding is as simple as "rm postmaster.pid".

>> It could be possible to reach a state where the control file has 
>> checksums enabled and some blocks are not correctly synced, still you 
>> would notice rather quickly if the server is in an incorrect state at 
>> the follow-up startup.
>
> Would you?  I think I'm with Fabien on this one and it seems worthwhile
> to run fsync_pgdata() before and after update_controlfile() - the second
> one should be really quick anyway. 

Note that fsync_pgdata is kind of heavy; it recurses everywhere. I think 
that a simple fsync on the control file only is enough.

> Also, I suggest to maybe add a notice in verbose mode that we are
> syncing the data directory - otherwise the user might wonder what's
> going on at 100% done, though I haven't seen a large delay in my tests
> so far.

I agree, as it might not be cheap.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
Hi,

On Wednesday, 13.03.2019, 11:47 +0100, Magnus Hagander wrote:
> On Wed, Mar 13, 2019 at 11:41 AM Michael Banck <michael.banck@credativ.de> wrote:
> > I propose we re-read the control file for the enable case after we
> > finished operating on all files and (i) check the instance is still
> > offline and (ii) update the checksums version from there. That should be
> > a small but worthwhile change that could be done anyway.
> 
> In (i) you need to also check that is' not offline *again*. Somebody
> could start *and* stop the database while pg_checksums is running. But
> that should hopefully be enough to check the time field?

Good point.

> > Also, I suggest to maybe add a notice in verbose mode that we are
> > syncing the data directory - otherwise the user might wonder what's
> > going on at 100% done, though I haven't seen a large delay in my tests
> > so far.
> 
> Seems like a good idea -- there certainly could be a substantial delay
> there depending on data size and underlying storage.

The attached patch should do the above, on top of Michael's last
patchset.


Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Magnus Hagander
Date:
On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov <sk@zsrv.org> wrote:
Hi

One new question from me: how about replication?
Case: primary+replica, we shut down primary and enable checksum, and "started streaming WAL from primary" without any issue. I have master with checksums, but replica without.
Or cluster with checksums, then disable checksums on primary, but standby think we have checksums.

Enabling or disabling the checksums offline on the master quite clearly requires a rebuild of the standby; there is no other way (this is one of the reasons for the online enabling in that patch, so I still hope we can get that done -- but not for this version).

You would have the same with PITR backups for example. And especially if you have some tool that does block or segment level differential.

Of course, we have to make sure that this actually fails.

I wonder if we should bring out the big hammer and actually change the system id in pg_control when checksums are enabled/disabled by this tool? That should make it clear to any tool that it's changed.
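If that hammer were used, the tool would need a fresh identifier. The sketch below derives one roughly the way initdb does (seconds of the current time in the high 32 bits, microseconds shifted up, low 12 bits of the pid); treat the exact bit layout as an assumption rather than a contract:

```c
#include <stdint.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Derive a fresh system identifier from the current time and pid.
 * Writing such a value into pg_control on --enable/--disable would
 * make stale standbys and backups visibly mismatched, instead of
 * silently streaming WAL against a differently-checksummed cluster.
 */
static uint64_t
new_system_identifier(void)
{
    struct timeval tv;
    uint64_t       sysid;

    gettimeofday(&tv, NULL);
    sysid = ((uint64_t) tv.tv_sec) << 32;
    sysid |= ((uint64_t) tv.tv_usec) << 12;
    sysid |= (uint64_t) (getpid() & 0xFFF);
    return sysid;
}
```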


Also we support ./configure --with-blocksize=(not equals 8)? make check on HEAD fails for me. If we support this - i think we need recheck BLCKSZ between compiled pg_checksum and used in PGDATA

You mean if the backend and pg_checksums is built with different blocksize? Yeah, that sounds like something which is a cheap check and should be done. 
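The check itself is essentially a one-line comparison against the compiled-in constant. A mock sketch (BLCKSZ hardcoded here, and the parameter standing in for the blcksz field read from pg_control):

```c
#include <stdbool.h>
#include <stdio.h>

#define BLCKSZ 8192             /* block size this binary was built with */

/*
 * Compare the block size recorded in the control file with ours and
 * refuse to run on a mismatch: with a different block size, every
 * checksum this tool computes would disagree with the cluster's.
 */
static bool
check_block_size(unsigned int controlfile_blcksz)
{
    if (controlfile_blcksz != BLCKSZ)
    {
        fprintf(stderr,
                "data directory block size %u does not match compiled-in block size %u\n",
                controlfile_blcksz, (unsigned int) BLCKSZ);
        return false;
    }
    return true;
}
```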

--

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
Hi,

On Wednesday, 13.03.2019, 12:24 +0100, Magnus Hagander wrote:
> > Also we support ./configure --with-blocksize=(not equals 8)? make
> > check on HEAD fails for me. If we support this - i think we need
> > recheck BLCKSZ between compiled pg_checksum and used in PGDATA
> 
> You mean if the backend and pg_checksums is built with different
> blocksize? Yeah, that sounds like something which is a cheap check and
> should be done. 

I've been doing that in my pg_checksums fork for a while (as it is further
removed from the Postgres binaries), but yeah, we should check for that as
well in pg_checksums; see attached patch.


Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Sergei Kornilov
Date:
Hi

>> One new question from me: how about replication?
>> Case: primary+replica, we shut down primary and enable checksum, and "started streaming WAL from primary" without any issue. I have master with checksums, but replica without.
 
>> Or cluster with checksums, then disable checksums on primary, but standby think we have checksums.
>
> Enabling or disabling the checksums offline on the master quite clearly requires a rebuild of the standby, there is no other way (this is one of the reasons for the online enabling in that patch, so I still hope we can get that done -- but not for this version).
 

I mean, this should at least be documented.
Changing the system id... maybe that is reasonable.

>> Also we support ./configure --with-blocksize=(not equals 8)? make check on HEAD fails for me. If we support this - i think we need recheck BLCKSZ between compiled pg_checksum and used in PGDATA
 
>
> You mean if the backend and pg_checksums is built with different blocksize? Yeah, that sounds like something which is a cheap check and should be done.
 

Yep

regards, Sergei


Re: Offline enabling/disabling of data checksums

From
Sergei Kornilov
Date:
Hi,

>>  > Also we support ./configure --with-blocksize=(not equals 8)? make
>>  > check on HEAD fails for me. If we support this - i think we need
>>  > recheck BLCKSZ between compiled pg_checksum and used in PGDATA
>>
>>  You mean if the backend and pg_checksums is built with different
>>  blocksize? Yeah, that sounds like something which is a cheap check and
>>  should be done.
>
> I've been doing that in my pg_checksums fork for a while (as it further
> removed from the Postgres binaries) but yeah we should check for that as
> well in pg_checksums, see attached patch.

Seems good. And I think we need to backpatch this check to pg11, similar to the cross-version compatibility checks.

regards, Sergei


Re: Offline enabling/disabling of data checksums

From
Magnus Hagander
Date:
On Wed, Mar 13, 2019 at 12:40 PM Sergei Kornilov <sk@zsrv.org> wrote:
Hi

>> One new question from me: how about replication?
>> Case: primary+replica, we shut down primary and enable checksum, and "started streaming WAL from primary" without any issue. I have master with checksums, but replica without.
>> Or cluster with checksums, then disable checksums on primary, but standby think we have checksums.
>
> Enabling or disabling the checksums offline on the master quite clearly requires a rebuild of the standby, there is no other way (this is one of the reasons for the online enabling in that patch, so I still hope we can get that done -- but not for this version).

I mean, this should at least be documented.
Changing the system id... maybe that is reasonable.

I think this is dangerous enough that it needs to be enforced and not documented.

Most people who care about checksums are also going to be having either replication or backup...


>> Also we support ./configure --with-blocksize=(not equals 8)? make check on HEAD fails for me. If we support this - i think we need recheck BLCKSZ between compiled pg_checksum and used in PGDATA
>
> You mean if the backend and pg_checksums is built with different blocksize? Yeah, that sounds like something which is a cheap check and should be done.

Yep

This one I could live with being only a documented problem rather than enforced, but it also seems very simple to enforce.
 
--

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote:
> The attached patch should do the above, on top of Michael's last
> patchset.

What you are doing here looks like a good defense in itself.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Wed, Mar 13, 2019 at 02:43:39PM +0300, Sergei Kornilov wrote:
> Seems good. And I think we need to backpatch this check to pg11, similar
> to the cross-version compatibility checks

Good point raised, a backpatch looks appropriate.  It would be nice to get
into something more dynamic, but pg_checksum_block() uses BLCKSZ
directly :(
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
Hi,

On Wednesday, 13.03.2019, 12:24 +0100, Magnus Hagander wrote:
> On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov <sk@zsrv.org> wrote:
> > One new question from me: how about replication?
> > Case: primary+replica, we shut down primary and enable checksum, and
> > "started streaming WAL from primary" without any issue. I have
> > master with checksums, but replica without.
> > Or cluster with checksums, then disable checksums on primary, but
> > standby think we have checksums.
> 
> Enabling or disabling the checksums offline on the master quite
> clearly requires a rebuild of the standby, there is no other way

What about shutting down both and running pg_checksums --enable on the
standby as well?


Michael



Re: Offline enabling/disabling of data checksums

От
Michael Banck
Дата:
Hi,

On Wednesday, 13.03.2019, 12:43 +0100, Magnus Hagander wrote:
> I think this is dangerous enough that it needs to be enforced and not
> documented.

Changing the cluster ID might have some other side-effects; I think
there are several cloud-native 3rd party solutions that use the cluster
ID as some kind of unique identifier for an instance. It might not be an
issue in practice, but then again, it might break other stuff down the
road.

Another possibility would be to extend the replication protocol's 
IDENTIFY_SYSTEM command to also report the checksum version so that the
standby can check against the local control file on startup. But I am
not sure we can easily extend IDENTIFY_SYSTEM this way nor whether we
should for this rather corner-casey thing?


Michael



Re: Offline enabling/disabling of data checksums

From
Magnus Hagander
Date:
On Wed, Mar 13, 2019 at 4:46 PM Michael Banck <michael.banck@credativ.de> wrote:
Hi,

On Wednesday, 13.03.2019, 12:24 +0100, Magnus Hagander wrote:
> On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov <sk@zsrv.org> wrote:
> > One new question from me: how about replication?
> > Case: primary+replica, we shut down primary and enable checksum, and
> > "started streaming WAL from primary" without any issue. I have
> > master with checksums, but replica without.
> > Or cluster with checksums, then disable checksums on primary, but
> > standby think we have checksums.
>
> Enabling or disabling the checksums offline on the master quite
> clearly requires a rebuild of the standby, there is no other way

What about shutting down both and running pg_checksums --enable on the
standby as well?

That sounds pretty fragile to me. But if we can prove that the user has done things in the right order, sure. But how can we do that in an offline process? What if the user just quickly restarted the primary node after the standby had been shut down? We'll need to somehow validate it across the nodes...
--

Re: Offline enabling/disabling of data checksums

From
Magnus Hagander
Date:
On Wed, Mar 13, 2019 at 4:51 PM Michael Banck <michael.banck@credativ.de> wrote:
Hi,

On Wednesday, 13.03.2019, 12:43 +0100, Magnus Hagander wrote:
> I think this is dangerous enough that it needs to be enforced and not
> documented.

Changing the cluster ID might have some other side-effects, I think
there are several cloud-native 3rd party solutions that use the cluster
ID as some kind of unique identifier for an instance. It might not be an
issue in practice, but then again, it might break other stuff down the
road.

Well, whatever we do they have to update, right? If we're not changing it, then we're basically saying that it's (systemid, checksums) that is the identifier of the cluster, not just systemid. They'd have to go around and check each node individually for the configuration and not just use systemid anyway, so what's the actual win?


Another possibility would be to extend the replication protocol's
IDENTIFY_SYSTEM command to also report the checksum version so that the
standby can check against the local control file on startup. But I am
not sure we can easily extend IDENTIFY_SYSTEM this way nor whether we
should for this rather corner-casey thing?

We could, but is it really a win in those scenarios? Vs just making the systemid different? With systemid being different it's obvious that something needs to be done. If it's not, then at best, if we check it in the standby startup, the standby won't start. But people can still end up with things like unusable/corrupt backups, for example. 

--

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Wed, Mar 13, 2019 at 02:43:39PM +0300, Sergei Kornilov wrote:
> Seems good. And I think we need backpath this check to pg11. similar
> to cross-version compatibility checks

+    fprintf(stderr, _("%s: data directory block size %d is different to compiled-in block size %d.\n"),
+            progname, ControlFile->blcksz, BLCKSZ);
The error message looks grammatically a bit weird to me.  What about
the following?  Say:
"database block size of %u is different from supported block size of
%u."
Better ideas are welcome.

Please note that those integers are unsigned, by the way.
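For illustration, a self-contained sketch of such a check, with ControlFileData reduced to a stand-in struct and an assumed BLCKSZ of 8192 (the real definitions live in PostgreSQL's headers, so this is not the committed code):

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define BLCKSZ 8192u            /* compiled-in block size; value assumed */

/* Minimal stand-in for the blcksz field of PostgreSQL's ControlFileData. */
typedef struct
{
    uint32_t    blcksz;
} ControlFileData;

/* Refuse to operate on a cluster whose block size does not match the
 * size this binary was compiled with; note the %u format specifiers,
 * since both values are unsigned. */
static bool
block_size_matches(const char *progname, const ControlFileData *cf)
{
    if (cf->blcksz != BLCKSZ)
    {
        fprintf(stderr,
                "%s: database block size of %u is different from supported block size of %u.\n",
                progname, cf->blcksz, BLCKSZ);
        return false;
    }
    return true;
}
```

The check is cheap because it only reads the already-loaded control file; no data files need to be touched.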
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Wed, Mar 13, 2019 at 12:24:21PM +0100, Magnus Hagander wrote:
> Enabling or disabling the checksums offline on the master quite clearly
> requires a rebuild of the standby, there is no other way (this is one of
> the reasons for the online enabling in that patch, so I still hope we can
> get that done -- but not for this version).

I am curious to understand why this would require a rebuild of the
standby.  Technically FPWs don't update the checksum of a page when it
is WAL-logged, so even if a primary and a standby don't agree on the
checksum configuration, it is the timing where pages are flushed in
the local instance which counts for checksum correctness.

> You mean if the backend and pg_checksums is built with different blocksize?
> Yeah, that sounds like something which is a cheap check and should be done.

Yes, we should check for that: the checksum calculation uses BLCKSZ as
a hardcoded value, so a mismatch would cause computation failures.  It
could be possible to not have this restriction if we made the block
size an argument of the checksum calculation though.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Wed, Mar 13, 2019 at 08:56:33PM +0900, Michael Paquier wrote:
> On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote:
> > The attached patch should do the above, on top of Michael's last
> > patchset.
>
> What you are doing here looks like a good defense in itself.

More thoughts on that, and here is a short summary of the thread.

+       /* Check if control file has changed */
+       if (controlfile_last_updated != ControlFile->time)
+       {
+           fprintf(stderr, _("%s: control file has changed since startup\n"), progname);
+           exit(1);
+       }
Actually, under the conditions discussed on this thread, where Postgres
is started in parallel with pg_checksums, imagine the following
scenario:
- pg_checksums starts to enable checksums, it reads a block to
calculate its checksum, then calculates it.
- Postgres has been started in parallel, writes the same block.
- pg_checksums finishes the block calculation, writes back the block
it has just read.
- Postgres stops, some data is lost.
- At the end of pg_checksums, we complain that the control file has
been updated since the start of pg_checksums.
I think that we should be way more noisy about this error message and
document properly that Postgres should not be started while checksums
are being enabled.  Basically, I guess that it should mention that
there is a risk of corruption because of this parallel operation.

Hence, based on what I could read on this thread, we'd like to have
the following things added to the core patch set:
1) When enabling checksums, fsync the data folder.  Then update the
control file, and finally fsync the control file (+ flush of the
parent folder to make the whole durable).  This way a host crash never
actually means that we finish in an inconsistent state.
2) When checksums are disabled, update the control file, then fsync
it + its parent folder.
3) Add tracking of the control file data, and complain loudly before
trying to update the file if there are any inconsistencies found.
4) Document with a big-fat-red warning that postgres should not be
worked on while the tool is enabling or disabling checksums.
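Points 1 and 2 above amount to an ordering constraint that can be sketched as follows; the helper functions here are hypothetical stand-ins for the real frontend routines, not actual PostgreSQL APIs:

```c
#include <stdbool.h>

/* Hypothetical helpers standing in for the real frontend fsync/update
 * routines; each returns true on success. */
static bool fsync_dir_recurse(const char *path)     { (void) path; return true; }
static bool update_controlfile(const char *path)    { (void) path; return true; }
static bool fsync_file_and_parent(const char *path) { (void) path; return true; }

/*
 * Ordering for --enable: make all rewritten relation files durable
 * first, and only then flip the checksum flag in the control file and
 * flush it plus its parent directory.  A host crash before the final
 * step leaves the old control file contents, so the cluster is never
 * marked as checksummed half-way through.
 */
static bool
finish_enable(const char *datadir, const char *controlfile_path)
{
    if (!fsync_dir_recurse(datadir))                 /* step 1: data folder */
        return false;
    if (!update_controlfile(controlfile_path))       /* step 1: control file */
        return false;
    return fsync_file_and_parent(controlfile_path);  /* step 1: make durable */
}
```

For --disable (point 2), only the last two calls are needed, since no data blocks are rewritten.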

There is a part discussed about standbys and primaries with not the
same checksum settings, but I commented on that in [1].

There is a secondary bug fix to prevent the tool to run if the data
folder has been initialized with a block size different than what
pg_checksums has been compiled with in [2].  The patch looks good,
still the error message could be better per my lookup.

[1]: https://www.postgresql.org/message-id/20190314002342.GC3493@paquier.xyz
[2]: https://www.postgresql.org/message-id/20190313224742.GA3493@paquier.xyz

Am I missing something?
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
Hi,

On Wednesday, 13.03.2019, 17:54 +0100, Magnus Hagander wrote:
> On Wed, Mar 13, 2019 at 4:51 PM Michael Banck <michael.banck@credativ.de> wrote:
> On Wednesday, 13.03.2019, 12:43 +0100, Magnus Hagander wrote:
> > > I think this is dangerous enough that it needs to be enforced and not
> > > documented.
> > 
> > Changing the cluster ID might have some other side-effects, I think
> > there are several cloud-native 3rd party solutions that use the cluster
> > ID as some kind of unique identifier for an instance. It might not be an
> > issue in practice, but then again, it might break other stuff down the
> > road.
> 
> Well, whatever we do they have to update, right? 

Yeah, but I am saying their orchestrators might get confused about where
the old instance went and what this new thing with a totally different
systemid is and lose the connection between the two. 

Maybe that is a feature and not a bug.


Michael



Re: Offline enabling/disabling of data checksums

From
Magnus Hagander
Date:


On Thu, Mar 14, 2019 at 1:23 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Mar 13, 2019 at 12:24:21PM +0100, Magnus Hagander wrote:
> Enabling or disabling the checksums offline on the master quite clearly
> requires a rebuild of the standby, there is no other way (this is one of
> the reasons for the online enabling in that patch, so I still hope we can
> get that done -- but not for this version).

I am curious to understand why this would require a rebuild of the
standby.  Technically FPWs don't update the checksum of a page when it
is WAL-logged, so even if a primary and a standby don't agree on the
checksum configuration, it is the timing where pages are flushed in
the local instance which counts for checksum correctness.

Are you suggesting we should support running with a master with checksums on and a standby with checksums off in the same cluster? That seems.. Very fragile.

--

Re: Offline enabling/disabling of data checksums

From
Magnus Hagander
Date:


On Thu, Mar 14, 2019 at 5:39 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Mar 13, 2019 at 08:56:33PM +0900, Michael Paquier wrote:
> On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote:
> > The attached patch should do the above, on top of Michael's last
> > patchset.
>
> What you are doing here looks like a good defense in itself.

More thoughts on that, and here is a short summary of the thread.

+       /* Check if control file has changed */
+       if (controlfile_last_updated != ControlFile->time)
+       {
+           fprintf(stderr, _("%s: control file has changed since startup\n"), progname);
+           exit(1);
+       }
Actually, under the conditions discussed on this thread, where Postgres
is started in parallel with pg_checksums, imagine the following
scenario:
- pg_checksums starts to enable checksums, it reads a block to
calculate its checksum, then calculates it.
- Postgres has been started in parallel, writes the same block.
- pg_checksums finishes the block calculation, writes back the block
it has just read.
- Postgres stops, some data is lost.
- At the end of pg_checksums, we complain that the control file has
been updated since the start of pg_checksums.
I think that we should be way more noisy about this error message and
document properly that Postgres should not be started while checksums
are being enabled.  Basically, I guess that it should mention that
there is a risk of corruption because of this parallel operation.

Hence, based on what I could read on this thread, we'd like to have
the following things added to the core patch set:
1) When enabling checksums, fsync the data folder.  Then update the
control file, and finally fsync the control file (+ flush of the
parent folder to make the whole durable).  This way a host crash never
actually means that we finish in an inconsistent state.
2) When checksums are disabled, update the control file, then fsync
it + its parent folder.
3) Add tracking of the control file data, and complain loudly before
trying to update the file if there are any inconsistencies found.
4) Document with a big-fat-red warning that postgres should not be
worked on while the tool is enabling or disabling checksums.

Given that the failure is data corruption, I don't think a big fat warning is enough. We should really make it impossible to start up the postmaster by mistake during the checksum generation. People don't read the documentation until it's too late. And it might not even be under their control - some automated tool might go in and try to start postgres, and boom, corruption.

One big-hammer method could be similar to what pg_upgrade does -- temporarily rename away the controlfile so postgresql can't start, and when done, put it back.
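A minimal sketch of that rename trick; the ".pg_checksums" suffix and the helper name here are illustrative assumptions, not what pg_upgrade or any released tool actually uses:

```c
#include <stdio.h>
#include <string.h>

#define MAXPGPATH 1024          /* same spirit as PostgreSQL's constant */

/*
 * pg_upgrade-style lockout: while checksums are being rewritten,
 * pg_control is renamed away so that a postmaster cannot be started
 * against the data directory; the file is renamed back once the
 * operation completes.
 */
static void
controlfile_temp_name(const char *controlfile_path, char *out, size_t outlen)
{
    snprintf(out, outlen, "%s.pg_checksums", controlfile_path);
}

/* In the tool itself one would then call:
 *   rename(controlfile_path, temp_path);   -- before rewriting blocks
 *   rename(temp_path, controlfile_path);   -- after the rewrite finishes
 */
```

Because the postmaster refuses to start without pg_control, this closes the window in which an automated restart could corrupt the cluster mid-rewrite.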

--

Re: Offline enabling/disabling of data checksums

From
Christoph Berg
Date:
Re: Magnus Hagander 2019-03-14 <CABUevEx7QZLOjWDvwTdm1VM+mjsDm7=ZmB8qck7nDmcHEY5O5g@mail.gmail.com>
> Are you suggesting we should support running with a master with checksums
> on and a standby with checksums off in the same cluster? That seems.. Very
> fragile.

The case "shut down master and standby, run pg_checksums on both, and
start them again" should be supported. That seems safe to do, and a
real-world use case.

Changing the system id to a random number would complicate this.

(Horrible idea: maybe just adding 1 (= checksum version) to the system
id would work?)

Christoph


Re: Offline enabling/disabling of data checksums

From
Magnus Hagander
Date:


On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg <myon@debian.org> wrote:
Re: Magnus Hagander 2019-03-14 <CABUevEx7QZLOjWDvwTdm1VM+mjsDm7=ZmB8qck7nDmcHEY5O5g@mail.gmail.com>
> Are you suggesting we should support running with a master with checksums
> on and a standby with checksums off in the same cluster? That seems.. Very
> fragile.

The case "shut down master and standby, run pg_checksums on both, and
start them again" should be supported. That seems safe to do, and a
real-world use case.

I can agree with that, if we can declare it safe. You might need some way to ensure it was shut down cleanly on both sides, I'm guessing. 


Changing the system id to a random number would complicate this.

(Horrible idea: maybe just adding 1 (= checksum version) to the system
id would work?)

Or any other way of changing the systemid in a predictable way would also work, right? As long as it's done the same on both sides. And that way it would look different to any system that *doesn't* know what it means, which is probably a good thing.

--

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
Hi,

On Thursday, 14.03.2019, 15:26 +0100, Magnus Hagander wrote:
> Given that the failure is data corruption, I don't think a big fat
> warning is enough. We should really make it impossible to start up the
> postmaster by mistake during the checksum generation. People don't
> read the documentation until it's too late. And it might not even be
> under their control - some automated tool might go in and try to start
> postgres, and boom, corruption.

I guess you're right.

> One big-hammer method could be similar to what pg_upgrade does --
> temporarily rename away the controlfile so postgresql can't start, and
> when done, put it back.

That sounds like a good solution to me. I've made a PoC patch for that,
see attached.

The only question is whether pg_checksums should try to move pg_control
back (i) on failure (ii) when interrupted?


Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
Hi,

On Thursday, 14.03.2019, 15:32 +0100, Magnus Hagander wrote:
> On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg <myon@debian.org> wrote:
> > Re: Magnus Hagander 2019-03-14 <CABUevEx7QZLOjWDvwTdm1VM+mjsDm7=ZmB8qck7nDmcHEY5O5g@mail.gmail.com>
> > > Are you suggesting we should support running with a master with checksums
> > > on and a standby with checksums off in the same cluster? That seems.. Very
> > > fragile.
> > 
> > The case "shut down master and standby, run pg_checksums on both, and
> > start them again" should be supported. That seems safe to do, and a
> > real-world use case.
> 
> I can agree with that, if we can declare it safe. You might need some
> way to ensure it was shut down cleanly on both sides, I'm guessing. 
> 
> > Changing the system id to a random number would complicate this.
> > 
> > (Horrible idea: maybe just adding 1 (= checksum version) to the system
> > id would work?)
> 
> Or any other way of changing the systemid in a predictable way would
> also work, right? As long as it's done the same on both sides. And
> that way it would look different to any system that *doesn't* know
> what it means, which is probably a good thing.

If we change the system identifier, we'll have to reset the WAL as well
or otherwise we'll get "PANIC:  could not locate a valid checkpoint
record" on startup.  So even if we do it predictably on both primary and
standby I guess the standby would need to be re-cloned?

So I think an option that skips that for people who know what they are
doing with the streaming replication setup would be required, should we
decide to bump the system identifier.


Michael



Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Thu, Mar 14, 2019 at 03:23:59PM +0100, Magnus Hagander wrote:
> Are you suggesting we should support running with a master with checksums
> on and a standby with checksums off in the same cluster? That seems.. Very
> fragile.

Well, saying that it is supported is too strong a term for that.  What
I am saying is that the problems you are pointing out are not as bad as
you make them out to be, as long as an operator does not copy on-disk
pages from one node to the other.  Knowing that checksums apply
only to pages flushed on disk on a local node, everything going
through WAL for availability is actually able to work fine:
- PITR
- archive recovery.
- streaming replication.
Reading the code I understand that.  I have as well done some tests
with a primary/standby configuration to convince myself, using pgbench
on both nodes (read-write for the primary, read-only on the standby),
with checkpoint (or restart point) triggered on each node every 20s.
If one node has checksum enabled and the other checksum disabled, then
I am not seeing any inconsistency.

However, anything which does a physical copy of pages could get things
easily messed up if one node has checksum disabled and the other
enabled.  One such tool is pg_rewind.  If the promoted standby has
checksums disabled (becoming the source), and the old master to rewind
has checksums enabled, then the rewind could likely copy pages which
do not have their checksums set correctly, resulting in incorrect
checksums on the old master.

So yes, it is easy to mess up things, however this does not apply to
all configurations.  The suggestion from Christoph to enable checksums
on both nodes separately would work, and personally I find the
suggestion to update the system ID after enabling or disabling
checksums an over-engineered design because of the reasons in the
first part of this email (it is technically doable to enable checksums
with a minimum downtime and a failover), so my recommendation would be
to document that when enabling checksums on one instance in a cluster,
it should be applied to all instances as it could cause problems with
any tools performing a physical copy of relation files or blocks.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Thu, Mar 14, 2019 at 04:26:20PM +0100, Michael Banck wrote:
> On Thursday, 14.03.2019, 15:26 +0100, Magnus Hagander wrote:
>> One big-hammer method could be similar to what pg_upgrade does --
>> temporarily rename away the controlfile so postgresql can't start, and
>> when done, put it back.
>
> That sounds like a good solution to me. I've made a PoC patch for that,
> see attached.

Indeed.  I did not know this trick from pg_upgrade.  We could just use
the same.

> The only question is whether pg_checksums should try to move pg_control
> back (i) on failure (ii) when interrupted?

Yes, we should have a callback on SIGINT and SIGTERM here which just
moves back in place the control file if the temporary one exists.  I
have been able to grab some time to incorporate the feedback gathered
on this thread, and please find attached a new version of the patch to
add --enable/--disable.  The main changes are:
- When enabling checksums, fsync first the data directory, and at the
end then update/flush the control file and its parent folder as Fabien
has reported.
- When disabling checksums, only work on the control file, as Fabien
has also reported.
- Rename the control file when beginning the enabling operation, with
a callback to rename the file back if the operation is interrupted.

Does this make sense?
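A rough sketch of such a callback; the paths and the plain signal() registration are simplifying assumptions here (the real code would use pqsignal() and the tool's actual temporary file name):

```c
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed names for illustration only. */
static const char *controlfile_path      = "global/pg_control";
static const char *controlfile_path_temp = "global/pg_control.pg_checksums";

/* On SIGINT/SIGTERM, move the control file back under its real name
 * before exiting, so an interrupted --enable run does not leave the
 * cluster unable to start. */
static void
restore_controlfile(int signum)
{
    (void) signum;
    /* rename() is on POSIX's async-signal-safe list, so it may be
     * called from a signal handler. */
    (void) rename(controlfile_path_temp, controlfile_path);
    _exit(1);
}

static void
setup_signal_handlers(void)
{
    signal(SIGINT, restore_controlfile);
    signal(SIGTERM, restore_controlfile);
}
```

On normal completion the tool would simply rename the file back itself, and the handlers never fire.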
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Fri, Mar 15, 2019 at 11:50:27AM +0900, Michael Paquier wrote:
> - Rename the control file when beginning the enabling operation, with
> a callback to rename the file back if the operation is interrupted.
>
> Does this make sense?

Just before I forget...  Please note that this handles interruptions
but not failures, based on the assumption that on failures we can know
that the system was working on its checksums thanks to the temporary
control file, so that's useful for debugging in my opinion.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
Hi,

On Friday, 15.03.2019, 11:50 +0900, Michael Paquier wrote:
> On Thu, Mar 14, 2019 at 04:26:20PM +0100, Michael Banck wrote:
> > On Thursday, 14.03.2019, 15:26 +0100, Magnus Hagander wrote:
> > > One big-hammer method could be similar to what pg_upgrade does --
> > > temporarily rename away the controlfile so postgresql can't start, and
> > > when done, put it back.
> > 
> > That sounds like a good solution to me. I've made a PoC patch for that,
> > see attached.
> 
> Indeed.  I did not know this trick from pg_upgrade.  We could just use
> the same.
> 
> > The only question is whether pg_checksums should try to move pg_control
> > back (i) on failure (ii) when interrupted?
> 
> Yes, we should have a callback on SIGINT and SIGTERM here which just
> moves back in place the control file if the temporary one exists.  I
> have been able to grab some time to incorporate the feedback gathered
> on this thread, and please find attached a new version of the patch to
> add --enable/--disable.  

Thanks!

One thing stood out to me while quickly looking over it:

+        /*
+         * Flush the control file and its parent path to make the change
+         * durable.
+         */
+        if (fsync_fname(controlfile_path, false, progname) != 0 ||
+            fsync_parent_path(controlfile_path, progname) != 0)
+        {
+            /* errors are already logged on failure */
+            exit(1);
+        }

ISTM this would not run fsync_parent_path() if the first fsync fails,
which is not the intended use. I guess we need two ifs here?
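The reworked version with two separate ifs could look like the following sketch, with the real fsync helpers stubbed out for illustration:

```c
/* Stubs for illustration; the real fsync_fname()/fsync_parent_path()
 * live in PostgreSQL's frontend code, return 0 on success, and log
 * errors themselves on failure. */
static int fsync_fname_stub(const char *path)       { (void) path; return 0; }
static int fsync_parent_path_stub(const char *path) { (void) path; return 0; }

/* Two separate ifs make each flush its own explicit step with its own
 * failure path, instead of chaining both calls through short-circuit
 * evaluation in a single condition. */
static int
flush_controlfile(const char *controlfile_path)
{
    if (fsync_fname_stub(controlfile_path) != 0)
        return 1;               /* errors already logged on failure */
    if (fsync_parent_path_stub(controlfile_path) != 0)
        return 1;
    return 0;
}
```

The behavior on success is identical; the rewrite only makes the control flow easier to read and extend.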


Michael



Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Fri, Mar 15, 2019 at 09:04:51AM +0100, Michael Banck wrote:
> ISTM this would not run fsync_parent_path() unless the first fsync fails
> which is not the intended use. I guess we need two ifs here?

Yes, let's do that.  Let's see if others have input to offer about the
patch.  This thread is gathering attention, which is good.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Magnus Hagander
Date:
On Fri, Mar 15, 2019 at 1:49 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Mar 14, 2019 at 03:23:59PM +0100, Magnus Hagander wrote:
> Are you suggesting we should support running with a master with checksums
> on and a standby with checksums off in the same cluster? That seems.. Very
> fragile.

Well, saying that it is supported is too strong a term for that.  What
I am saying is that the problems you are pointing out are not as bad as
you make them out to be, as long as an operator does not copy on-disk
pages from one node to the other.  Knowing that checksums apply
only to pages flushed on disk on a local node, everything going
through WAL for availability is actually able to work fine:
- PITR
- archive recovery.
- streaming replication.
Reading the code I understand that.  I have as well done some tests
with a primary/standby configuration to convince myself, using pgbench
on both nodes (read-write for the primary, read-only on the standby),
with checkpoint (or restart point) triggered on each node every 20s.
If one node has checksum enabled and the other checksum disabled, then
I am not seeing any inconsistency.

However, anything which does a physical copy of pages could get things
easily messed up if one node has checksum disabled and the other
enabled.  One such tool is pg_rewind.  If the promoted standby has
checksums disabled (becoming the source), and the old master to rewind
has checksums enabled, then the rewind could likely copy pages which
do not have their checksums set correctly, resulting in incorrect
checksums on the old master.

So yes, it is easy to mess up things, however this does not apply to
all configurations.  The suggestion from Christoph to enable checksums
on both nodes separately would work, and personally I find the
suggestion to update the system ID after enabling or disabling
checksums an over-engineered design because of the reasons in the
first part of this email (it is technically doable to enable checksums
with a minimum downtime and a failover), so my recommendation would be
to document that when enabling checksums on one instance in a cluster,
it should be applied to all instances as it could cause problems with
any tools performing a physical copy of relation files or blocks.

As I said, that's a big hammer. I'm all for having a better solution. But I don't think it's acceptable not to have *any* defense against it, given how bad corruption it can lead to.

--

Re: Offline enabling/disabling of data checksums

From
Magnus Hagander
Date:
On Thu, Mar 14, 2019 at 4:54 PM Michael Banck <michael.banck@credativ.de> wrote:
Hi,

On Thursday, 14.03.2019, 15:32 +0100, Magnus Hagander wrote:
> On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg <myon@debian.org> wrote:
> > Re: Magnus Hagander 2019-03-14 <CABUevEx7QZLOjWDvwTdm1VM+mjsDm7=ZmB8qck7nDmcHEY5O5g@mail.gmail.com>
> > > Are you suggesting we should support running with a master with checksums
> > > on and a standby with checksums off in the same cluster? That seems.. Very
> > > fragile.
> >
> > The case "shut down master and standby, run pg_checksums on both, and
> > start them again" should be supported. That seems safe to do, and a
> > real-world use case.
>
> I can agree with that, if we can declare it safe. You might need some
> way to ensure it was shut down cleanly on both sides, I'm guessing. 
>
> > Changing the system id to a random number would complicate this.
> >
> > (Horrible idea: maybe just adding 1 (= checksum version) to the system
> > id would work?)
>
> Or any other way of changing the systemid in a predictable way would
> also work, right? As long as it's done the same on both sides. And
> that way it would look different to any system that *doesn't* know
> what it means, which is probably a good thing.

If we change the system identifier, we'll have to reset the WAL as well
or otherwise we'll get "PANIC:  could not locate a valid checkpoint
record" on startup.  So even if we do it predictably on both primary and
standby I guess the standby would need to be re-cloned?

So I think an option that skips that for people who know what they are
doing with the streaming replication setup would be required, should we
decide to bump the system identifier.

Ugh. I did not think of that one. But yes, the main idea there would be that if you turn on checksums on the primary then you have to re-clone all standbys. That's what happens if we change the system identifier -- that's why it's the "big hammer method".
 
But yeah, an option to avoid it could be one way to deal with it. If we could find some safer way to handle it that'd be better, but otherwise changing the sysid by default, with an option to turn that off, could be the way to go.

--

Re: Offline enabling/disabling of data checksums

From
Magnus Hagander
Date:
On Thu, Mar 14, 2019 at 4:26 PM Michael Banck <michael.banck@credativ.de> wrote:
Hi,

On Thursday, 14.03.2019, 15:26 +0100, Magnus Hagander wrote:
> Given that the failure is data corruption, I don't think big fat
> warning is enough. We should really make it impossible to start up the
> postmaster by mistake during the checksum generation. People don't
> read the documentation until it's too late. And it might not even be
> under their control - some automated tool might go in and try to start
> postgres, and boom, corruption.

I guess you're right.

> One big-hammer method could be similar to what pg_upgrade does --
> temporarily rename away the controlfile so postgresql can't start, and
> when done, put it back.

That sounds like a good solution to me. I've made a PoC patch for that,
see attached.

The downside of this method is that we can't produce a nice error message during the attempted startup. But it should at least be safe, which is the most important part. And at least it's clear what's happening once you list the files and see the name of the temporary one.

//Magnus

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date:
On Fri, Mar 15, 2019 at 09:52:11AM +0100, Magnus Hagander wrote:
> As I said, that's a big hammer. I'm all for having a better solution. But I
> don't think it's acceptable not to have *any* defense against it, given how
> bad corruption it can lead to.

Hm...  It looks like my arguments are not convincing enough.  I am not
really convinced that there is any need to make that the default, nor
does it make much sense to embed that stuff directly into pg_checksums,
because that's actually just doing an extra step which is equivalent
to calling pg_resetwal, and we know that this tool has the awesome
reputation of causing more harm than anything else.  At least I would
like to have an option which allows supporting the behavior to *not*
update the system identifier, so that the cases I mentioned would be
supported, because then it becomes possible to enable checksums on a
primary with only a failover as long as page copies are not directly
involved and that all operations go through WAL.  And that would be
quite nice.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date:
Hi,

On Wednesday, 13.03.2019, 12:24 +0100, Magnus Hagander wrote:
> On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov <sk@zsrv.org> wrote:
> > One new question from me: how about replication?
> > Case: primary+replica, we shut down primary and enable checksum, and
> > "started streaming WAL from primary" without any issue. I have
> > master with checksums, but replica without.
> > Or cluster with checksums, then disable checksums on primary, but
> > standby think we have checksums.
> 
> Enabling or disabling the checksums offline on the master quite
> clearly requires a rebuild of the standby, there is no other way (this
> is one of the reasons for the online enabling in that patch, so I
> still hope we can get that done -- but not for this version).
> 
> You would have the same with PITR backups for example. 

I'd like to get back to PITR. 

I thought about this a bit and actually I think PITR might be fine in
the sense that if you enabled or disabled checksums after the last
basebackup and then do PITR with the available WAL beyond that, you
would get a working cluster, just with the checksum state the cluster
had at the time of the basebackup. I think that would be entirely
acceptable, so long as nothing else breaks?

I ran some quick tests and saw no errors, but maybe I am missing
something?
 
> And especially if you have some tool that does block or segment level
> differential.

This might be the case, but not sure if PostgreSQL core must worry about
it? Obviously the documentation must be made explicit about these kinds
of cases.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz


Re: Offline enabling/disabling of data checksums

From:
Michael Banck
Date:
Hi,

Am Freitag, den 15.03.2019, 11:50 +0900 schrieb Michael Paquier:
> I have been able to grab some time to incorporate the feedback gathered
> on this thread, and please find attached a new version of the patch to
> add --enable/--disable. 

Some more feedback:

1. There's a typo in line 578 which makes it fail to compile:

|src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first use in this function)
|   }y

2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
with the controlfile_path?

3. There's (I think) leftover debug output in the following places:

|+        printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
|+                controlfile_path_temp);

|+        printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
|+                controlfile_path);

|+        printf(_("Syncing data folder\n"));

(that one is debatable, we are mentioning this only in verbose mode in
pg_basebackup but pg_checksums is more chatty anyway, so probably fine).

|+        printf(_("Updating control file\n"));

Besides the syncing message (which is user-relevant because users might
wonder what is taking so long), the others seem to be implementation
details we don't need to tell the user about.


Michael

-- 
Michael Banck


Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Fri, Mar 15, 2019 at 12:54:01PM +0100, Michael Banck wrote:
> 1. There's a typo in line 578 which makes it fail to compile:
>
> |src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first use in this function)
> |   }y

I am wondering where you got this one.  My local branch does not have
it, and the patch I sent does not seem to have it either.

> 2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
> with the controlfile_path?

PG_MODE_DISABLE needs controlfile_path as well.  We could make the
cleanup only available when using --enable, the code just looked more
simple in its current shape.  I think it's just more simple to set
everything unconditionally.  This code may become more complicated in
the future.

> 3. There's (I think) leftover debug output in the following places:
>
> |+        printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
> |+                controlfile_path_temp);
>
> |+        printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
> |+                controlfile_path);
>
> |+        printf(_("Syncing data folder\n"));
>
> (that one is debatable, we are mentioning this only in verbose mode in
> pg_basebackup but pg_checksums is more chatty anyway, so probably
> fine).

This is wanted.  Many folks have been complaining on this thread about
crashes and such, surely we want logs about what happens :)

> |+        printf(_("Updating control file\n"));
>
> Besides the syncing message (which is user-relevant because users might
> wonder what is taking so long), the others seem to be implementation
> details we don't need to tell the user about.

Perhaps having them under --verbose makes more sense?
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Michael Banck
Date:
Hi,

Am Freitag, den 15.03.2019, 21:23 +0900 schrieb Michael Paquier:
> On Fri, Mar 15, 2019 at 12:54:01PM +0100, Michael Banck wrote:
> > 1. There's a typo in line 578 which makes it fail to compile:
> > 
> > > src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first use in this function)
> > >   }y
> 
> I am wondering where you got this one.  My local branch does not have
> it, and the patch I sent does not seem to have it either.

Mea culpa, I must've fat fingered something in the editor before
applying your patch, sorry. I should've double-checked.

> > 2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
> > with the controlfile_path?
> 
> PG_MODE_DISABLE needs controlfile_path as well.  We could make the
> cleanup only available when using --enable, the code just looked more
> simple in its current shape.  I think it's just more simple to set
> everything unconditionally.  This code may become more complicated in
> the future.

Ok.

> > 3. There's (I think) leftover debug output in the following places:
> > 
> > > +        printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
> > > +                controlfile_path_temp);
> > > +        printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
> > > +                controlfile_path);
> > > +        printf(_("Syncing data folder\n"));
> > 
> > (that one is debatable, we are mentioning this only in verbose mode in
> > pg_basebackup but pg_checksums is more chatty anyway, so probably
> > fine).
> 
> This is wanted.  Many folks have been complaining on this thread about
> crashes and such, surely we want logs about what happens :)
> 
> > > +        printf(_("Updating control file\n"));
> > 
> > Besides the syncing message (which is user-relevant because users might
> > wonder what is taking so long), the others seem to be implementation
> > details we don't need to tell the user about.
> 
> Perhaps having them under --verbose makes more sense?

Well, if we think it is essential in order to tell the user what happened
in the case of an error, it shouldn't be under --verbose, I guess.


Michael

-- 
Michael Banck


Re: Offline enabling/disabling of data checksums

From:
Fabien COELHO
Date:
Bonjour Michaël-san,

> Yes, that would be nice, for now I have focused.  For pg_resetwal yes
> we could do it easily.  Would you like to send a patch?

Here is a proposal for "pg_resetwal".

The implementation basically removes a lot of copy-paste and calls the
new update_controlfile function instead. I like removing useless code :-)

The pg_resetwal implementation was doing a rm/create cycle, which left
a small window for losing the control file. Not neat.

I do not see the value of *not* fsyncing the control file when writing it, 
as it is by definition very precious, so I added a fsync. The server side 
branch uses the backend available "pg_fsync", which complies with server 
settings there and can do nothing if fsync is disabled.

Maybe the two changes could be committed separately.

-- 
Fabien.
Attachments

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Sun, Mar 17, 2019 at 10:01:20AM +0100, Fabien COELHO wrote:
> The implementation basically removes a lot of copy paste and calls the new
> update_controlfile function instead. I like removing useless code:-)

Yes, I spent something like 10 minutes looking at that code yesterday
and I agree that removing the control file to recreate it is not
really necessary, even if the window between its removal and
recreation is short.

> I do not see the value of *not* fsyncing the control file when writing it,
> as it is by definition very precious, so I added a fsync. The server side
> branch uses the backend available "pg_fsync", which complies with server
> settings there and can do nothing if fsync is disabled.

The issue here is that trying to embed directly the fsync routines
from file_utils.c into pg_resetwal.c messes up the inclusions because
pg_resetwal.c includes backend-side includes, which themselves touch
fd.h :(

In short your approach avoids some extra mess with the include
dependencies.

> Maybe the two changes could be committed separately.

I was thinking about this one, and for pg_rewind we don't care about
the fsync of the control file because the full data folder gets
fsync'd afterwards and in the event of a crash in the middle of a
rewind the target data folder is surely not something to use, but we
do for pg_checksums, and we do for pg_resetwal.  Even if there is the
argument that usually callers of update_controlfile() would care a
lot about the control file and fsync it, I think that we need some
control on if we do the fsync or not because many tools have a
--no-sync and that should be fully respected.  So while your patch is
on a good track, I would suggest to do the following things to
complete it:
- Add an extra argument bits16 to update_controlfile to pass a set of
optional flags, with NOSYNC being the only and current value.  The
default is to flush the file.
- Move the wait event calls WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE and
WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE to controldata_utils.c.
- And then delete UpdateControlFile() in xlog.c, and use
update_controlfile() instead to remove even more code.  The version in
xlog.c uses BasicOpenFile(), so we should use also that in
update_controlfile() instead of OpenTransientFile().  As any errors
result in a PANIC we don't care about leaking fds.
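
To make the proposed interface concrete, here is a minimal sketch of an
update_controlfile()-style helper with an optional flush (plain bytes
instead of ControlFileData, boolean returns instead of elog/pg_log
reporting, and no CRC recomputation; these are all simplifications of
this sketch, not the committed code):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Simplified sketch of an update_controlfile()-style helper: overwrite
 * the control file in a single write and optionally flush it to disk,
 * so that callers implementing a --no-sync option can skip the fsync.
 */
static bool
write_controlfile(const char *path, const void *buf, size_t len,
                  bool do_sync)
{
    int         fd;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return false;

    if (write(fd, buf, len) != (ssize_t) len)
    {
        close(fd);
        return false;
    }

    /* Flush unless the caller asked for --no-sync behavior. */
    if (do_sync && fsync(fd) != 0)
    {
        close(fd);
        return false;
    }

    return close(fd) == 0;
}
```

With such a flag, a tool's --no-sync option simply maps to passing
do_sync = false.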
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Fabien COELHO
Date:
Michaël-san,

> The issue here is that trying to embed directly the fsync routines
> from file_utils.c into pg_resetwal.c messes up the inclusions because
> pg_resetwal.c includes backend-side includes, which themselves touch
> fd.h :(
>
> In short your approach avoids some extra mess with the include
> dependencies.

I could remove the two "catalog/" includes from pg_resetwal, I assume that 
you meant these ones.

>> Maybe the two changes could be committed separately.
>
> I was thinking about this one, and for pg_rewind we don't care about
> the fsync of the control file because the full data folder gets
> fsync'd afterwards and in the event of a crash in the middle of a
> rewind the target data folder is surely not something to use, but we
> do for pg_checksums, and we do for pg_resetwal.  Even if there is the
> argument that usually callers of update_controlfile() would care a
> lot about the control file and fsync it, I think that we need some
> control on if we do the fsync or not because many tools have a
> --no-sync and that should be fully respected.

> So while your patch is on a good track, I would suggest to do the 
> following things to complete it:

> - Add an extra argument bits16 to update_controlfile to pass a set of 
> optional flags, with NOSYNC being the only and current value.  The 
> default is to flush the file.

Hmmm. I just did that, but what about just a boolean? What other options 
could be required? Maybe some locking/checking?

> - Move the wait event calls WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE and
> WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE to controldata_utils.c.

Done.

> - And then delete UpdateControlFile() in xlog.c, and use
> update_controlfile() instead to remove even more code.

I was keeping that one for another patch because it touches the backend 
code, but it makes sense to do that in one go for consistency.

I kept the initial no-parameter function which calls the new one with 4 
parameters, though, because it looks more homogeneous this way in the 
backend code. This is debatable.

> The version in xlog.c uses BasicOpenFile(), so we should use also that 
> in update_controlfile() instead of OpenTransientFile(). As any errors 
> result in a PANIC we don't care about leaking fds.

Done.

Attached is an update.

-- 
Fabien.
Attachments

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Sun, Mar 17, 2019 at 12:44:39PM +0100, Fabien COELHO wrote:
> I could remove the two "catalog/" includes from pg_resetwal, I assume that
> you meant these ones.

Not exactly.  What I meant is that if you try to call directly
fsync_fname and fsync_parent_path from file_utils.h, then you get into
trouble because of xlog.h..  Sure you can remove also the ones you
removed.

> Hmmm. I just did that, but what about just a boolean? What other options
> could be required? Maybe some locking/checking?

The caller is already expected to properly take ControlFileLock.
Note I tend to worry too much about the extensibility of published
APIs these days as well, so perhaps just a boolean would be fine;
please let me reconsider that after some sleep.  It is not like the
contents of this routine are going to become much more complicated
either, except potentially to control the flags on open(). :p

> I kept the initial no-parameter function which calls the new one with 4
> parameters, though, because it looks more homogeneous this way in the
> backend code. This is debatable.

True, this actually makes back-patching a bit easier, and there are 13
calls of UpdateControlFile().

> Attached is an update.

Thanks, I'll take a look at that tomorrow.  You have one error at the
end of update_controlfile(), where close() could issue a frontend-like
error for the backend, calling exit() on the way.  That's not good.
(No need to send a new patch, I'll fix it myself.)
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Fabien COELHO
Date:
> You have one error at the end of update_controlfile(), where close() 
> could issue a frontend-like error for the backend, calling exit() on the 
> way.  That's not good. (No need to send a new patch, I'll fix it 
> myself.)

Indeed. I meant to merge the "if (close(fd))", but ended up merging the
error generation as well.

-- 
Fabien


Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Sun, Mar 17, 2019 at 12:44:39PM +0100, Fabien COELHO wrote:
> I kept the initial no-parameter function which calls the new one with 4
> parameters, though, because it looks more homogeneous this way in the
> backend code. This is debatable.

From a compatibility point of view, your position actually makes
sense, at least to me after sleeping on it, as UpdateControlFile is
not static and there are interactions with the other local routines
that read and write the control file because of the variable
ControlFile at the top of xlog.c.  So I have kept the original
interface, which is now only a wrapper around the new routine.

> Attached is an update.

Thanks, I have committed the patch after fixing a couple of things.
After considering the interface, I have switched to a single boolean
as I could not actually imagine what kind of fancy features this
could be extended with.  If I am wrong, let's adjust it later
on.  Here are my notes about the fixes:
- pg_resetwal got broken because the path to the control file was
incorrect.  Running tests of pg_upgrade or the TAP tests of
pg_resetwal showed the failure.
- The previously-mentioned problem with close() in the new routine is
fixed.
- Header comments at the top of update_controlfile were a bit messed
up (s/Not/Note/, missed an "up" as well).
- pg_rewind was issuing a flush of the control file even if --no-sync
was used.
- Nit: incorrect header order in controldata_utils.c.  I have kept the
backend-only includes grouped though.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Fabien COELHO
Date:
Bonjour Michaël,

> Here are my notes about the fixes:

Thanks for the fixes.

> - pg_resetwal got broken because the path to the control file was
> incorrect.  Running tests of pg_upgrade or the TAP tests of
> pg_resetwal showed the failure.

Hmmm… I thought I had done that with "make check-world":-(

> - pg_rewind was issuing a flush of the control file even if --no-sync
> was used.

Indeed, I missed this one.

> - Nit: incorrect header order in controldata_utils.c.  I have kept the
> backend-only includes grouped though.

I'll pay attention to that the next time.

Thanks for the push.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Fri, Mar 15, 2019 at 01:37:27PM +0100, Michael Banck wrote:
> Am Freitag, den 15.03.2019, 21:23 +0900 schrieb Michael Paquier:
>> Perhaps having them under --verbose makes more sense?
>
> Well, if we think it is essential in order to tell the user what happened
> in the case of an error, it shouldn't be under --verbose, I guess.

I would still keep them to be honest.  I don't know, if others find
the tool too chatty we could always rework that part and tune it.

Please find attached an updated patch set, I have rebased that stuff
on top of my recent commits to refactor the control file updates.
While reviewing, I have found a problem in the docs (forgot a <para>
markup previously), and there was a problem with the parent path fsync
causing an interruption to not return the correct error code, and
actually we should just use durable_rename() in this case (if
--no-sync gets in then pg_mv_file() should be used of course).
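
For illustration, what durable_rename() provides over a plain rename()
can be sketched like this (it only loosely mirrors the backend routine
in fd.c; error reporting is reduced to booleans and both names are
assumed to live in the current directory, simplifications of this
sketch):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Loose sketch of a durable rename: flush the source file, rename it,
 * then flush the containing directory so the rename itself survives a
 * crash.  Assumes both names live in the current directory.
 */
static bool
durable_rename_sketch(const char *oldfile, const char *newfile)
{
    int         fd;

    /* Make sure the contents are on disk before the rename. */
    fd = open(oldfile, O_RDWR);
    if (fd < 0 || fsync(fd) != 0)
    {
        if (fd >= 0)
            close(fd);
        return false;
    }
    close(fd);

    if (rename(oldfile, newfile) != 0)
        return false;

    /* Persist the directory entry for the new name. */
    fd = open(".", O_RDONLY);
    if (fd < 0 || fsync(fd) != 0)
    {
        if (fd >= 0)
            close(fd);
        return false;
    }
    close(fd);
    return true;
}
```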

I have also been thinking about what we could add in the
documentation, so this version adds a draft to describe the cases
where enabling checksums can lead to corruption when involving
multiple nodes in a cluster and tools doing physical copy of relation
blocks.

I have not done the --no-sync part yet on purpose, as that will most
likely conflict based on the feedback received for this version.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Fabien COELHO
Date:
Bonjour Michaël,

> Please find attached an updated patch set, I have rebased that stuff
> on top of my recent commits to refactor the control file updates.

Patch applies cleanly, compiles, make check-world seems ok, doc build ok.

It would help if the patch included a version number. I assume that this
is v7.

Doc looks ok.

Moving the control file looks like an effective way to prevent any
concurrent start, as the fs operation is probably atomic, especially if
external tools use the same trick. However this is not the case yet, eg
"pg_resetwal" uses a "postmaster.pid" hack instead. Probably the method
could be unified, possibly with some functions in "controlfile_utils.c".

However, I think that there still is a race condition because of the order 
in which it is implemented:

  pg_checksums reads control file
  pg_checksums checks control file contents...
  ** cluster may be started and the control file updated
  pg_checksums moves the (updated) control file
  pg_checksums proceeds on a running cluster
  pg_checksums moves back the control file
  pg_checksums updates the control file contents, overriding updates

I think that the correct way to handle this for enable/disable is:

  pg_checksums moves the control file
  pg_checksums reads, checks, proceeds, updates
  pg_checksums moves back the control file

This probably means extending a little bit the update_controlfile function 
to allow a suffix. No big deal.

Ok, this might not work, because of the following, less likely, race 
condition:

  postmaster opens control file RW
  pg_checksums moves control file, postmaster's open file handle follows
  ...

So ISTM that we really need some locking to have something clean.
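
For concreteness, the move-first ordering could be sketched as follows
(the file names and the callback are illustrative, not the patch's
actual code; and, as argued above, it narrows the window but does not
close the open-file-handle race):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define CONTROL_FILE        "pg_control"
#define CONTROL_FILE_TEMP   CONTROL_FILE ".pg_checksums_in_progress"

/*
 * Sketch of the move-first ordering: rename the control file aside
 * *before* reading or checking it, do all the work on the renamed
 * copy, and move it back at the end.  Since rename(2) is atomic on a
 * single filesystem, a concurrently started tool cannot grab the file
 * in between, and a postmaster started meanwhile fails to find it.
 */
static bool
operate_with_control_lock(bool (*operate)(const char *control_path))
{
    bool        ok;

    if (rename(CONTROL_FILE, CONTROL_FILE_TEMP) != 0)
        return false;           /* someone else already holds it */

    ok = operate(CONTROL_FILE_TEMP);    /* read, check, proceed, update */

    /* Always restore the file, even if the operation failed. */
    if (rename(CONTROL_FILE_TEMP, CONTROL_FILE) != 0)
        return false;
    return ok;
}
```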

Why not always use "pgrename" instead of the strange pg_mv_file macro?

Help line about --check not simplified as suggested in a prior review, 
although you said you would take it into account.

Tests look ok.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From:
Michael Paquier
Date:
On Tue, Mar 19, 2019 at 11:48:25AM +0100, Fabien COELHO wrote:
> Moving the controlfile looks like an effective way to prevent any concurrent
> start, as the fs operation is probably atomic and especially if external
> tools uses the same trick. However this is not the case yet, eg
> "pg_resetwal" uses a "postmaster.pid" hack instead.

pg_upgrade does so.  Note that pg_resetwal does not check either that
the PID in the file is actually running.

> Probably the method could be unified, possibly with some functions
> in "controlfile_utils.c".

Hm.  postmaster.pid is just there to make sure that the instance is not
started at all, while we require the instance to be stopped cleanly
with the other tools, so in my opinion it is not really consistent to
combine both.

> Ok, this might not work, because of the following, less likely, race
> condition:
>
>  postmaster opens control file RW
>  pg_checksums moves control file, postmaster's open file handle follows
>  ...
>
> So ISTM that we really need some locking to have something clean.

We are talking about complicating a method which is already fine for a
window where the whole operation works, as it could take hours to
enable checksums, versus a couple of instructions.  I am not sure that
it is worth complicating the code more.

> Help line about --check not simplified as suggested in a prior review,
> although you said you would take it into account.

Oops, it looks like this got lost because of the successive rebases.
I am sure to have updated it at some point..  Anyway, thanks for
pointing it out, I got that fixed on my local branch.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From:
Michael Banck
Date:
Hi,

Am Dienstag, den 19.03.2019, 11:48 +0100 schrieb Fabien COELHO:
> Moving the controlfile looks like an effective way to prevent any 
> concurrent start, as the fs operation is probably atomic and especially if 
> external tools uses the same trick. However this is not the case yet, eg 
> "pg_resetwal" uses a "postmaster.pid" hack instead. Probably the method 
> could be unified, possibly with some functions in "controlfile_utils.c".
> 
> However, I think that there still is a race condition because of the order 
> in which it is implemented:
> 
>   pg_checksums reads control file
>   pg_checksums checks control file contents...
>   ** cluster may be started and the control file updated
>   pg_checksums moves the (updated) control file
>   pg_checksums proceeds on a running cluster
>   pg_checksums moves back the control file
>   pg_checksums updates the control file contents, overriding updates
> 
> I think that the correct way to handle this for enable/disable is:
> 
>   pg_checksums moves the control file
>   pg_checksums reads, checks, proceeds, updates
>   pg_checksums moves back the control file
> 
> This probably means extending a little bit the update_controlfile function 
> to allow a suffix. No big deal.
> 
> Ok, this might not work, because of the following, less likely, race 
> condition:
> 
>   postmaster opens control file RW
>   pg_checksums moves control file, postmaster's open file handle follows
>   ...
> 
> So ISTM that we really need some locking to have something clean.

I think starting the postmaster during offline maintenance is already
quite a pilot error. As pg_checksums can potentially run for hours
though, I agree it is important to disable the cluster in the meantime.

There's really not a lot going on between pg_checksums reading the
control file and moving it away - what you propose above sounds a bit
like overengineering to me.

If anything, we could include the postmaster.pid check from pg_resetwal
after we have renamed the control file to make absolutely sure that the
cluster is offline. Once the control file is gone and there is no
postmaster.pid, it surely cannot be pg_checksums' problem anymore if a
postmaster is started regardless of maintenance.

I leave that to Michael to decide whether he thinks the above is
warranted.

I think the more important open issue is what to do about PITR and
streaming replication, see my replies to Magnus elsewhere in the thread.

> Why not always use "pgrename" instead of the strange pg_mv_file macro?

pg_upgrade does it the same way; possibly both could be converted to
pgrename, dunno.


Michael

-- 
Michael Banck


Re: Offline enabling/disabling of data checksums

From:
Fabien COELHO
Date:
>> Ok, this might not work, because of the following, less likely, race
>> condition:
>>
>>  postmaster opens control file RW
>>  pg_checksums moves control file, postmaster's open file handle follows
>>  ...
>>
>> So ISTM that we really need some locking to have something clean.
>
> We are talking about complicating a method which is already fine for a
> window where the whole operation works, as it could take hours to
> enable checksums, versus a couple of instructions.  I am not sure that
> it is worth complicating the code more.

Hmmm. Possibly. The point is that it only needs to be implemented
once. The whole point of pg is to have ACID transactional properties,
but it does not achieve that on the control file, which I find
paradoxical :-)

Now if there is a race condition opportunity, ISTM that it should be as
short as possible. Renaming before manipulating seems safer if other
commands proceed like that as well. Probably if pg always renames *THEN*
opens before doing anything, in all commands, it could be safe, provided
that the renaming is atomic, which I think is the case.

That would avoid locking, at the price of a small probability of having a
controlfile in a controlfile.command-that-failed-at-the-wrong-time state.
Maybe it is okay. Maybe there is a need to be able to force the state
back to something sane to recover from such an unlikely event, but
probably that already exists (eg the postmaster could be dead without
releasing the controlfile state).

-- 
Fabien.


Re: Offline enabling/disabling of data checksums

From:
Andres Freund
Date:
Hi,

On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> +/*
> + * Locations of persistent and temporary control files.  The control
> + * file gets renamed into a temporary location when enabling checksums
> + * to prevent a parallel startup of Postgres.
> + */
> +#define CONTROL_FILE_PATH        "global/pg_control"
> +#define CONTROL_FILE_PATH_TEMP    CONTROL_FILE_PATH ".pg_checksums_in_progress"

I think this should be outright rejected. Again, you're making the
control file into something it isn't. And there's no buyin for this as
far as I can tell outside of Fabien and you. For crying out loud, if the
server crashes during this YOU'VE CORRUPTED THE CLUSTER.

- Andres


Re: Offline enabling/disabling of data checksums

From:
Michael Banck
Date:
Hi,

Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > +/*
> > + * Locations of persistent and temporary control files.  The control
> > + * file gets renamed into a temporary location when enabling checksums
> > + * to prevent a parallel startup of Postgres.
> > + */
> > +#define CONTROL_FILE_PATH        "global/pg_control"
> > +#define CONTROL_FILE_PATH_TEMP    CONTROL_FILE_PATH ".pg_checksums_in_progress"
> 
> I think this should be outright rejected. Again, you're making the
> control file into something it isn't. And there's no buyin for this as
> far as I can tell outside of Fabien and you. For crying out loud, if the
> server crashes during this YOU'VE CORRUPTED THE CLUSTER.

The cluster is supposed to be offline during this.  This is just an
additional precaution so that nobody starts it during the operation -
similar to how pg_upgrade disables the old data directory.


Michael

-- 
Michael Banck


Re: Offline enabling/disabling of data checksums

From:
Andres Freund
Date:
On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> Hi,
> 
> Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > +/*
> > > + * Locations of persistent and temporary control files.  The control
> > > + * file gets renamed into a temporary location when enabling checksums
> > > + * to prevent a parallel startup of Postgres.
> > > + */
> > > +#define CONTROL_FILE_PATH        "global/pg_control"
> > > +#define CONTROL_FILE_PATH_TEMP    CONTROL_FILE_PATH ".pg_checksums_in_progress"
> > 
> > I think this should be outright rejected. Again, you're making the
> > control file into something it isn't. And there's no buyin for this as
> > far as I can tell outside of Fabien and you. For crying out loud, if the
> > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> 
> The cluster is supposed to be offline during this.  This is just an
> additional precaution so that nobody starts it during the operation -
> similar to how pg_upgrade disables the old data directory.

I don't see how that matters. Afterwards the cluster needs low level
surgery to be recovered. That's a) undocumented b) likely to be done
wrongly.  This is completely unacceptable *AND UNNECESSARY*.

Greetings,

Andres Freund


Re: Offline enabling/disabling of data checksums

From:
Michael Banck
Date:
Hi,

Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund:
> On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > +/*
> > > > + * Locations of persistent and temporary control files.  The control
> > > > + * file gets renamed into a temporary location when enabling checksums
> > > > + * to prevent a parallel startup of Postgres.
> > > > + */
> > > > +#define CONTROL_FILE_PATH        "global/pg_control"
> > > > +#define CONTROL_FILE_PATH_TEMP    CONTROL_FILE_PATH ".pg_checksums_in_progress"
> > > 
> > > I think this should be outright rejected. Again, you're making the
> > > control file into something it isn't. And there's no buyin for this as
> > > far as I can tell outside of Fabien and you. For crying out loud, if the
> > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> > 
> > The cluster is supposed to be offline during this.  This is just an
> > additional precaution so that nobody starts it during the operation -
> > similar to how pg_upgrade disables the old data directory.
> 
> I don't see how that matters. Afterwards the cluster needs low level
> surgery to be recovered. That's a) undocumented b) likely to be done
> wrongly.  This is completely unacceptable *AND UNNECESSARY*.

Can you explain why low level surgery is needed and what that would
look like?

If pg_checksums successfully enables checksums, it will move back the
control file and update the checksum version - the cluster is ready to
be started again unless I am missing something?

If pg_checksums is interrupted by the admin, it will move back the
control file and the cluster is ready to be started again as well.

If pg_checksums aborts with a failure, the admin will have to move back
the control file before starting up the instance again, but I don't
think that counts?

If pg_checksums crashes due to I/O failures or other causes I can see
how possibly the block it was currently writing might need low level
surgery, but in that case we are in the domain of forensics already I
guess and that still does not pertain to the control file?

Sorry for being obtuse, I don't get it.
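[Editor's note: the move-back protocol described above can be sketched as follows. This is a simplified illustration, not the patch's actual C code; the helper names (`enable_checksums`, `checksum_block`) are hypothetical.]

```python
# Sketch of the control-file rename protocol discussed above.
# Assumption: names and error handling are illustrative, not the patch's API.
import os

CONTROL = os.path.join("global", "pg_control")
CONTROL_TEMP = CONTROL + ".pg_checksums_in_progress"

def enable_checksums(datadir, relation_files, checksum_block):
    control = os.path.join(datadir, CONTROL)
    control_temp = os.path.join(datadir, CONTROL_TEMP)
    # Hide the control file so a parallel postmaster cannot start.
    os.rename(control, control_temp)
    try:
        for path in relation_files:
            checksum_block(path)   # the long-running rewrite of data blocks
    except BaseException:
        # A clean interruption (e.g. SIGINT) moves the control file back; only
        # a hard crash leaves the temporary name behind, which is the failure
        # mode debated in this thread.
        os.rename(control_temp, control)
        raise
    # Success: restore the control file (the real tool would also bump the
    # data_checksum_version field here).
    os.rename(control_temp, control)
```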


Michael

-- 
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Managing Directors: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to
the following provisions: https://www.credativ.de/datenschutz


Re: Offline enabling/disabling of data checksums

From: Andres Freund
Hi,

On 2019-03-19 17:08:17 +0100, Michael Banck wrote:
> Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund:
> > On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > > +/*
> > > > > + * Locations of persistent and temporary control files.  The control
> > > > > + * file gets renamed into a temporary location when enabling checksums
> > > > > + * to prevent a parallel startup of Postgres.
> > > > > + */
> > > > > +#define CONTROL_FILE_PATH        "global/pg_control"
> > > > > +#define CONTROL_FILE_PATH_TEMP    CONTROL_FILE_PATH ".pg_checksums_in_progress"
> > > > 
> > > > I think this should be outright rejected. Again, you're making the
> > > > control file into something it isn't. And there's no buyin for this as
> > > > far as I can tell outside of Fabien and you. For crying out loud, if the
> > > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> > > 
> > > The cluster is supposed to be offline during this.  This is just an
> > > additional precaution so that nobody starts it during the operation -
> > > similar to how pg_upgrade disables the old data directory.
> > 
> > I don't see how that matters. Afterwards the cluster needs low level
> > surgery to be recovered. That's a) undocumented b) likely to be done
> > wrongly.  This is completely unacceptable *AND UNNECESSARY*.
> 
> Can you explain why low level surgery is needed and how that would look
> like?
> 
> If pg_checksums successfully enables checksums, it will move back the
> control file and update the checksum version - the cluster is ready to
> be started again unless I am missing something?
> 
> If pg_checksums is interrupted by the admin, it will move back the
> control file and the cluster is ready to be started again as well.
> 
> If pg_checksums aborts with a failure, the admin will have to move back
> the control file before starting up the instance again, but I don't
> think that counts?

That absolutely counts. Even a short period would imo be unacceptable,
but this will take *hours* in many clusters. It's completely possible
that the machine crashes while the enabling is in progress.

And after restarting postgres or even pg_checksums, you'll just get a
message that there's no control file. How on earth is a normal user
supposed to recover from that?  Now, you could have a check for the
control file under the temporary name, and emit a hint about renaming,
but that has its own dangers (like people renaming it just to start
postgres).

And you're basically adding it because Fabien doesn't like
postmaster.pid and wants to invent another lockout mechanism in this
thread.

Greetings,

Andres Freund


Re: Offline enabling/disabling of data checksums

From: Michael Banck
Hi,

Am Dienstag, den 19.03.2019, 09:13 -0700 schrieb Andres Freund:
> On 2019-03-19 17:08:17 +0100, Michael Banck wrote:
> > Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund:
> > > On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > > > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > > > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > > > +/*
> > > > > > + * Locations of persistent and temporary control files.  The control
> > > > > > + * file gets renamed into a temporary location when enabling checksums
> > > > > > + * to prevent a parallel startup of Postgres.
> > > > > > + */
> > > > > > +#define CONTROL_FILE_PATH        "global/pg_control"
> > > > > > +#define CONTROL_FILE_PATH_TEMP    CONTROL_FILE_PATH ".pg_checksums_in_progress"
> > > > > 
> > > > > I think this should be outright rejected. Again, you're making the
> > > > > control file into something it isn't. And there's no buyin for this as
> > > > > far as I can tell outside of Fabien and you. For crying out loud, if the
> > > > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> > > > 
> > > > The cluster is supposed to be offline during this.  This is just an
> > > > additional precaution so that nobody starts it during the operation -
> > > > similar to how pg_upgrade disables the old data directory.
> > > 
> > > I don't see how that matters. Afterwards the cluster needs low level
> > > surgery to be recovered. That's a) undocumented b) likely to be done
> > > wrongly.  This is completely unacceptable *AND UNNECESSARY*.
> > 
> > Can you explain why low level surgery is needed and how that would look
> > like?
> > 
> > If pg_checksums successfully enables checksums, it will move back the
> > control file and update the checksum version - the cluster is ready to
> > be started again unless I am missing something?
> > 
> > If pg_checksums is interrupted by the admin, it will move back the
> > control file and the cluster is ready to be started again as well.
> > 
> > If pg_checksums aborts with a failure, the admin will have to move back
> > the control file before starting up the instance again, but I don't
> > think that counts?
> 
> That absolutely counts. Even a short period would imo be unacceptable,
> but this will take *hours* in many clusters. It's completely possible
> that the machine crashes while the enabling is in progress.
> 
> And after restarting postgres or even pg_checksums, you'll just get a
> message that there's no control file. How on earth is a normal user
> supposed to recover from that?  Now, you could have a check for the
> control file under the temporary name, and emit a hint about renaming,
> but that has its own dangers (like people renaming it just to start
> postgres).

Ok, thanks for explaining. 

I guess if we check for the temporary name in postmaster during startup
if pg_control isn't there then a more generally useful name like
"pg_control.maintenance" should be picked. We could then spit out a nice
error message or hint like "the cluster has been disabled for
maintenance. In order to start it up anyway, rename
pg_control.maintenance to pg_control" or so.
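[Editor's note: a minimal sketch of such a startup check. The `pg_control.maintenance` name and the hint wording are the proposal above, not committed behavior; the function name is hypothetical.]

```python
# Sketch of a postmaster startup check: if pg_control is missing but a
# maintenance-renamed copy exists, emit a targeted hint instead of a bare
# "control file not found" error.
import os

def check_control_file(datadir):
    control = os.path.join(datadir, "global", "pg_control")
    maint = control + ".maintenance"
    if os.path.exists(control):
        return "ok"
    if os.path.exists(maint):
        return ("the cluster has been disabled for maintenance; "
                "to start it anyway, rename pg_control.maintenance to pg_control")
    return "could not find the control file"
```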

In any case, this would be more of an operational or availability issue
and not a data-loss issue, as I feared from your previous mails.

> And you're basically adding it because Fabien doesn't like
> postmaster.pid and wants to invent another lockout mechanism in this
> thread.

I think the hazard of another DBA (or some automated configuration
management or HA tool for that matter) looking at postmaster.pid,
deciding that it is not a legit file from a running instance, deleting
it and then starting up Postgres while pg_checksums is still at work is
worse than the above scenario, but maybe if we make the content of
postmaster.pid clear enough (like "maintenance in progress"?) it would
be enough of a hint? Or do you have concrete suggestions on how this
should work?

I had the feeling (ab)using postmaster.pid for this would fly less than
using the same scheme as pg_upgrade does, but I'm fine doing it either
way.


Michael



Re: Offline enabling/disabling of data checksums

From: Andres Freund
Hi,

On 2019-03-19 17:30:16 +0100, Michael Banck wrote:
> Am Dienstag, den 19.03.2019, 09:13 -0700 schrieb Andres Freund:
> > On 2019-03-19 17:08:17 +0100, Michael Banck wrote:
> > > Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund:
> > > > On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > > > > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > > > > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > > > > +/*
> > > > > > > + * Locations of persistent and temporary control files.  The control
> > > > > > > + * file gets renamed into a temporary location when enabling checksums
> > > > > > > + * to prevent a parallel startup of Postgres.
> > > > > > > + */
> > > > > > > +#define CONTROL_FILE_PATH        "global/pg_control"
> > > > > > > +#define CONTROL_FILE_PATH_TEMP    CONTROL_FILE_PATH ".pg_checksums_in_progress"
> > > > > > 
> > > > > > I think this should be outright rejected. Again, you're making the
> > > > > > control file into something it isn't. And there's no buyin for this as
> > > > > > far as I can tell outside of Fabien and you. For crying out loud, if the
> > > > > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> > > > > 
> > > > > The cluster is supposed to be offline during this.  This is just an
> > > > > additional precaution so that nobody starts it during the operation -
> > > > > similar to how pg_upgrade disables the old data directory.
> > > > 
> > > > I don't see how that matters. Afterwards the cluster needs low level
> > > > surgery to be recovered. That's a) undocumented b) likely to be done
> > > > wrongly.  This is completely unacceptable *AND UNNECESSARY*.
> > > 
> > > Can you explain why low level surgery is needed and how that would look
> > > like?
> > > 
> > > If pg_checksums successfully enables checksums, it will move back the
> > > control file and update the checksum version - the cluster is ready to
> > > be started again unless I am missing something?
> > > 
> > > If pg_checksums is interrupted by the admin, it will move back the
> > > control file and the cluster is ready to be started again as well.
> > > 
> > > If pg_checksums aborts with a failure, the admin will have to move back
> > > the control file before starting up the instance again, but I don't
> > > think that counts?
> > 
> > That absolutely counts. Even a short period would imo be unacceptable,
> > but this will take *hours* in many clusters. It's completely possible
> > that the machine crashes while the enabling is in progress.
> > 
> > And after restarting postgres or even pg_checksums, you'll just get a
> > message that there's no control file. How on earth is a normal user
> > supposed to recover from that?  Now, you could have a check for the
> > control file under the temporary name, and emit a hint about renaming,
> > but that has its own dangers (like people renaming it just to start
> > postgres).
> 
> Ok, thanks for explaining. 
> 
> I guess if we check for the temporary name in postmaster during startup
> if pg_control isn't there then a more generally useful name like
> "pg_control.maintenance" should be picked. We could then spit out a nice
> error message or hint like "the cluster has been disabled for
> maintenance. In order to start it up anyway, rename
> pg_control.maintenance to pg_control" or so.

To be very clear: I am going to try to stop any patch with this
mechanism from going into the tree. I think it's an absurdly bad
idea. There'd need to be significant support from a number of other
committers to convince me otherwise.


> In any case, this would be more of an operational or availability issue
> and not a data-loss issue, as I feared from your previous mails.

It's just about indistinguishable for normal users.


> > And you're basically adding it because Fabien doesn't like
> > postmaster.pid and wants to invent another lockout mechanism in this
> > thread.
> 
> I think the hazard of another DBA (or some automated configuration
> management or HA tool for that matter) looking at postmaster.pid,
> deciding that it is not a legit file from a running instance, deleting
> it and then starting up Postgres while pg_checksums is still at work is
> worse than the above scenario, but maybe if we make the content of
> postmaster.pid clear enough (like "maintenance in progress"?) it would
> be enough of a hint?

Err, how would such a tool decide to do that safely? And if it did so,
how would it not cause problems in postgres' normal operation, given
that postmaster.pid is crucial to preventing two postgres instances
running at the same time?


> Or do you have concrete suggestions on how this should work?

Create a postmaster.pid with the pid of pg_checksums. That'll block a
postgres start, via the check on whether that pid is still alive. There'd
probably need to be a tiny change to CreateLockFile() to prevent it from
checking whether any shared memory is connected.  FWIW, it'd probably
actually be good if pg_checksums (and some other tools) did most if not
all of the checks in CreateLockFile().
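[Editor's note: the lockout idea above can be sketched as follows. The simplified file layout and function names are illustrative assumptions; the real postmaster.pid has more fields, and the real checks live in C in CreateLockFile().]

```python
# Sketch: an offline tool writes its own PID as the first line of
# postmaster.pid, so a subsequent "pg_ctl start" sees a live process and
# refuses to boot a second instance on the same data directory.
import os

def acquire_cluster_lock(datadir):
    pidfile = os.path.join(datadir, "postmaster.pid")
    # O_EXCL makes the check-and-create atomic, avoiding a race between two
    # tools starting at the same time.
    fd = os.open(pidfile, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write("%d\n%s\n" % (os.getpid(), datadir))
    return pidfile

def release_cluster_lock(pidfile):
    os.unlink(pidfile)
```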

I'm not sure it needs to be this patch's responsibility to come up with
a scheme here at all however. pg_rewind, pg_resetwal, pg_upgrade all
don't really have a lockout mechanism, and it hasn't caused a ton of
problems. I think it'd be good to invent something better, but it can't
be some half-assed approach that'll lead people to think their database
is gone.


> I had the feeling (ab)using postmaster.pid for this would fly less than
> using the same scheme as pg_upgrade does, but I'm fine doing it either
> way.

I don't think pg_upgrade is a valid comparison, given that the goal
there is to permanently disable the cluster. It also emits a hint about
it. And only does so at the end of a run.

Greetings,

Andres Freund


Re: Offline enabling/disabling of data checksums

From: Michael Paquier
On Tue, Mar 19, 2019 at 09:47:17AM -0700, Andres Freund wrote:
> I'm not sure it needs to be this patch's responsibility to come up with
> a scheme here at all however. pg_rewind, pg_resetwal, pg_upgrade all
> don't really have a lockout mechanism, and it hasn't caused a ton of
> problems. I think it'd be good to invent something better, but it can't
> be some half-assed approach that'll lead people to think their database
> is gone.

Amen.  Take it as you wish, but that's actually what I was mentioning
upthread one week ago where I argued that it is a problem, but not a
problem of this patch, and that this problem concerns other tools:
https://www.postgresql.org/message-id/20190313093150.GE2988@paquier.xyz
And then my position was overruled by everybody on this thread.
So I am happy to see somebody chiming in and saying the same thing.

Honestly, I think that what I sent last week, with a patch in its
simplest form, would be enough:
https://www.postgresql.org/message-id/20190313021621.GP13812@paquier.xyz

In short, you keep the main feature with:
- No tweaks with postmaster.pid.
- Rely just on the control file indicating an instance shutdown
cleanly.
- No tweaks with the system ID.
- No renaming of the control file.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From: Michael Paquier
On Wed, Mar 20, 2019 at 08:09:07AM +0900, Michael Paquier wrote:
> In short, you keep the main feature with:
> - No tweaks with postmaster.pid.
> - Rely just on the control file indicating an instance shutdown
> cleanly.
> - No tweaks with the system ID.
> - No renaming of the control file.

FWIW, the simplest version is just like the attached.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From: Fabien COELHO
Hallo Andres,

> And you're basically adding it because Fabien doesn't like
> postmaster.pid and wants to invent another lockout mechanism in this
> thread.

I did not suggest renaming the control file, but as it is already done by 
another command it did not look like a bad idea in itself, or at least an 
already used bad idea:-)

I'd be okay with anything that works consistently across all commands 
that may touch a cluster and are mutually exclusive (postmaster, pg_rewind, 
pg_resetwal, pg_upgrade, pg_checksums…), without underlying race 
conditions. It could be locking, a control file state, a special file 
(which one? what is the procedure to create/remove it safely and avoid 
potential race conditions?), possibly "postmaster.pid", whatever really.

I'll admit that I'm moderately enthusiastic about "postmaster.pid" because 
it does not do anymore what the file names says, but if it really works 
and is used consistently by all commands, why not. In case of unexpected 
problems, the file will probably have to be removed/fixed by hand. I also 
think that the implemented mechanism should be made available in 
"control_utils.c", not duplicated in every command.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From: Andres Freund
Hi,

On 2019-03-20 07:55:39 +0100, Fabien COELHO wrote:
> > And you're basically adding it because Fabien doesn't like
> > postmaster.pid and wants to invent another lockout mechanism in this
> > thread.
> 
> I did not suggest renaming the control file, but as it is already done by
> another command it did not look like a bad idea in itself, or at least an
> already used bad idea:-)

pg_upgrade in link mode intentionally wants to *permanently* disable a
cluster. And it explicitly writes a log message about it. That's not a
case to draw inference from for this case.


> I'd be okay with anything that works consistently across all commands that
> may touch a cluster and are mutually exclusive (postmaster, pg_rewind,
> pg_resetwal, pg_upgrade, pg_checksums…), without underlying race conditions.
> It could be locking, a control file state, a special file (which one? what
> is the procedure to create/remove it safely and avoid potential race
> conditions?), possibly "postmaster.pid", whatever really.
> 
> I'll admit that I'm moderately enthusiastic about "postmaster.pid" because
> it does not do anymore what the file names says, but if it really works and
> is used consistently by all commands, why not. In case of unexpected
> problems, the file will probably have to be removed/fixed by hand. I also
> think that the implemented mechanism should be made available in
> "control_utils.c", not duplicated in every command.

That's just a separate feature.

Greetings,

Andres Freund


Re: Offline enabling/disabling of data checksums

From: Fabien COELHO
Hallo Andres,

>> [...]
>
> pg_upgrade in link mode intentionally wants to *permanently* disable a
> cluster. And it explicitly writes a log message about it. That's not a
> case to draw inference from for this case.

Ok. My light knowledge of pg_upgrade's inner workings does not extend to this 
level of precision.

>> I'd be okay with anything that works consistently across all commands 
>> [...]
>>
>> I'll admit that I'm moderately enthusiastic about "postmaster.pid" because
>> it does not do anymore what the file names says, but if it really works and
>> is used consistently by all commands, why not. In case of unexpected
>> problems, the file will probably have to be removed/fixed by hand. I also
>> think that the implemented mechanism should be made available in
>> "control_utils.c", not duplicated in every command.
>
> That's just a separate feature.

Possibly, although I'm not sure what in the above is a "separate feature"; 
I assume it means separate from the "pg_checksums --enable" implementation.

Is it the fact that there could (should, IMO) be some mechanisms to ensure 
that mutually exclusive direct cluster-modification commands are not run 
concurrently?

As "pg_checksums -e" is a potentially long-running command, the likelihood 
of self-inflicted wounds is raised significantly: I could do absurd things 
on an enable-checksum-in-progress cluster on a previous version of the 
patch. Thus as a reviewer I'm suggesting to fix the issue.

Or is it the fact that fixing some critical errors would possibly 
involve some manual intervention at some point?

Or is it something else?

-- 
Fabien.


Re: Offline enabling/disabling of data checksums

From: Fabien COELHO
Michaël-san,

>> In short, you keep the main feature with:
>> - No tweaks with postmaster.pid.
>> - Rely just on the control file indicating an instance shutdown
>> cleanly.
>> - No tweaks with the system ID.
>> - No renaming of the control file.

Hmmm… so nothing:-)

I think that this feature is useful, in complement to a possible 
online-enabling server-managed version.

About patch v8 part 1: applies cleanly, compiles, global & local make 
check ok, doc build ok.

I think that a clear warning not to run any cluster command in parallel, 
under pain of possible cluster corruption, and possibly other caveats 
about replication, should appear in the documentation.

I also think that some mechanism should be used to prevent such an occurrence, 
whether in this patch or another.

About "If enabling or disabling checksums, the exit status is nonzero if 
the operation failed." I must admit that enabling/disabling an already 
enabled/disabled cluster is still not really a failure for me, because the 
end status is that the cluster is in the state required by the user.

I still think that the control file update should be made *after* the data 
directory has been synced, otherwise the control file could be updated 
while some data files are not. It just means exchanging two lines.


About patch v8 part 2: applies cleanly, compiles, global & local make 
check ok.

The added -N option is not documented.

If the conditional sync is moved before the file update, the file update 
should pass do_sync to update_controlfile.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From: Michael Paquier
On Wed, Mar 20, 2019 at 10:38:36AM +0100, Fabien COELHO wrote:
> Hmmm… so nothing:-)

The core of the feature is still here, fortunately.

> I think that a clear warning not to run any cluster command in parallel,
> under pain of possible cluster corruption, and possibly other caveats about
> replication, should appear in the documentation.

I still have the following extra documentation in my notes:
+ <refsect1>
+  <title>Notes</title>
+  <para>
+   When disabling or enabling checksums in a cluster of multiple instances,
+   it is recommended to stop all the instances of the cluster before doing
+   the switch to all the instances consistently.  When using a cluster with
+   tools which perform direct copies of relation file blocks (for example
+   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
+   lead to page corruptions in the shape of incorrect checksums if the
+   operation is not done consistently across all nodes.  Destroying all
+   the standbys in a cluster first, enabling or disabling checksums on
+   the primary and finally recreate the cluster nodes from scratch is
+   also safe.
+  </para>
+ </refsect1>
This sounds kind of enough for me but..

> I also think that some mechanism should be used to prevent such an occurrence,
> whether in this patch or another.

I won't counter-argue on that.

> I still think that the control file update should be made *after* the data
> directory has been synced, otherwise the control file could be updated while
> some data files are not. It just means exchanging two lines.

The parent path of the control file also needs to be flushed to make
the change durable; just doing things the same way as pg_rewind keeps the
code simple, in my opinion.

> If the conditional sync is moved before the file update, the file update
> should pass do_sync to update_controlfile.

For the durability of the operation, the current order is
intentional.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From: Fabien COELHO
Michaël-san,

>> I think that a clear warning not to run any cluster command in parallel,
>> under pain of possible cluster corruption, and possibly other caveats about
>> replication, should appear in the documentation.
>
> I still have the following extra documentation in my notes:

Ok, it should have been included in the patch.

> + <refsect1>
> +  <title>Notes</title>
> +  <para>
> +   When disabling or enabling checksums in a cluster of multiple instances,

ISTM that a "postgres cluster" is like an Oracle instance:

See https://www.postgresql.org/docs/devel/creating-cluster.html

So the vocabulary used above seems misleading. I'm not sure how to name an 
Oracle cluster in postgres lingo, though.

> +   it is recommended to stop all the instances of the cluster before doing
> +   the switch to all the instances consistently.

I think that the motivation/risks should appear before the solution. "As 
xyz ..., ...", or at least the logical link should be outlined there.

It is not clear for me whether the following sentences, which seems 
specific to "pg_rewind", are linked to the previous advice, which seems 
rather to refer to streaming replication?

> +   When using a cluster with
> +   tools which perform direct copies of relation file blocks (for example
> +   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
> +   lead to page corruptions in the shape of incorrect checksums if the
> +   operation is not done consistently across all nodes.  Destroying all
> +   the standbys in a cluster first, enabling or disabling checksums on
> +   the primary and finally recreate the cluster nodes from scratch is
> +   also safe.
> +  </para>
> + </refsect1>

Should not disabling in reverse order be safe? The checksums are not 
checked afterwards?

> This sounds kind of enough for me but..

ISTM that the risks could be stated more clearly.

>> I also think that some mechanism should be used to prevent such an occurrence,
>> whether in this patch or another.
>
> I won't counter-argue on that.

This answer is ambiguous.

>> I still think that the control file update should be made *after* the data
>> directory has been synced, otherwise the control file could be updated while
>> some data files are not. It just means exchanging two lines.
>
> The parent path of the control file also needs to be flushed to make
> the change durable; just doing things the same way as pg_rewind keeps the
> code simple, in my opinion.

I do not understand. The issue I'm referring to is, when enabling:

  - data files are updated in fs cache
  - control file is updated in fs cache
  * fsync is called
  - updated control file gets written
  - data files are being written...
    but reboot occurs while fsyncing is still in progress

After the reboot, some data files are not fully updated with their 
checksums, although the control file says that they are. It should then 
fail after a restart when a no-checksum page is loaded?

What am I missing?

Also, I do not see how exchanging two lines makes the code less simple.
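[Editor's note: the ordering argued for here — sync the data directory first, only then update and flush pg_control — can be sketched as follows. Function and argument names are illustrative, not the patch's code.]

```python
# Sketch: flush all rewritten data files before flipping the switch in
# pg_control, so the control file never claims checksums are enabled while
# data pages could still be lost on a crash.
import os

def finish_enable(datadir, data_files, new_control_bytes):
    # 1. Make the checksummed data blocks durable.
    for path in data_files:
        fd = os.open(path, os.O_RDONLY)
        os.fsync(fd)
        os.close(fd)
    # 2. Only afterwards update pg_control, and flush it too.
    control = os.path.join(datadir, "global", "pg_control")
    fd = os.open(control, os.O_WRONLY | os.O_TRUNC)
    os.write(fd, new_control_bytes)
    os.fsync(fd)
    os.close(fd)
```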

>> If the conditional sync is moved before the file update, the file update
>> should pass do_sync to update_controlfile.
>
> For the durability of the operation, the current order is intentional.

See my point above: I think that this order can lead to cluster 
corruption. If not, please be kind enough to try to explain in more 
details why I'm wrong.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From: Michael Paquier
On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote:
> I think that the motivation/risks should appear before the solution. "As xyz
> ..., ...", or at least the logical link should be outlined there.
>
> It is not clear to me whether the following sentences, which seem specific
> to "pg_rewind", are linked to the previous advice, which seems rather to
> refer to streaming replication?

Do you have a better idea of formulation?  If you have a failover
which makes use of pg_rewind, or use some backup tool which does
incremental copy of physical blocks like pg_rman, then you have a risk
to mess up instances in a cluster which is the risk I am trying to
outline here.  It is actually fine to do the following actually if
everything is WAL-based as checksums are only computed once a shared
buffer is flushed on a single instance.  Imagine for example a
primary-standby with checksums disabled:
- Shut down the standby cleanly, enable checksums.
- Plug the standby back into the cluster, let it replay up to the latest point.
- Shut down the primary cleanly.
- Promote the standby.
- Enable checksums on the previous primary.
- Add recovery parameters and reconnect the primary back to the
cluster.
- All nodes have checksums enabled, without rebuilding any of them.
Per what I could see on this thread, folks tend to point out that
we should *not* include such recommendations in the docs, as it is too
easy to make a pilot error.

> > +   When using a cluster with
> > +   tools which perform direct copies of relation file blocks (for example
> > +   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
> > +   lead to page corruptions in the shape of incorrect checksums if the
> > +   operation is not done consistently across all nodes.  Destroying all
> > +   the standbys in a cluster first, enabling or disabling checksums on
> > +   the primary and finally recreate the cluster nodes from scratch is
> > +   also safe.
> > +  </para>
> > + </refsect1>
>
> Should not disabling in reverse order be safe? The checksums are not checked
> afterwards?

I don't quite understand your comment about the ordering.  If all
the standbys are destroyed first, then enabling/disabling checksums
happens at a single place.

> After the reboot, some data files are not fully updated with their
> checksums, although the control file says that they are. It should then
> fail after a restart when a no-checksum page is loaded?
>
> What am I missing?

Please note that we do that in other tools as well and we live fine
with that, as in pg_basebackup and pg_rewind just to name two.  I am not
saying that it is not a problem in some cases, but I am saying that
this is not a problem that this patch should solve.  If we were to do
something about that, it could make sense to make fsync_pgdata()
smarter so as the control file is flushed last there, or define flush
strategies there.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From: Michael Paquier
On Thu, Mar 21, 2019 at 07:59:24AM +0900, Michael Paquier wrote:
> Please note that we do that in other tools as well and we live fine
> with that as pg_basebackup, pg_rewind just to name two.  I am not
> saying that it is not a problem in some cases, but I am saying that
> this is not a problem that this patch should solve.  If we were to do
> something about that, it could make sense to make fsync_pgdata()
> smarter so as the control file is flushed last there, or define flush
> strategies there.

Anyway, as this stuff is very useful for upgrade scenarios
a-la-pg_upgrade, for backup validation and as it does not produce
false positives, I would really like to get something committed for
v12 in its simplest form...  Are there any recommendations that people
would like to add to the documentation?
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From: Fabien COELHO
Bonjour Michaël,

> On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote:
>> I think that the motivation/risks should appear before the solution. "As xyz
>> ..., ...", or there at least the logical link should be outlined.
>>
>> It is not clear to me whether the following sentences, which seem specific
>> to "pg_rewind", are linked to the previous advice, which seems rather to
>> refer to streaming replication?
>
> Do you have a better idea of formulation?

I can try, but I must admit that I'm fuzzy about the actual issue. Is 
there a problem on a streaming replication setup with inconsistent 
checksum settings, or not?

You seem to suggest that the issue is more about how some commands or 
backup tools operate on a cluster.

I'll reread the thread carefully and will make a proposal.

> Imagine for example a primary-standby with checksums disabled: [...]

Yep, that's cool.

>> Should not disabling in reverse order be safe? The checksums are not checked
>> afterwards?
>
> I don't quite understand your comment about the ordering.  If all the 
> standbys are destroyed first, then enabling/disabling checksums happens 
> at a single place.

Sure. I was suggesting that disabling on replicated clusters is possibly 
safer, but I do not know the details of replication & checksumming with 
enough precision to be that sure about it.

>> After the reboot, some data files are not fully updated with their
>> checksums, although the controlfiles tells that they are. It should then
>> fail after a restart when a no-checksum page is loaded?
>>
>> What am I missing?
>
> Please note that we do that in other tools as well and we live fine
> with it in pg_basebackup and pg_rewind, just to name two.

The fact that other commands are exposed to the same potential risk is not 
a very good argument not to fix it.

> I am not saying that it is not a problem in some cases, but I am saying 
> that this is not a problem that this patch should solve.

As solving the issue involves exchanging two lines and turning one boolean 
parameter to true, I do not see why it should not be done. Fixing the 
issue takes much less time than writing about it...

And if other commands can be improved fine with me.

> If we were to do something about that, it could make sense to make 
> fsync_pgdata() smarter so that the control file is flushed last there, or 
> define flush strategies there.

ISTM that this would not work: The control file update can only be done 
*after* the fsync to describe the cluster's actual status, otherwise it is 
just a question of luck whether the cluster is corrupt on a crash while 
fsyncing. The enforced order of operations, with a barrier in between, is 
the important thing here.
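The ordering argument can be made concrete with a minimal sketch (Python, with a toy file layout; the enable_checksums() helper below is hypothetical and only mimics the barrier, it is not PostgreSQL's actual C implementation built on fsync_pgdata() and update_controlfile()):

```python
import os

def flush(path):
    # fsync one file so its contents are durable before we proceed.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def enable_checksums(datadir, relation_files):
    # Step 1: rewrite every relation file (toy stand-in for checksumming).
    for name in relation_files:
        with open(os.path.join(datadir, name), "a") as f:
            f.write(" +checksummed")
    # Step 2: the barrier -- make the rewritten data durable first.
    for name in relation_files:
        flush(os.path.join(datadir, name))
    # Step 3: only now flip the state in the control file, and fsync it
    # too.  A crash before this point leaves the control file still
    # saying "checksums off", which remains a consistent cluster state.
    control = os.path.join(datadir, "global", "pg_control")
    with open(control, "w") as f:
        f.write("data_checksum_version = 1\n")
        f.flush()
        os.fsync(f.fileno())
```

Swapping steps 2 and 3, or dropping the fsyncs, reintroduces exactly the luck-based corruption described above.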

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date
> Anyway, as this stuff is very useful for upgrade scenarios
> à la pg_upgrade and for backup validation, and as it does not produce
> false positives, I would really like to get something committed for v12
> in its simplest form...

Fine with me, the detailed doc is not a showstopper and can be improved 
later on.

> Are there any recommendations that people would like to add to the 
> documentation?

For me, I just want at least a clear warning on potential risks.

-- 
Fabien.


Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date
On Thu, Mar 21, 2019 at 08:17:32AM +0100, Fabien COELHO wrote:
> I can try, but I must admit that I'm fuzzy about the actual issue. Is there
> a problem on a streaming replication setup with inconsistent checksum
> settings, or not?
>
> You seem to suggest that the issue is more about how some commands or backup
> tools operate on a cluster.

Yes.  That's what I am writing about.  Imagine for example this case
with pg_rewind:
- primary has checksums enabled.
- standby has checksums disabled.
- a hard crash of the primary happens, there is a failover to the
standby which gets promoted.
- The primary's host is restarted; the old primary is started and stopped
once cleanly so as to have a clear point in its past timeline where WAL
forked, thanks to at least the shutdown checkpoint generated by the
clean stop.
- pg_rewind is run, copying some pages from the promoted standby,
which don't have checksums, to the primary with checksums enabled, and
causing some pages to have an incorrect checksum.
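A toy model of that failure mode (Python, with a made-up page layout and a CRC standing in for PostgreSQL's real page checksum algorithm): pages copied verbatim from a cluster that never computed checksums no longer verify on the cluster that expects them.

```python
import zlib

def checksum(payload):
    # Stand-in for PostgreSQL's page checksum; any function of the
    # payload works for this illustration.
    return zlib.crc32(payload) & 0xFFFF

def make_page(payload, with_checksum):
    # A toy "page": (stored checksum or None, payload bytes).
    return (checksum(payload) if with_checksum else None, payload)

def verify(page):
    stored, payload = page
    return stored is not None and stored == checksum(payload)

# Primary: checksums enabled, all pages verify.
primary = [make_page(b"tuple-%d" % i, True) for i in range(3)]
assert all(verify(p) for p in primary)

# The promoted standby never computed checksums; a pg_rewind-style
# block copy brings one of its pages over verbatim...
primary[1] = make_page(b"newer-tuple", False)

# ...and verification on the primary now fails for that page.
assert not verify(primary[1])
```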

There is another tool I know of which is called pg_rman, which is a
backup tool able to take incremental backups in the shape of a delta
of relation blocks.  Then imagine the following:
- One instance of Postgres runs, has checksums disabled.
- pg_rman takes a full backup of it.
- Checksums are enabled on this instance.
- An incremental backup from the previous full backup point is taken.
If I recall correctly pg_rman takes a copy of the new control file as
well, which tracks checksums as being enabled.
- A crash happens, the data folder is dead.
- Rollback to the previous backup is done, and we restore up to a
point after the incremental backup.
- And you finish with a cluster which has checksums enabled, but as
the initial full backup had checksums disabled, not all the pages may
be in a correct state.

So I think that it makes sense to tell users to be careful within the
documentation, but being too careful in the tool also discards many
possibilities (see the example of a clean failover, where it is possible
to enable checksums with no actual downtime).  And this part has a lot
of value.

> ISTM that this would not work: The control file update can only be done
> *after* the fsync to describe the cluster's actual status, otherwise it is
> just a question of luck whether the cluster is corrupt on a crash while
> fsyncing. The enforced order of operations, with a barrier in between, is the
> important thing here.

Done the switch for this case.  For pg_rewind actually I think that
this is an area where its logic could be improved a bit.  So first
the data folder is synced, and then the control file is updated.  It
took less time to change the code than to write this paragraph,
including compiling the code and confirming with one run of the TAP
tests.

I have added in the docs a warning about a host crash while doing the
operation, with a recommendation to check the state of the checksums
on the data folder should it happen, and the previous portion of the
docs about clusters.  Your suggestion sounds appropriate.  I would be
tempted to add a bigger warning in pg_rewind or pg_basebackup about
that, but that's a different story for another time.

Does that look fine to you?
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date
Hi,

Am Freitag, den 22.03.2019, 09:27 +0900 schrieb Michael Paquier:
> I have added in the docs a warning about a host crash while doing the 
> operation, with a recommendation to check the state of the checksums
> on the data folder should it happen, and the previous portion of the
> docs about clusters.  Your suggestion sounds appropriate.  I would be
> tempted to add a bigger warning in pg_rewind or pg_basebackup about
> that, but that's a different story for another time.
> 
> Does that look fine to you?

Don't we need a big warning that the cluster must not be started during
operation of pg_checksums as well, now that we don't disallow it?


Michael



Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date
On Fri, Mar 22, 2019 at 09:13:43AM +0100, Michael Banck wrote:
> Don't we need a big warning that the cluster must not be started during
> operation of pg_checksums as well, now that we don't disallow it?

The same applies to pg_rewind and pg_basebackup, so I would classify
that as a pilot error.  How would you formulate that in the docs if
you were to add it?
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Banck
Date
Hi,

Am Freitag, den 22.03.2019, 17:37 +0900 schrieb Michael Paquier:
> On Fri, Mar 22, 2019 at 09:13:43AM +0100, Michael Banck wrote:
> > Don't we need a big warning that the cluster must not be started during
> > operation of pg_checksums as well, now that we don't disallow it?
> 
> The same applies to pg_rewind and pg_basebackup, so I would classify
> that as a pilot error.  

How would it apply to pg_basebackup? The cluster is running while the
base backup is taken and I believe the control file is written at the
end so you can't start another instance off the backup directory until
the base backup has finished.

It would apply to pg_rewind, but pg_rewind's runtime does not scale with
cluster size, does it? pg_checksums will run for hours on large clusters,
so the window for errors is much larger, and I don't think you can easily
compare the two.

> How would you formulate that in the docs if you were to add it?

(I would try to make sure you can't start the cluster but that seems off
the table for now)

How about this:

+ <refsect1>
+  <title>Notes</title>
+  <para>
+   When enabling checksums in a cluster, the operation can potentially take a
+   long time if the data directory is large.  During this operation, the 
+   cluster or other programs that write to the data directory must not be 
+   started, or else data loss will occur.
+  </para>
+
+  <para>
+   When disabling or enabling checksums in a cluster of multiple instances,
[...]

Also, the following is not very clear to me:

+   If the event of a crash of the operating system while enabling or

s/If/In/

+   disabling checksums, the data folder may have checksums in an inconsistent
+   state, in which case it is recommended to check the state of checksums
+   in the data folder.

How is the user supposed to check the state of checksums? Do you mean
that if the user intended to enable checksums and the box dies in
between, they should check whether checksums are actually enabled and
re-run if not?  Because it could also mean running pg_checksums --check
on the cluster, which wouldn't work in that case as the control file has
not been updated yet.

Maybe it could be formulated like "If pg_checksums is aborted or killed
while enabling or disabling checksums, the cluster will have the same
state with respect to checksums as before the operation, and
pg_checksums needs to be restarted."?


Michael



Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date
On Fri, Mar 22, 2019 at 10:04:02AM +0100, Michael Banck wrote:
> How about this:
>
> + <refsect1>
> +  <title>Notes</title>
> +  <para>
> +   When enabling checksums in a cluster, the operation can potentially take a
> +   long time if the data directory is large.  During this operation, the
> +   cluster or other programs that write to the data directory must not be
> +   started or else data-loss will occur.
> +  </para>

Sounds fine to me.  Will add an extra paragraph on that.

> Maybe it could be formulated like "If pg_checksums is aborted or killed
> while enabling or disabling checksums, the cluster will have the same
> state with respect to checksums as before the operation, and
> pg_checksums needs to be restarted."?

We could use that as well.  With the current patch, and per the
suggestions from Fabien, this holds true as the control file is
updated and flushed last.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date
Bonjour Michaël,

> Does that look fine to you?

Mostly.

Patch v9 part 1 applies cleanly, compiles, global and local check ok, doc 
build ok.

On write(), the error message is not translatable whereas it is for all 
others.

I agree that a BIG STRONG warning is needed about not starting the cluster, 
under pain of possible data corruption. I still think that preventing this 
is desirable, preferably before v12.


Otherwise, my remaining non-showstopper (committer's opinion matters more) 
issues:

Doc: A postgres cluster is like an Oracle instance. I'd use "replicated 
setup" instead of "cluster", and "cluster" instead of "instance".  I'll try 
to improve the text, possibly over the weekend.

Enabling/disabling an already enabled/disabled cluster should not be a 
failure for me, because the cluster is left in the prescribed state.

Patch v9 part 2 applies cleanly, compiles, global and local check ok, doc 
build ok.

Ok for me.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date

> Done the switch for this case.  For pg_rewind actually I think that this 
> is an area where its logic could be improved a bit. So first the data 
> folder is synced, and then the control file is updated.

Attached is a quick patch about "pg_rewind", so that the control file is 
updated after everything else is committed to disk.

-- 
Fabien.
Attachments

Re: Offline enabling/disabling of data checksums

From
Christoph Berg
Date
Re: Fabien COELHO 2019-03-22 <alpine.DEB.2.21.1903221514390.2198@lancre>
> Attached is a quick patch about "pg_rewind", so that the control file is
> updated after everything else is committed to disk.

>      update_controlfile(datadir_target, progname, &ControlFile_new, do_sync);
>  
> -    pg_log(PG_PROGRESS, "syncing target data directory\n");
> -    syncTargetDirectory();

Doesn't the control file still need syncing?

Christoph


Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date
Hello Christoph,

>> -    pg_log(PG_PROGRESS, "syncing target data directory\n");
>> -    syncTargetDirectory();
>
> Doesn't the control file still need syncing?

Indeed it does, and it is done in update_controlfile if the last argument 
is true. Basically, the latest version of update_controlfile always fsyncs 
the control file, unless explicitly told not to do so. The options to do 
that are really there only to speed up the regression tests.

-- 
Fabien.


Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date
On Fri, Mar 22, 2019 at 02:59:31PM +0100, Fabien COELHO wrote:
> On write(), the error message is not translatable whereas it is for all
> others.

Fixed.

> I agree that a BIG STRONG warning is needed about not starting the cluster
> under pain of possible data corruption. I still think that preventing this
> is desirable, preferably before v12.

For now the docs mention that in a paragraph as Michael Banck has
suggested.  Not sure that this deserves a warning portion.

> Otherwise, my remaining non-showstopper (committer's opinion matters more)
> issues:
>
> Doc: A postgres cluster is like an Oracle instance. I'd use "replicated
> setup" instead of "cluster", and "cluster" instead of "instance".  I'll try
> to improve the text, possibly over the weekend.

Right.  I have reworded that using your suggestions.

And committed the main part.  I'll look after the --no-sync part in a
bit.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date
On Fri, Mar 22, 2019 at 07:02:36PM +0100, Fabien COELHO wrote:
> Indeed it does, and it is done in update_controlfile if the last argument is
> true. Basically, the latest version of update_controlfile always fsyncs the
> control file, unless explicitly told not to do so. The options to do that
> are really there only to speed up the regression tests.

For the control file, it would not really matter much, as the cost
really comes from syncing the data directory; still, for correctness
it is better to have a full all-or-nothing switch.  Small buildfarm
machines also like the --no-sync flavors a lot.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date
On Sat, Mar 23, 2019 at 08:16:07AM +0900, Michael Paquier wrote:
> And committed the main part.  I'll look after the --no-sync part in a
> bit.

--no-sync is committed as well now.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date
On Fri, Mar 22, 2019 at 03:18:26PM +0100, Fabien COELHO wrote:
> Attached is a quick patch about "pg_rewind", so that the control file is
> updated after everything else is committed to disk.

Could you start a new thread about that please?  This one has already
been used for too many things.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date
Bonjour Michaël,

Here is an attempt at improving the Notes.

Mostly it is a reordering from more important (cluster corruption) to less 
important (if interrupted, a restart is needed), some reordering from 
problem to solutions instead of solution/problem/solution, and some 
sentence simplification.

-- 
Fabien.
Attachments

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date
On Sat, Mar 23, 2019 at 02:14:02PM +0100, Fabien COELHO wrote:
> Here is an attempt at improving the Notes.
>
> Mostly it is a reordering from more important (cluster corruption) to less
> important (if interrupted, a restart is needed), some reordering from
> problem to solutions instead of solution/problem/solution, and some
> sentence simplification.

So, the ordering of the notes for each paragraph is as follows:
1) Replication issues when mixing different checksum setups across
nodes.
2) Consistency of the operations if killed.
3) Don't start Postgres while the operation runs.

Your proposal is to switch the order of the paragraphs to 3), 1) and
then 2).  Do others have any opinion?  I am fine with the current
order of things, still it may make sense to tweak the docs.

In the paragraph related to replication, the second statement is
switched to be first so that the docs warn first and then give
recommendations.  This part makes sense.

I am not sure that "checksum status" is a correct term.  It seems to
me that "same configuration for data checksums as before the tool ran"
or something like that would be more correct.
--
Michael

Attachments

Re: Offline enabling/disabling of data checksums

From
Fabien COELHO
Date
Bonjour Michaël,

>> Here is an attempt at improving the Notes. [...]
>
> So, the ordering of the notes for each paragraph is as follows: 1) 
> Replication issues when mixing different checksum setups across nodes. 
> 2) Consistency of the operations if killed. 3) Don't start Postgres 
> while the operation runs.
>
> Your proposal is to switch the order of the paragraphs to 3), 1) and
> then 2).

Yes. I suggest emphasizing cluster corruption risks by putting them 
first.

> Do others have any opinion?  I am fine with the current
> order of things, still it may make sense to tweaks the docs.
>
> In the paragraph related to replication, the second statement is
> switched to be first so that the docs warn first and then give
> recommendations.

Yep.

> This part makes sense.

Yep!

> I am not sure that "checksum status" is a correct term.  It seems to
> me that "same configuration for data checksums as before the tool ran"
> or something like that would be more correct.

Possibly, I cannot say.

-- 
Fabien.

Re: Offline enabling/disabling of data checksums

From
Michael Paquier
Date
On Tue, Mar 26, 2019 at 01:41:38PM +0100, Fabien COELHO wrote:
>> I am not sure that "checksum status" is a correct term.  It seems to
>> me that "same configuration for data checksums as before the tool ran"
>> or something like that would be more correct.
>
> Possibly, I cannot say.

I have put more thought into this part, and committed the
reorganization mainly as you suggested.
--
Michael

Вложения