Обсуждение: pg_rewind just doesn't fsync *anything*?

Поиск
Список
Период
Сортировка

pg_rewind just doesn't fsync *anything*?

От
Andres Freund
Дата:
Hi,

how come that the only comment in pg_rewind about fsyncing is '
void
close_target_file(void)
{
.../* fsync? */
}

Isn't that a bit, uh, minimal for a utility that's likely to be used in
failover scenarios?

I think we might actually be "saved" due to
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ce439f33
because pg_rewind appears to leave the cluster in
   ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;   updateControlFile(&ControlFile_new);

a state that StartupXLOG will treat as needing recovery:

if (ControlFile->state != DB_SHUTDOWNED &&       ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
SyncDataDirectory();

but that code went in after pg_rewind, so this certainly can't be an
intentional save.

I also don't think it's ok that you need to start the cluster to make it
safe against a crash?

I guess the easiest fix would be to shell out to initdb -s?

Greetings,

Andres Freund



Re: pg_rewind just doesn't fsync *anything*?

От
Michael Paquier
Дата:
On Thu, Mar 10, 2016 at 4:43 AM, Andres Freund <andres@anarazel.de> wrote:
> how come that the only comment in pg_rewind about fsyncing is '
> void
> close_target_file(void)
> {
> ...
>         /* fsync? */
> }
>
> Isn't that a bit, uh, minimal for a utility that's likely to be used in
> failover scenarios?
>
> I think we might actually be "saved" due to
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ce439f33
> because pg_rewind appears to leave the cluster in
>
>     ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
>     updateControlFile(&ControlFile_new);

Yep, with minimum recovery target changed as well up to the LSN where
pg_rewind began.

> a state that StartupXLOG will treat as needing recovery:
>
> if (ControlFile->state != DB_SHUTDOWNED &&
>         ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
>         SyncDataDirectory();
>
> but that code went in after pg_rewind, so this certainly can't be an
> intentional save.
> I also don't think it's ok that you need to start the cluster to make it
> safe against a crash?

No, that's not acceptable. One is not obliged to use the rewound data
folder immediately, and my first intuition is that we had better
guarantee that an entry in the file map gets consistent on disk
immediately after being done operating on it. If we get a power loss
after pg_rewind is done we may lose data silently.

> I guess the easiest fix would be to shell out to initdb -s?

What do you mean? I am not sure what initdb has to do with that as we
have no need for it in pg_rewind.
-- 
Michael



Re: pg_rewind just doesn't fsync *anything*?

От
Abhijit Menon-Sen
Дата:
At 2016-03-10 08:35:43 +0100, michael.paquier@gmail.com wrote:
>
> > I guess the easiest fix would be to shell out to initdb -s?
> 
> What do you mean? I am not sure what initdb has to do with that as we
> have no need for it in pg_rewind.

initdb -S/--sync-only fsyncs everything in the data directory and exits.

-- Abhijit



Re: pg_rewind just doesn't fsync *anything*?

От
Michael Paquier
Дата:
On Thu, Mar 10, 2016 at 8:37 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> At 2016-03-10 08:35:43 +0100, michael.paquier@gmail.com wrote:
>>
>> > I guess the easiest fix would be to shell out to initdb -s?
>>
>> What do you mean? I am not sure what initdb has to do with that as we
>> have no need for it in pg_rewind.
>
> initdb -S/--sync-only fsyncs everything in the data directory and exits.

Missed your point, good to know that initdb is not doing anything else
with -S than fsyncing everything in PGDATA. Still, I think that we had
better fsync only entries that are modified by pg_rewind, and files
that got updated, and not the whole directory, a target data folder
should be stopped properly to be able to rewind, and it is better to
avoid dependencies between utilities if that's not strictly necessary.
initdb is likely to be installed side-by-side with pg_rewind in any
distribution though.
-- 
Michael



Re: pg_rewind just doesn't fsync *anything*?

От
Andres Freund
Дата:
Hi,

On 2016-03-10 08:47:16 +0100, Michael Paquier wrote:
> Still, I think that we had better fsync only entries that are modified
> by pg_rewind, and files that got updated, and not the whole directory

Why? If any files in there are dirty, they need to be fsynced. If
they're not dirty, fsync's free.


> a target data folder should be stopped properly to be able to rewind,
> and it is better to avoid dependencies between utilities if that's not
> strictly necessary.  initdb is likely to be installed side-by-side
> with pg_rewind in any distribution though.

It's not like we don't have any other such dependencies, in other
binaries. I'm not concerned.

Having to backpatch a single system() invocation + find_other_exec()
call, and backporting a more general FRONTEND version of initdb's
fsync_pgdata() are wildly differing in complexity.


- Andres



Re: pg_rewind just doesn't fsync *anything*?

От
Michael Paquier
Дата:
On Thu, Mar 10, 2016 at 7:52 PM, Andres Freund <andres@anarazel.de> wrote:
>> a target data folder should be stopped properly to be able to rewind,
>> and it is better to avoid dependencies between utilities if that's not
>> strictly necessary.  initdb is likely to be installed side-by-side
>> with pg_rewind in any distribution though.
>
> It's not like we don't have any other such dependencies, in other
> binaries. I'm not concerned.
>
> Having to backpatch a single system() invocation + find_other_exec()
> call, and backporting a more general FRONTEND version of initdb's
> fsync_pgdata() are wildly differing in complexity.

Talking about HEAD, wouldn't the dependency tree be cleaner if there
is a common facility in src/common? For back-branches, I won't argue
against simplicity, those are more reliable solutions.
-- 
Michael



Re: pg_rewind just doesn't fsync *anything*?

От
Andres Freund
Дата:
On 2016-03-10 20:31:55 +0100, Michael Paquier wrote:
> On Thu, Mar 10, 2016 at 7:52 PM, Andres Freund <andres@anarazel.de> wrote:
> > Having to backpatch a single system() invocation + find_other_exec()
> > call, and backporting a more general FRONTEND version of initdb's
> > fsync_pgdata() are wildly differing in complexity.
> 
> Talking about HEAD, wouldn't the dependency tree be cleaner if there
> is a common facility in src/common?

Yea, probably; but lets sort out the backbranches first.

Andres



Re: pg_rewind just doesn't fsync *anything*?

От
Andres Freund
Дата:
On 2016-03-09 19:43:52 -0800, Andres Freund wrote:
> Hi,
> 
> how come that the only comment in pg_rewind about fsyncing is '
> void
> close_target_file(void)
> {
> ...
>     /* fsync? */
> }
> 
> Isn't that a bit, uh, minimal for a utility that's likely to be used in
> failover scenarios?
> 
> I think we might actually be "saved" due to
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ce439f33
> because pg_rewind appears to leave the cluster in
> 
>     ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
>     updateControlFile(&ControlFile_new);
> 
> a state that StartupXLOG will treat as needing recovery:
> 
> if (ControlFile->state != DB_SHUTDOWNED &&
>         ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
>         SyncDataDirectory();
> 
> but that code went in after pg_rewind, so this certainly can't be an
> intentional save.
> 
> I also don't think it's ok that you need to start the cluster to make it
> safe against a crash?
> 
> I guess the easiest fix would be to shell out to initdb -s?

I've pushed a modified version of the fix that Michael posted in
http://archives.postgresql.org/message-id/CAB7nPqRmM%2BCX6bVxw0Y7mMVGMFj1S8kwhevt8TaP83yeFRfbXA%40mail.gmail.com

Andres



Re: pg_rewind just doesn't fsync *anything*?

От
Michael Paquier
Дата:
On Mon, Mar 28, 2016 at 6:52 AM, Andres Freund <andres@anarazel.de> wrote:
> I've pushed a modified version of the fix that Michael posted in
> http://archives.postgresql.org/message-id/CAB7nPqRmM%2BCX6bVxw0Y7mMVGMFj1S8kwhevt8TaP83yeFRfbXA%40mail.gmail.com

Thanks.
-- 
Michael