Обсуждение: PATCH: Exclude additional directories in pg_basebackup

Поиск
Список
Период
Сортировка

PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
Recently a hacker proposed a patch to add pg_dynshmem to the list of
directories whose contents are excluded in pg_basebackup.  I wasn't able
to find the original email despite several attempts.

That patch got me thinking about what else could be excluded and after
some investigation I found the following: pg_notify, pg_serial,
pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
or rebuilt on server start.

The attached patch adds these directories to the pg_basebackup
exclusions and takes an array-based approach to excluding directories
and files during backup.

I also incorporated Ashutosh's patch to fix corruption when
pg_stat_tmp/pg_replslot are links [1].  This logic has been extended to
all excluded directories.

Perhaps these patches should be merged in the CF but first I'd like to
see if anyone can identify problems with the additional exclusions.

Thanks,
--
-David
david@pgmasters.net

[1]
http://www.postgresql.org/message-id/flat/CAE9k0Pm7=x_o0W7E2b2s2cWcZdcBGczGdrxttzXOZGp8bEBcGw@mail.gmail.com/

Вложения

Re: PATCH: Exclude additional directories in pg_basebackup

От
Stephen Frost
Дата:
David,

* David Steele (david@pgmasters.net) wrote:
> Recently a hacker proposed a patch to add pg_dynshmem to the list of
> directories whose contents are excluded in pg_basebackup.  I wasn't able
> to find the original email despite several attempts.

That would be here:

b4e94836-786b-6020-e1b3-3d7390f95497@aiven.io

Thanks!

Stephen

Re: PATCH: Exclude additional directories in pg_basebackup

От
Jim Nasby
Дата:
On 8/15/16 2:39 PM, David Steele wrote:
> That patch got me thinking about what else could be excluded and after
> some investigation I found the following: pg_notify, pg_serial,
> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
> or rebuilt on server start.

If someone wanted to scratch an itch, it would also be useful to put 
things that are zeroed under a single dedicated directory, so that folks 
who wanted to could mount that on a ram drive. It would also be useful 
to do that for stuff that's only reset on crash (to put it on local 
storage as opposed to a SAN).
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
On 8/15/16 4:58 PM, Jim Nasby wrote:
> On 8/15/16 2:39 PM, David Steele wrote:
>> That patch got me thinking about what else could be excluded and after
>> some investigation I found the following: pg_notify, pg_serial,
>> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
>> or rebuilt on server start.
>
> If someone wanted to scratch an itch, it would also be useful to put
> things that are zeroed under a single dedicated directory, so that folks
> who wanted to could mount that on a ram drive. It would also be useful
> to do that for stuff that's only reset on crash (to put it on local
> storage as opposed to a SAN).

I definitely have something like that in mind and thought this was a 
good place to start.

-- 
-David
david@pgmasters.net



Re: PATCH: Exclude additional directories in pg_basebackup

От
Robert Haas
Дата:
On Mon, Aug 15, 2016 at 3:39 PM, David Steele <david@pgmasters.net> wrote:
> Recently a hacker proposed a patch to add pg_dynshmem to the list of
> directories whose contents are excluded in pg_basebackup.  I wasn't able
> to find the original email despite several attempts.
>
> That patch got me thinking about what else could be excluded and after
> some investigation I found the following: pg_notify, pg_serial,
> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
> or rebuilt on server start.

Eh ... I doubt very much that it's safe to blow away the entire
contents of an SLRU between shutdown and startup, even if the data is
technically transient data that won't be needed again after the system
is reset.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PATCH: Exclude additional directories in pg_basebackup

От
Alvaro Herrera
Дата:
Robert Haas wrote:

> Eh ... I doubt very much that it's safe to blow away the entire
> contents of an SLRU between shutdown and startup, even if the data is
> technically transient data that won't be needed again after the system
> is reset.

Hmm.  At least async.c (pg_notify) deletes the whole directory at
startup so it's fine, but for the others (pg_serial, pg_subtrans) I
think it'd be saner to keep any "active" files (probably just the
latest).  I don't remember how pg_snapshot works, but it's probably fine
to start with an empty subdir (is it possible to export a snapshot from
a prepared transaction?)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
Hi Robert,

On 8/17/16 11:27 AM, Robert Haas wrote:
> On Mon, Aug 15, 2016 at 3:39 PM, David Steele <david@pgmasters.net> wrote:
>> Recently a hacker proposed a patch to add pg_dynshmem to the list of
>> directories whose contents are excluded in pg_basebackup.  I wasn't able
>> to find the original email despite several attempts.
>>
>> That patch got me thinking about what else could be excluded and after
>> some investigation I found the following: pg_notify, pg_serial,
>> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
>> or rebuilt on server start.
>
> Eh ... I doubt very much that it's safe to blow away the entire
> contents of an SLRU between shutdown and startup, even if the data is
> technically transient data that won't be needed again after the system
> is reset.

I've done pretty extensive testing in pgBackRest and haven't seen issues
in any supported version (plus I audited each init() function for every
version back to where it was introduced).  The patch also passes all the
pg_basebackup TAP tests in master.

If you are correct it may indicate a problem anyway. Consider a standby
backup where the files in these directories may be incredibly stale
since they are not replicated.  Once restored to a master should we
trust anything in these files?

pg_serial, pg_notify, pg_subtrans are not even fsync'd
(SlruCtl->do_fsync = false).  It's hard to imagine there's anything of
value in there or that it can be trusted if there is.

The files in pg_snapshot and pg_dynshmem are simply deleted on startup
so that seems safe enough.

-- 
-David
david@pgmasters.net



Re: PATCH: Exclude additional directories in pg_basebackup

От
Robert Haas
Дата:
On Wed, Aug 17, 2016 at 2:50 PM, David Steele <david@pgmasters.net> wrote:
> Hi Robert,
>
> On 8/17/16 11:27 AM, Robert Haas wrote:
>> On Mon, Aug 15, 2016 at 3:39 PM, David Steele <david@pgmasters.net> wrote:
>>> Recently a hacker proposed a patch to add pg_dynshmem to the list of
>>> directories whose contents are excluded in pg_basebackup.  I wasn't able
>>> to find the original email despite several attempts.
>>>
>>> That patch got me thinking about what else could be excluded and after
>>> some investigation I found the following: pg_notify, pg_serial,
>>> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
>>> or rebuilt on server start.
>>
>> Eh ... I doubt very much that it's safe to blow away the entire
>> contents of an SLRU between shutdown and startup, even if the data is
>> technically transient data that won't be needed again after the system
>> is reset.
>
> I've done pretty extensive testing in pgBackRest and haven't seen issues
> in any supported version (plus I audited each init() function for every
> version back to where it was introduced).  The patch also passes all the
> pg_basebackup TAP tests in master.
>
> If you are correct it may indicate a problem anyway. Consider a standby
> backup where the files in these directories may be incredibly stale
> since they are not replicated.  Once restored to a master should we
> trust anything in these files?
>
> pg_serial, pg_notify, pg_subtrans are not even fsync'd
> (SlruCtl->do_fsync = false).  It's hard to imagine there's anything of
> value in there or that it can be trusted if there is.

It's not just a question of whether the data has value; it's a
question of whether the SLRU code will handle the situation correctly
in all cases if the directory contains no files.  I don't think you
can draw a firm conclusion on that without reading the code.

> The files in pg_snapshot and pg_dynshmem are simply deleted on startup
> so that seems safe enough.

Agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PATCH: Exclude additional directories in pg_basebackup

От
Kevin Grittner
Дата:
On Wed, Aug 17, 2016 at 1:50 PM, David Steele <david@pgmasters.net> wrote:

>>> That patch got me thinking about what else could be excluded and after
>>> some investigation I found the following: pg_notify, pg_serial,
>>> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
>>> or rebuilt on server start.
>>
>> Eh ... I doubt very much that it's safe to blow away the entire
>> contents of an SLRU between shutdown and startup, even if the data is
>> technically transient data that won't be needed again after the system
>> is reset.
>
> I've done pretty extensive testing in pgBackRest and haven't seen issues
> in any supported version (plus I audited each init() function for every
> version back to where it was introduced).  The patch also passes all the
> pg_basebackup TAP tests in master.
>
> If you are correct it may indicate a problem anyway. Consider a standby
> backup where the files in these directories may be incredibly stale
> since they are not replicated.  Once restored to a master should we
> trust anything in these files?
>
> pg_serial, pg_notify, pg_subtrans are not even fsync'd
> (SlruCtl->do_fsync = false).  It's hard to imagine there's anything of
> value in there or that it can be trusted if there is.

The contents of pg_serial could be deleted safely.  As I recall, I
initially had it cleaned out on startup, but Heikki (who was the
main committer for this feature) added SLRU buffer flushing for it
on checkpoint and took out the startup delete code with the
explanation that he thought someone might want to look at the file
contents for debugging purposes.  I would be a bit surprised if
anyone ever has used it for that, but that is the reason the
contents are not deleted just like pg_snapshot and pg_dynshmem.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
On 8/17/16 2:53 PM, Robert Haas wrote:
> On Wed, Aug 17, 2016 at 2:50 PM, David Steele <david@pgmasters.net> wrote:
>> Hi Robert,
>>
>> On 8/17/16 11:27 AM, Robert Haas wrote:
>>> On Mon, Aug 15, 2016 at 3:39 PM, David Steele <david@pgmasters.net> wrote:
>>>> Recently a hacker proposed a patch to add pg_dynshmem to the list of
>>>> directories whose contents are excluded in pg_basebackup.  I wasn't able
>>>> to find the original email despite several attempts.
>>>>
>>>> That patch got me thinking about what else could be excluded and after
>>>> some investigation I found the following: pg_notify, pg_serial,
>>>> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
>>>> or rebuilt on server start.
>>>
>>> Eh ... I doubt very much that it's safe to blow away the entire
>>> contents of an SLRU between shutdown and startup, even if the data is
>>> technically transient data that won't be needed again after the system
>>> is reset.
>>
>> If you are correct it may indicate a problem anyway. Consider a standby
>> backup where the files in these directories may be incredibly stale
>> since they are not replicated.  Once restored to a master should we
>> trust anything in these files?
>>
>> pg_serial, pg_notify, pg_subtrans are not even fsync'd
>> (SlruCtl->do_fsync = false).  It's hard to imagine there's anything of
>> value in there or that it can be trusted if there is.
> 
> It's not just a question of whether the data has value; it's a
> question of whether the SLRU code will handle the situation correctly
> in all cases if the directory contains no files.  I don't think you
> can draw a firm conclusion on that without reading the code.

A good point, Robert.  I read the code but I should have shared my
findings in the original post. I'll only cover the the cases we have not
already decided were safe (those that explicitly delete files in their
init routines, pg_snapshots and pg_dynshmem).

First, SlruPhysicalWritePage() will create a file/page that does not
exist and SlruPhysicalReadPage() will return zeros for a file/page that
does not exist during recovery.  Not that I think the latter is
important for pg_serial, pg_notify, or pg_subtrans since they are not
WAL-logged.  As far as I can see the slru system is very resilient overall.

For pg_notify, AsyncShmemInit() explicitly deletes all files on startup.

For pg_subtrans, StartupSUBTRANS() explicitly zeroes all required pages
but does not flush them to disk.  Since SlruPhysicalWritePage() is
tolerant of missing files/pages this should be fine.

For pg_serial, OldSerXidInit() doesn't zero anything out since it
ignores the old transactions and they get truncated as the transaction
horizon moves.  The old data hangs around for a little while so it could
theoretically be used for debugging as Kevin notes.  Since pg_serial
would only be cleared after a restore I don't see that as a big issue.

-- 
-David
david@pgmasters.net



Re: PATCH: Exclude additional directories in pg_basebackup

От
Michael Paquier
Дата:
On Thu, Aug 18, 2016 at 1:35 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> I don't remember how pg_snapshot works, but it's probably fine
> to start with an empty subdir (is it possible to export a snapshot from
> a prepared transaction?)

From xact.c:   /*    * Likewise, don't allow PREPARE after pg_export_snapshot.  This could be    * supported if we
addedcleanup logic to twophase.c, but for now it    * doesn't seem worth the trouble.    */   if
(XactHasExportedSnapshots())      ereport(ERROR,               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannotPREPARE a transaction that has exported snapshots")));
 
So that's fine.
-- 
Michael



Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
On 8/17/16 7:56 PM, Michael Paquier wrote:
> On Thu, Aug 18, 2016 at 1:35 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> I don't remember how pg_snapshot works, but it's probably fine
>> to start with an empty subdir (is it possible to export a snapshot from
>> a prepared transaction?)
> 
> From xact.c:
>     /*
>      * Likewise, don't allow PREPARE after pg_export_snapshot.  This could be
>      * supported if we added cleanup logic to twophase.c, but for now it
>      * doesn't seem worth the trouble.
>      */
>     if (XactHasExportedSnapshots())
>         ereport(ERROR,
>                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>         errmsg("cannot PREPARE a transaction that has exported snapshots")));
> So that's fine.

Thank you for tracking that down, Michael.

-- 
-David
david@pgmasters.net



Re: PATCH: Exclude additional directories in pg_basebackup

От
Peter Eisentraut
Дата:
On 8/15/16 3:39 PM, David Steele wrote:
> That patch got me thinking about what else could be excluded and after
> some investigation I found the following: pg_notify, pg_serial,
> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
> or rebuilt on server start.
> 
> The attached patch adds these directories to the pg_basebackup
> exclusions and takes an array-based approach to excluding directories
> and files during backup.

We do support other backup methods besides using pg_basebackup.  So I
think we need to document the required or recommended handling of each
of these directories.  And then pg_basebackup should become a consumer
of that documentation.

The current documentation on this is at
<https://www.postgresql.org/docs/devel/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP-DATA>,
which covers pg_xlog and pg_replslot.  I think that documentation should
be expanded, maybe with a simple list that is easy to copy into an
exclude file, following by more detail on each directory.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
On 9/1/16 9:53 AM, Peter Eisentraut wrote:
> On 8/15/16 3:39 PM, David Steele wrote:
>> That patch got me thinking about what else could be excluded and after
>> some investigation I found the following: pg_notify, pg_serial,
>> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
>> or rebuilt on server start.
>>
>> The attached patch adds these directories to the pg_basebackup
>> exclusions and takes an array-based approach to excluding directories
>> and files during backup.
>
> We do support other backup methods besides using pg_basebackup.  So I
> think we need to document the required or recommended handling of each
> of these directories.  And then pg_basebackup should become a consumer
> of that documentation.
>
> The current documentation on this is at
> <https://www.postgresql.org/docs/devel/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP-DATA>,
> which covers pg_xlog and pg_replslot.  I think that documentation should
> be expanded, maybe with a simple list that is easy to copy into an
> exclude file, following by more detail on each directory.

Attached is a new patch that adds sgml documentation.  I can expand on
each directory individually if you think that's necessary, but thought
it was better to lump them into a few categories.

--
-David
david@pgmasters.net

Вложения

Re: PATCH: Exclude additional directories in pg_basebackup

От
Michael Paquier
Дата:
On Wed, Sep 7, 2016 at 12:16 AM, David Steele <david@pgmasters.net> wrote:
> On 9/1/16 9:53 AM, Peter Eisentraut wrote:
>>
>> On 8/15/16 3:39 PM, David Steele wrote:
>>>
>>> That patch got me thinking about what else could be excluded and after
>>> some investigation I found the following: pg_notify, pg_serial,
>>> pg_snapshots, pg_subtrans.  These directories are all cleaned, zeroed,
>>> or rebuilt on server start.
>>>
>>> The attached patch adds these directories to the pg_basebackup
>>> exclusions and takes an array-based approach to excluding directories
>>> and files during backup.
>>
>>
>> We do support other backup methods besides using pg_basebackup.  So I
>> think we need to document the required or recommended handling of each
>> of these directories.  And then pg_basebackup should become a consumer
>> of that documentation.
>>
>> The current documentation on this is at
>>
>> <https://www.postgresql.org/docs/devel/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP-DATA>,
>> which covers pg_xlog and pg_replslot.  I think that documentation should
>> be expanded, maybe with a simple list that is easy to copy into an
>> exclude file, following by more detail on each directory.
>
>
> Attached is a new patch that adds sgml documentation.  I can expand on each
> directory individually if you think that's necessary, but thought it was
> better to lump them into a few categories.

+    be ommitted from the backup as they will be initialized on postmaster
+    startup. If the <xref linkend="GUC-STATS-TEMP-DIRECTORY"> is set and is
+    under the database cluster directory then the contents of the directory
+    specified by <xref linkend="GUC-STATS-TEMP-DIRECTORY"> can also
be ommitted.

s/ommitted/omitted/


+#define EXCLUDE_DIR_MAX         8
+#define EXCLUDE_DIR_STAT_TMP     0
+
+const char *excludeDirContents[EXCLUDE_DIR_MAX] =
+{
+    /*
+     * Skip temporary statistics files. The first array position will be
+     * filled with the value of pgstat_stat_directory relative to PGDATA.
+     * PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is set
+     * because PGSS_TEXT_FILE is always created there.
+     */
+    NULL,
I find that ugly. I'd rather use an array with undefined size for the
fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and
EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on
_tarWriteHeader...
-- 
Michael



Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
On 9/6/16 10:25 PM, Michael Paquier wrote:
> On Wed, Sep 7, 2016 at 12:16 AM, David Steele <david@pgmasters.net> wrote:
>> Attached is a new patch that adds sgml documentation.  I can expand on each
>> directory individually if you think that's necessary, but thought it was
>> better to lump them into a few categories.
> 
> +    be ommitted from the backup as they will be initialized on postmaster
> +    startup. If the <xref linkend="GUC-STATS-TEMP-DIRECTORY"> is set and is
> +    under the database cluster directory then the contents of the directory
> +    specified by <xref linkend="GUC-STATS-TEMP-DIRECTORY"> can also
> be ommitted.
> 
> s/ommitted/omitted/

Thanks!

> +#define EXCLUDE_DIR_MAX         8
> +#define EXCLUDE_DIR_STAT_TMP     0
> +
> +const char *excludeDirContents[EXCLUDE_DIR_MAX] =
> +{
> +    /*
> +     * Skip temporary statistics files. The first array position will be
> +     * filled with the value of pgstat_stat_directory relative to PGDATA.
> +     * PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is set
> +     * because PGSS_TEXT_FILE is always created there.
> +     */
> +    NULL,
> I find that ugly. I'd rather use an array with undefined size for the
> fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and
> EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on
> _tarWriteHeader...

My goal was to be able to fully reuse the code that creates the paths,
but this could also be done by following your suggestion and also moving
the path code into a function.

Any opinion on this, Peter?

-- 
-David
david@pgmasters.net



Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
On 9/7/16 8:21 AM, David Steele wrote:
> On 9/6/16 10:25 PM, Michael Paquier wrote:
>> I find that ugly. I'd rather use an array with undefined size for the
>> fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and
>> EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on
>> _tarWriteHeader...
> 
> My goal was to be able to fully reuse the code that creates the paths,
> but this could also be done by following your suggestion and also moving
> the path code into a function.

I just realized all I did was repeat what you said...

-- 
-David
david@pgmasters.net



Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
On 9/6/16 10:25 PM, Michael Paquier wrote:
 >
> On Wed, Sep 7, 2016 at 12:16 AM, David Steele <david@pgmasters.net> wrote:
>
>> Attached is a new patch that adds sgml documentation.  I can expand on each
>> directory individually if you think that's necessary, but thought it was
>> better to lump them into a few categories.
>
> +    be ommitted from the backup as they will be initialized on postmaster
> +    startup. If the <xref linkend="GUC-STATS-TEMP-DIRECTORY"> is set and is
> +    under the database cluster directory then the contents of the directory
> +    specified by <xref linkend="GUC-STATS-TEMP-DIRECTORY"> can also
> be ommitted.
>
> s/ommitted/omitted/

Fixed.

> +#define EXCLUDE_DIR_MAX         8
> +#define EXCLUDE_DIR_STAT_TMP     0
> +
> +const char *excludeDirContents[EXCLUDE_DIR_MAX] =
> +{
> +    /*
> +     * Skip temporary statistics files. The first array position will be
> +     * filled with the value of pgstat_stat_directory relative to PGDATA.
> +     * PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is set
> +     * because PGSS_TEXT_FILE is always created there.
> +     */
> +    NULL,
> I find that ugly. I'd rather use an array with undefined size for the
> fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and
> EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on
> _tarWriteHeader...

Done.  Also writing out pg_xlog with the new routine.

--
-David
david@pgmasters.net

Вложения

Re: PATCH: Exclude additional directories in pg_basebackup

От
Michael Paquier
Дата:
On Fri, Sep 9, 2016 at 10:18 PM, David Steele <david@pgmasters.net> wrote:
> On 9/6/16 10:25 PM, Michael Paquier wrote:
>>
>>
>> On Wed, Sep 7, 2016 at 12:16 AM, David Steele <david@pgmasters.net> wrote:
>>
>>> Attached is a new patch that adds sgml documentation.  I can expand on
>>> each
>>> directory individually if you think that's necessary, but thought it was
>>> better to lump them into a few categories.
>>
>>
>> +    be ommitted from the backup as they will be initialized on postmaster
>> +    startup. If the <xref linkend="GUC-STATS-TEMP-DIRECTORY"> is set and
>> is
>> +    under the database cluster directory then the contents of the
>> directory
>> +    specified by <xref linkend="GUC-STATS-TEMP-DIRECTORY"> can also
>> be ommitted.
>>
>> s/ommitted/omitted/
>
>
> Fixed.
>
>> +#define EXCLUDE_DIR_MAX         8
>> +#define EXCLUDE_DIR_STAT_TMP     0
>> +
>> +const char *excludeDirContents[EXCLUDE_DIR_MAX] =
>> +{
>> +    /*
>> +     * Skip temporary statistics files. The first array position will be
>> +     * filled with the value of pgstat_stat_directory relative to PGDATA.
>> +     * PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is
>> set
>> +     * because PGSS_TEXT_FILE is always created there.
>> +     */
>> +    NULL,
>> I find that ugly. I'd rather use an array with undefined size for the
>> fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and
>> EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on
>> _tarWriteHeader...
>
>
> Done.  Also writing out pg_xlog with the new routine.

Thanks, this looks in far better shape now.

+   /* Removed on startup, see DeleteAllExportedSnapshotFiles(). */
+   "pg_snapshots",
Using SNAPSHOT_EXPORT_DIR is possible here.

+_tarWriteDir(char *pathbuf, int basepathlen, struct stat *statbuf,
+            bool sizeonly)
That's nice, thanks! This even takes care of the fact when directories
other than pg_xlog are symlinks or junction points.

+   /* Recreated on startup, see StartupReplicationSlots(). */
+   "pg_replslot",
This comment is incorrect, pg_replslot is skipped in a base backup
because that's just useless.
-- 
Michael



Re: PATCH: Exclude additional directories in pg_basebackup

От
Peter Eisentraut
Дата:
The comments on excludeDirContents are somewhat imprecise.  For
example, none of those directories are actually removed or recreated
on startup, only the contents are.

The comment for pg_replslot is incorrect.  I think you can copy
replication slots just fine, but you usually don't want to.

What is the 2 for in this code?
   if (strcmp(pathbuf + 2, excludeFile[excludeIdx]) == 0)

I would keep the signature of _tarWriteDir() similar to
_tarWriteHeader().  So don't pass sizeonly in there, or do pass in
sizeonly but do the same with _tarWriteHeader().

The documentation in pg_basebackup.sgml and protocol.sgml should be
updated.

Add some tests.  At least test that one entry from the directory list
and one entry from the files list is not contained in the backup
result.  Testing the symlink handling would also be good.  (The
pg_basebackup tests claim that Windows doesn't support symlinks and
therefore skip all the symlink tests.  That seems a bit at odds with
your code handling symlinks on Windows.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Exclude additional directories in pg_basebackup

От
Michael Paquier
Дата:
On Tue, Sep 13, 2016 at 3:50 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Add some tests.  At least test that one entry from the directory list
> and one entry from the files list is not contained in the backup
> result.  Testing the symlink handling would also be good.  (The
> pg_basebackup tests claim that Windows doesn't support symlinks and
> therefore skip all the symlink tests.  That seems a bit at odds with
> your code handling symlinks on Windows.)

The code proposed needs to support junction points on Windows so from
this side of things everything is fine. What is lacking here is
support for symlink() in perl for Windows, and that's why the tests
are skipped.
-- 
Michael



Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
On 9/12/16 2:50 PM, Peter Eisentraut wrote:
> The comments on excludeDirContents are somewhat imprecise.  For
> example, none of those directories are actually removed or recreated
> on startup, only the contents are.

Fixed.

> The comment for pg_replslot is incorrect.  I think you can copy
> replication slots just fine, but you usually don't want to.

Fixed.

> What is the 2 for in this code?
>
>     if (strcmp(pathbuf + 2, excludeFile[excludeIdx]) == 0)

This should be basepathlen + 1 (i.e., $PGDATA/).  Fixed.

> I would keep the signature of _tarWriteDir() similar to
> _tarWriteHeader().  So don't pass sizeonly in there, or do pass in
> sizeonly but do the same with _tarWriteHeader().

I'm not sure how much more similar they can be, but I do agree that
sizeonly logic should be wrapped in _tarWriteHeader().  Done.

> The documentation in pg_basebackup.sgml and protocol.sgml should be
> updated.

Done.  I moved a few things to the protocol docs to avoid repetition.

> Add some tests.  At least test that one entry from the directory list
> and one entry from the files list is not contained in the backup
> result.

Done for all files and directories.

> Testing the symlink handling would also be good.

Done using pg_replslot for the test.

> (The
> pg_basebackup tests claim that Windows doesn't support symlinks and
> therefore skip all the symlink tests.  That seems a bit at odds with
> your code handling symlinks on Windows.)

Hopefully Michael's response down-thread answered your question.

--
-David
david@pgmasters.net

Вложения

Re: PATCH: Exclude additional directories in pg_basebackup

От
Michael Paquier
Дата:
On Sat, Sep 24, 2016 at 3:26 AM, David Steele <david@pgmasters.net> wrote:
> On 9/12/16 2:50 PM, Peter Eisentraut wrote:
>> The comments on excludeDirContents are somewhat imprecise.  For
>> example, none of those directories are actually removed or recreated
>> on startup, only the contents are.
>
> Fixed.
>
>> The comment for pg_replslot is incorrect.  I think you can copy
>> replication slots just fine, but you usually don't want to.
>
> Fixed.
>
>> What is the 2 for in this code?
>>
>>     if (strcmp(pathbuf + 2, excludeFile[excludeIdx]) == 0)
>
> This should be basepathlen + 1 (i.e., $PGDATA/).  Fixed.
>
>> I would keep the signature of _tarWriteDir() similar to
>> _tarWriteHeader().  So don't pass sizeonly in there, or do pass in
>> sizeonly but do the same with _tarWriteHeader().
>
> I'm not sure how much more similar they can be, but I do agree that
> sizeonly logic should be wrapped in _tarWriteHeader().  Done.
>
>> The documentation in pg_basebackup.sgml and protocol.sgml should be
>> updated.
>
> Done.  I moved a few things to the protocol docs to avoid repetition.
>
>> Add some tests.  At least test that one entry from the directory list
>> and one entry from the files list is not contained in the backup
>> result.
>
> Done for all files and directories.
>
>> Testing the symlink handling would also be good.
>
> Done using pg_replslot for the test.
>
>> (The
>> pg_basebackup tests claim that Windows doesn't support symlinks and
>> therefore skip all the symlink tests.  That seems a bit at odds with
>> your code handling symlinks on Windows.)
>
> Hopefully Michael's response down-thread answered your question.

Just a nit:        <para>
-         <filename>postmaster.pid</>
+         <filename>postmaster.pid</> and <filename>postmaster.opts</>        </para>
Having one <para> block for each file would be better.

+const char *excludeFile[] =
excludeFiles[]?

+# Move pg_replslot out of $pgdata and create a symlink to it
+rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+   or die "unable to move $pgdata/pg_replslot";
+symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot");
This will blow up on Windows. Those tests need to be included in the
SKIP block. Even if your code needs to support junction points on
Windows, as perl does not offer an equivalent for it we just cannot
test it on this platform.

Except that, the patch looks pretty good to me.
-- 
Michael



Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
On 9/26/16 2:36 AM, Michael Paquier wrote:
> 
> Just a nit:
>          <para>
> -         <filename>postmaster.pid</>
> +         <filename>postmaster.pid</> and <filename>postmaster.opts</>
>          </para>
> Having one <para> block for each file would be better.

I grouped these because they are related and because there are now other
bullets where multiple files/dirs are listed.

Are you proposing to also breakup <filename>pg_replslot</>,
<filename>pg_dynshmem</>, <filename>pg_stat_tmp</>,
<filename>pg_notify</>, <filename>pg_serial</>,
<filename>pg_snapshots</>, and <filename>pg_subtrans</>?

Also <filename>backup_label</> and <filename>tablespace_map</>?

-- 
-David
david@pgmasters.net



Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
On 9/26/16 2:36 AM, Michael Paquier wrote:

> Just a nit:
>          <para>
> -         <filename>postmaster.pid</>
> +         <filename>postmaster.pid</> and <filename>postmaster.opts</>
>          </para>
> Having one <para> block for each file would be better.

OK, changed.

> +const char *excludeFile[] =
> excludeFiles[]?
>
> +# Move pg_replslot out of $pgdata and create a symlink to it
> +rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
> +   or die "unable to move $pgdata/pg_replslot";
> +symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot");
> This will blow up on Windows. Those tests need to be included in the
> SKIP block. Even if your code needs to support junction points on
> Windows, as perl does not offer an equivalent for it we just cannot
> test it on this platform.

Fixed.

--
-David
david@pgmasters.net

Вложения

Re: PATCH: Exclude additional directories in pg_basebackup

От
Michael Paquier
Дата:
On Tue, Sep 27, 2016 at 11:27 PM, David Steele <david@pgmasters.net> wrote:
> On 9/26/16 2:36 AM, Michael Paquier wrote:
>
>> Just a nit:
>>          <para>
>> -         <filename>postmaster.pid</>
>> +         <filename>postmaster.pid</> and <filename>postmaster.opts</>
>>          </para>
>> Having one <para> block for each file would be better.
>
> OK, changed.

You did not actually change it :)

>> +const char *excludeFile[] =
>> excludeFiles[]?
>>
>> +# Move pg_replslot out of $pgdata and create a symlink to it
>> +rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
>> +   or die "unable to move $pgdata/pg_replslot";
>> +symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot");
>> This will blow up on Windows. Those tests need to be included in the
>> SKIP block. Even if your code needs to support junction points on
>> Windows, as perl does not offer an equivalent for it we just cannot
>> test it on this platform.
>
> Fixed.

Thanks for the updated version.

+        <para>
+         <filename>backup_label</> and <filename>tablespace_map</>.  If these
+         files exist they belong to an exclusive backup and are not applicable
+         to the base backup.
+        </para>
This is a bit confusing. When using pg_basebackup the files are
ignored, right, but they are included in the tar stream so they are
not excluded at the end. So we had better remove purely this
paragraph. Similarly, postgresql.auto.conf.tmp is something that
exists only for a short time frame. Not listing it directly is fine
IMO.

+   /* If symlink, write it as a directory anyway */
+#ifndef WIN32
+   if (S_ISLNK(statbuf->st_mode))
+#else
+   if (pgwin32_is_junction(pathbuf))
+#endif
+
+   statbuf->st_mode = S_IFDIR | S_IRWXU;
Indentation here is confusing. Coverity would complain.

+       /* Stat the file */
+       snprintf(pathbuf, MAXPGPATH, "%s/%s", path, de->d_name);
There is no need to put the stat() call this early in the loop as this
is just used for re-creating empty directories.

+            if (strcmp(pathbuf + basepathlen + 1,
+                       excludeFiles[excludeIdx]) == 0)
There is no need for complicated maths, you can just use de->d_name.

pg_xlog has somewhat a similar treatment, but per the exception
handling of archive_status it is better to leave its check as-is.

The DEBUG1 entries are really useful for debugging, it would be nice
to keep them in the final patch.

Thinking harder, your logic can be simplified. You could just do the following:
- Check for interrupts first
- Look at the list of excluded files
- Call lstat()
- Check for excluded directories

After all that fixed, I have moved the patch to "Ready for Committer".
Please use the updated patch though.
--
Michael

Вложения

Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
On 9/28/16 2:45 AM, Michael Paquier wrote:
> On Tue, Sep 27, 2016 at 11:27 PM, David Steele <david@pgmasters.net> wrote:
>> On 9/26/16 2:36 AM, Michael Paquier wrote:
>>
>>> Just a nit:
>>>          <para>
>>> -         <filename>postmaster.pid</>
>>> +         <filename>postmaster.pid</> and <filename>postmaster.opts</>
>>>          </para>
>>> Having one <para> block for each file would be better.
>>
>> OK, changed.
> 
> You did not actually change it :)

Oops, this was intended to mean that I changed:

>>> +const char *excludeFile[] =
>>> excludeFiles[]?

> +        <para>
> +         <filename>backup_label</> and <filename>tablespace_map</>.  If these
> +         files exist they belong to an exclusive backup and are not applicable
> +         to the base backup.
> +        </para>
> This is a bit confusing. When using pg_basebackup the files are
> ignored, right, but they are included in the tar stream so they are
> not excluded at the end. So we had better remove purely this
> paragraph. Similarly, postgresql.auto.conf.tmp is something that
> exists only for a short time frame. Not listing it directly is fine
> IMO.

OK, I agree that's confusing and probably not helpful to the user.

> +   /* If symlink, write it as a directory anyway */
> +#ifndef WIN32
> +   if (S_ISLNK(statbuf->st_mode))
> +#else
> +   if (pgwin32_is_junction(pathbuf))
> +#endif
> +
> +   statbuf->st_mode = S_IFDIR | S_IRWXU;
> Indentation here is confusing. Coverity would complain.

OK.

> +       /* Stat the file */
> +       snprintf(pathbuf, MAXPGPATH, "%s/%s", path, de->d_name);
> There is no need to put the stat() call this early in the loop as this
> is just used for re-creating empty directories.

OK.

> +            if (strcmp(pathbuf + basepathlen + 1,
> +                       excludeFiles[excludeIdx]) == 0)
> There is no need for complicated maths, you can just use de->d_name.

OK.

> pg_xlog has somewhat a similar treatment, but per the exception
> handling of archive_status it is better to leave its check as-is.

OK.

> The DEBUG1 entries are really useful for debugging, it would be nice
> to keep them in the final patch.

That was my thinking as well.  It's up to Peter, though.

> Thinking harder, your logic can be simplified. You could just do the following:
> - Check for interrupts first
> - Look at the list of excluded files
> - Call lstat()
> - Check for excluded directories

I think setting excludeFound = false the second time is redundant since
it must be false at that point.  Am I missing something or are you just
being cautious in case code changes above?

> After all that fixed, I have moved the patch to "Ready for Committer".
> Please use the updated patch though.

The v6 patch looks good to me.

-- 
-David
david@pgmasters.net



Re: PATCH: Exclude additional directories in pg_basebackup

От
Peter Eisentraut
Дата:
On 9/28/16 2:45 AM, Michael Paquier wrote:
> After all that fixed, I have moved the patch to "Ready for Committer".
> Please use the updated patch though.

Committed after some cosmetic changes.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Exclude additional directories in pg_basebackup

От
David Steele
Дата:
On 9/28/16 9:55 PM, Peter Eisentraut wrote:
> On 9/28/16 2:45 AM, Michael Paquier wrote:
>> After all that fixed, I have moved the patch to "Ready for Committer".
>> Please use the updated patch though.
> 
> Committed after some cosmetic changes.

Thank you, Peter!

-- 
-David
david@pgmasters.net