Обсуждение: FW: Setting up of PITR system.
we want to remove archived WAL log files which are no longer needed
so that the size of wal_archive folder is under control.
The DOCs say that filenames numerically less than the WAL record that pg_stop_backup()
suggests can be removed. Will an alphabetical sorting be different from numerically sorted
filename ? Sorting numerically is not easy as the filenames look like HEX numbers
ut they are are so huge that they cant be stored in normal integers (32bits).
The WAL files are all numerical – however the backup file has the checkpoint appended to it – eg. 00009012514000916.A0AC91.backup.
You need to examine the contents of this file to find the earliest file needed (it’s usually the one just before it – i.e. 00009012514000916 in this case) and any numerically later than the .backup file.
Also you wouldn’t particular need any heavy programming – I’m sure a simple shell script could be written in bash to pick out the correct files.
Q2. We are attempting to automate the process of taking base backup and removal
of the unneeded WAL files. Is there any reliable way of knowing the WAL file X from
inside the shell script , such that files prior to X can be removed ?
from a shell script we issue pg_stop_backup() by psql -c , it returns something
which does not looks like a WAL file.
Pg_stop_backup() returns the checkpoint record – something like 9/A0AC91 – this is purely a guess, but you might be able to find which backup file contains this checkpoint by taking the digits after the forward-slash in the checkpoint (i.e. A0AC91 in this case), then finding the filename that contains this – in my example it’s 00009012514000916.A0AC91.backup, and grep the file for the number after the text “Start WAL Location: “ in this file – then remove anything numerically less.
It’d be a very “bitty” process, but I’m certain it could be done – it would need heavy testing over a period of backups though to ensure the wrong files are not being deleted.
Q3. tar exits with non zero status for the same reasons as mentioned in docs
is there any better archiving tool for this purpose ? can we use cp -a ?
The WAL archive command can be set to use either cp or mv – then why not have a regular cronjob that runs a shell script to add any new files to your tar archive every time a new file is detected in your WAL-archive directory?
Regds
Rajesh Kumar Mallah.
!DSPAM:14,44293ed135042000516834!
Regards
Andy Shellam
"Andy Shellam" <andy.shellam@mailnetwork.co.uk> writes: > The DOCs say that filenames numerically less than the WAL record that > pg_stop_backup() suggests can be removed. Will an alphabetical sorting > be different from numerically sorted filename ? If you're worried about that, try "LANG=C ls ..." to make sure the sorting is done in C locale. I don't know of any locales that would sort hex numbers differently from C, but maybe there are some. regards, tom lane
Hi, Thanks everyone for the reply, is it reasonable/advisable to start with the output of pg_stop_backup() in a shell script to find the WAL file being discussed, ie the FILE which can be used as a reference for removing the older files ? secondly , i was asking about the tool for copying the database directory not the one for archiving the logs. when i use tar , it gives warning that the file changed while it was reading the file, i need a tool that does not give warning for this kind of activity. Regds Rajesh Kumar Mallah. On 3/28/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Andy Shellam" <andy.shellam@mailnetwork.co.uk> writes: > > The DOCs say that filenames numerically less than the WAL record that > > pg_stop_backup() suggests can be removed. Will an alphabetical sorting > > be different from numerically sorted filename ? > > If you're worried about that, try "LANG=C ls ..." to make sure the > sorting is done in C locale. I don't know of any locales that would > sort hex numbers differently from C, but maybe there are some. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster >
If pg_start_backup is used correctly, no data pages will be written to disk until pg_stop_backup is called, so this shouldn't be an issue - HOWEVER - check that you're excluding your pg_xlog directory (within your database directory) from the tar backup as these will be changing. Whatever archive utility you're using, if a file is being changed at the time of reading, it won't be archived correctly. Andy -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Rajesh Kumar Mallah Sent: Tuesday, 28 March, 2006 5:27 PM To: Tom Lane Cc: andy.shellam@mailnetwork.co.uk; pgsql-admin@postgresql.org Subject: Re: FW: [ADMIN] Setting up of PITR system. --- secondly , i was asking about the tool for copying the database directory not the one for archiving the logs. when i use tar , it gives warning that the file changed while it was reading the file, i need a tool that does not give warning for this kind of activity. ---
On 3/28/06, Andy Shellam <andy.shellam@mailnetwork.co.uk> wrote: > > > > we want to remove archived WAL log files which are no longer needed > so that the size of wal_archive folder is under control. > > The DOCs say that filenames numerically less than the WAL record that > pg_stop_backup() > suggests can be removed. Will an alphabetical sorting be different from > numerically sorted > filename ? Sorting numerically is not easy as the filenames look like > HEX numbers > ut they are are so huge that they cant be stored in normal integers > (32bits). > > The WAL files are all numerical – however the backup file has the checkpoint > appended to it – eg. 00009012514000916.A0AC91.backup. > > You need to examine the contents of this file to find the earliest file > needed (it's usually the one just before it – i.e. 00009012514000916 in > this case) and any numerically later than the .backup file. > > > > Also you wouldn't particular need any heavy programming – I'm sure a simple > shell script could be written in bash to pick out the correct files. OK i am posting my full script [ its not heavy programming i guess :) ] shall be grateful if you/someone could review it . (its well commented i think) script also carries sample data. it does following 1. takes base backup to a destined folder by rsync 2. waits for .backup file to arrive in archive folder after pg_stop_bacup() 3. searches and removes unwanted archived log files. I have run it many times in my server and it seems to be working fine. ------------------------------------ BEGIN -------------------------------------------- #!/bin/bash # folder where base_backup is put BACKUPFOLDER=/mnt/disk3/base_backups today=`date +%d-%m-%Y-%H-%M-%S` PSQL=/opt/usr/local/pgsql/bin/psql RSYNC="/usr/bin/rsync -a" PGDATADIR=/mnt/disk5/pgdatadir # two table spaces. TS1=/mnt/disk4/bigtables TS2=/mnt/disk3/indexspace # folder where *archived* logs are put. WAL_ARCHIVE=/mnt/wal_archive label=base_backup_${today} echo "Executing pg_start_backup with label $label in server ... " # get the checkpoint at which backup starts # the .backup files seems to be bearing this string in it. CP=`$PSQL -q -Upostgres -d template1 -c "SELECT pg_start_backup('$label');" -P tuples_only -P format=unaligned` echo "Begin CheckPoint is $CP" # this contain string like A/681D1214 if [ $? -ne 0 ] then echo "PSQL pg_start_backup failed" exit 1; fi echo "pg_start_backup executed successfully" echo "RSYNC begins.." # rsync each of the folders to the backup folder. for i in $TS1 $TS2 $PGDATADIR ; do echo "Syncing $i .. " time $RSYNC $i $BACKUPFOLDER echo "Done" done # fortunately rsync does *not* seems to be exitting with non zero exit code # for expected file disappearances and modifications. if [ $? -ne 0 ] then echo "RSYNC failed" exit 1; fi echo "RSYNC Done successfully" echo "Executing pg_stop_backup in server ... " $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();" if [ $? -ne 0 ] then echo "PSQL pg_stop_backup failed" exit 1; fi echo "pg_stop_backup done successfully" TO_SEARCH="00${CP:4}" # $TO_SEARCH contains 1D1214 # now remove the unneeded files. # strip off first 4 chars from CP and prefix 00 to the result. # search the file that has the Checkpoint in its filename. # it takes a while to come, so wait till it comes. while true; do REF_FILE=`ls -1 $WAL_ARCHIVE | grep $TO_SEARCH` if [ ! $REF_FILE ]; then echo "Waitng for file with $TO_SEARCH in $WAL_ARCHIVE ... " else break fi sleep 1 done # REF_FILE is 000000010000000A00000068.001D1214.backup.bz2 # take only first 24 chars and store. REF_FILE_NUM=${REF_FILE:0:24} # REF_FILE_NUM is 000000010000000A00000068 echo "REF_FILE_NUM=$REF_FILE_NUM" # iterate list of files in the WAL_ARCHIVE folder for i in `ls -1 $WAL_ARCHIVE` ; do # $i is :000000010000000A0000005D.bz2 eg # get first 24 chars in filename FILE_NUM=${i:0:24} # compare if the number is less than the reference # here string comparison is being used. if [[ $FILE_NUM < $REF_FILE_NUM ]] then echo "$FILE_NUM [ $i ] removed" rm -f $WAL_ARCHIVE/$i else echo "$FILE_NUM [ $i ] not removed" fi done ------------------------------------ END ----------------------------------------------- ---------------------------------- REAL OUTPUT OF A RUN----------------------- Executing pg_start_backup with label base_backup_29-03-2006-17-29-01 in server ... Begin CheckPoint is A/681D1214 pg_start_backup executed successfully RSYNC begins.. Syncing /mnt/disk4/bigtables .. real 6m24.338s user 1m12.831s sys 0m55.295s Done Syncing /mnt/disk3/indexspace .. real 5m45.245s user 0m32.520s sys 0m26.567s Done Syncing /mnt/disk5/pgdatadir .. readlink pgdatadir/global/pgstat.tmp: No such file or directory rsync error: some files could not be transferred (code 23) at main.c(620) real 12m1.844s user 1m43.698s sys 1m24.486s Done RSYNC Done successfully Executing pg_stop_backup in server ... pg_stop_backup ---------------- A/6FA82D74 (1 row) pg_stop_backup done successfully Waitng for file with 001D1214 in /mnt/wal_archive ... REF_FILE_NUM=000000010000000A00000068 000000010000000A0000005D [ 000000010000000A0000005D.003C0B54.backup.bz2 ] removed 000000010000000A0000005D [ 000000010000000A0000005D.bz2 ] removed 000000010000000A0000005E [ 000000010000000A0000005E.bz2 ] removed 000000010000000A0000005F [ 000000010000000A0000005F.bz2 ] removed 000000010000000A00000060 [ 000000010000000A00000060.bz2 ] removed 000000010000000A00000061 [ 000000010000000A00000061.bz2 ] removed 000000010000000A00000062 [ 000000010000000A00000062.bz2 ] removed 000000010000000A00000063 [ 000000010000000A00000063.bz2 ] removed 000000010000000A00000064 [ 000000010000000A00000064.bz2 ] removed 000000010000000A00000065 [ 000000010000000A00000065.bz2 ] removed 000000010000000A00000066 [ 000000010000000A00000066.bz2 ] removed 000000010000000A00000067 [ 000000010000000A00000067.bz2 ] removed 000000010000000A00000068 [ 000000010000000A00000068.001D1214.backup.bz2 ] not removed 000000010000000A00000068 [ 000000010000000A00000068.bz2 ] not removed 000000010000000A00000069 [ 000000010000000A00000069.bz2 ] not removed 000000010000000A0000006A [ 000000010000000A0000006A.bz2 ] not removed 000000010000000A0000006B [ 000000010000000A0000006B.bz2 ] not removed 000000010000000A0000006C [ 000000010000000A0000006C.bz2 ] not removed 000000010000000A0000006D [ 000000010000000A0000006D.bz2 ] not removed 000000010000000A0000006E [ 000000010000000A0000006E.bz2 ] not removed ------------------------------- END----------------------------------------------------- > > > Q2. We are attempting to automate the process of taking base backup and > removal > of the unneeded WAL files. Is there any reliable way of knowing the > WAL file X from > inside the shell script , such that files prior to X can be removed ? > > from a shell script we issue pg_stop_backup() by psql -c , it returns > something > which does not looks like a WAL file. > > > > Pg_stop_backup() returns the checkpoint record – something like 9/A0AC91 – > this is purely a guess, but you might be able to find which backup file > contains this checkpoint by taking the digits after the forward-slash in the > checkpoint (i.e. A0AC91 in this case), then finding the filename that > contains this – in my example it's 00009012514000916.A0AC91.backup, I think its not correct , pg_start_backup() returns the checkpoint record which becomes part of the backup filename. and grep > the file for the number after the text "Start WAL Location: " in this file – > then remove anything numerically less. > > > > It'd be a very "bitty" process, but I'm certain it could be done – it would > need heavy testing over a period of backups though to ensure the wrong files > are not being deleted. > > > > Q3. tar exits with non zero status for the same reasons as mentioned in > docs > is there any better archiving tool for this purpose ? can we use cp > -a ? > > > > The WAL archive command can be set to use either cp or mv – then why not > have a regular cronjob that runs a shell script to add any new files to your > tar archive every time a new file is detected in your WAL-archive directory? Hmmm you got me wrong, am talking about taking the base_backup not archival of WAL segments > > > > Regds > Rajesh Kumar Mallah. > > !DSPAM:14,44293ed135042000516834! > > > Regards > > > > Andy Shellam
On 3/29/06, Andy Shellam <andy.shellam@mailnetwork.co.uk> wrote: > If pg_start_backup is used correctly, no data pages will be written to disk > until pg_stop_backup is called, so this shouldn't be an issue - HOWEVER - > check that you're excluding your pg_xlog directory (within your database > directory) from the tar backup as these will be changing. > > Whatever archive utility you're using, if a file is being changed at the > time of reading, it won't be archived correctly. ?????? What you are saying is not in sync with docs >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 23.3.2. Making a Base Backup The procedure for making a base backup is relatively simple: 3. Perform the backup, using any convenient file-system-backup tool such as tar or cpio. It is neither necessary nor desirable to stop normal operation of the database while you do this. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I used rsync like many others , it seems to work fine for me. Regds mallah. > > Andy > > -----Original Message----- > From: pgsql-admin-owner@postgresql.org > [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Rajesh Kumar Mallah > Sent: Tuesday, 28 March, 2006 5:27 PM > To: Tom Lane > Cc: andy.shellam@mailnetwork.co.uk; pgsql-admin@postgresql.org > Subject: Re: FW: [ADMIN] Setting up of PITR system. > > --- > > secondly , i was asking about the tool for copying the > database directory not the one for archiving the logs. > when i use tar , it gives warning that the file changed > while it was reading the file, i need a tool that does not > give warning for this kind of activity. > > --- > > >
"Andy Shellam" <andy.shellam@mailnetwork.co.uk> writes: > If pg_start_backup is used correctly, no data pages will be written to disk > until pg_stop_backup is called, so this shouldn't be an issue That's incorrect, the database will continue to operate normally. The only thing pg_start_backup really does is lay down a marker so you can tell how far back you need to save WAL files in order to do a PITR restore using your backup. The fact that the backup tool might get an inconsistent copy of some files is not an issue, because replay of the WAL files against the copied database will fix any inconsistencies. It definitely is a pain in the neck that GNU tar complains about files changing underneath it --- I've looked for a way to disable that, or at least reduce it to a warning instead of an error condition, but gtar doesn't seem to have such a switch. You should try alternative backup tools such as cpio or rsync. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 Rajesh Kumar Mallah wrote: | | OK i am posting my full script [ its not heavy programming i guess :) ] | shall be grateful if you/someone could review it . (its well commented i think) | script also carries sample data. | | it does following | 1. takes base backup to a destined folder by rsync | 2. waits for .backup file to arrive in archive folder | after pg_stop_bacup() | 3. searches and removes unwanted archived log files. | | I have run it many times in my server and it seems to | be working fine. Hello, Rajesh. Just a couple of comments on the script itself, not what it actually does - I never tried WAL archiving before, so I can't comment on that. I inserted the comments at relevant points in the script. I'm sorting them into three categories, one is just improvements in style, the other is optimization and the third is correction of an error. | ------------------------------------ BEGIN | -------------------------------------------- | #!/bin/bash | | # folder where base_backup is put | BACKUPFOLDER=/mnt/disk3/base_backups | today=`date +%d-%m-%Y-%H-%M-%S` | PSQL=/opt/usr/local/pgsql/bin/psql | RSYNC="/usr/bin/rsync -a" | PGDATADIR=/mnt/disk5/pgdatadir | | # two table spaces. | | TS1=/mnt/disk4/bigtables | TS2=/mnt/disk3/indexspace (optimization) Since you're using bash, you can use arrays. This could be better written as ~ TS[0]=/mnt/disk5/pgdatadir ~ TS[1]=/mnt/disk4/bigtables ~ TS[2]=/mnt/disk3/indexspace or even ~ TS=(/mnt/disk5/pgdatadir \ ~ /mnt/disk4/bigtables \ ~ /mnt/disk3/indexspace) That way, you can add tablespaces at will and just use a while loop to back them up, which greatly simplifies adding new tablespaces or moving the script somewhere else. See below for how to implement that. | # folder where *archived* logs are put. | WAL_ARCHIVE=/mnt/wal_archive | | label=base_backup_${today} | | echo "Executing pg_start_backup with label $label in server ... " | | # get the checkpoint at which backup starts | # the .backup files seems to be bearing this string in it. | | CP=`$PSQL -q -Upostgres -d template1 -c "SELECT | pg_start_backup('$label');" -P tuples_only -P format=unaligned` | | echo "Begin CheckPoint is $CP" # this contain string like A/681D1214 | | if [ $? -ne 0 ] | then | echo "PSQL pg_start_backup failed" | exit 1; | fi | echo "pg_start_backup executed successfully" (style) If you want to capture any error messages pg_start_backup may have caused and store them into ${CP}, you should add 2>&1 at the end of the psql invocation, see below snippet. (error) Checking for exit status of pg_start_backup using $? at this point will never report an error, as you've used echo prior to checking what pg_start_backup returned. You should either move the echo below the if statement (by adding an "else" clause) or store the exit status of pg_start_backup into RVAL like this: ~ CP="`$PSQL ... 2>&1`" ~ RVAL=$? ~ echo "Begin CheckPoint says: ${CP}" ~ if [ ${RVAL} -ne 0 ]; then ~ ... ~ fi | echo "RSYNC begins.." | | # rsync each of the folders to the backup folder. | for i in $TS1 $TS2 $PGDATADIR ; | do | echo "Syncing $i .. " | time $RSYNC $i $BACKUPFOLDER | echo "Done" | done (optimization) If you store locations into an array, you could rewrite this as follows: ~ CTR=0 ~ while [ -n "${TS[${CTR}]}" ]; do ~ echo "Syncing ${TS[${CTR}]}..." ~ time ${RSYNC} ${TS[${CTR}]} ${BACKUPFOLDER} ~ RVAL=$? ~ echo "Sync finished with exit status ${RVAL}" ~ if [ ${RVAL} -ne 0 ]; then ~ <handle errors> ~ fi ~ CTR=$((CTR + 1)) ~ done ~ unset CTR | # fortunately rsync does *not* seems to be exitting with non zero exit code | # for expected file disappearances and modifications. | if [ $? -ne 0 ] | then | echo "RSYNC failed" | exit 1; | fi (error) Same error as above - what you're checking here is whether the last command in the last for loop run was successful, and this is always going to be true as echoing to stdout will never fail until stdout is closed for some reason. | echo "RSYNC Done successfully" | | echo "Executing pg_stop_backup in server ... " | $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();" | if [ $? -ne 0 ] | then | echo "PSQL pg_stop_backup failed" | exit 1; | fi | echo "pg_stop_backup done successfully" | TO_SEARCH="00${CP:4}" # $TO_SEARCH contains 1D1214 | | # now remove the unneeded files. | | # strip off first 4 chars from CP and prefix 00 to the result. | # search the file that has the Checkpoint in its filename. | # it takes a while to come, so wait till it comes. | | while true; do | REF_FILE=`ls -1 $WAL_ARCHIVE | grep $TO_SEARCH` | if [ ! $REF_FILE ]; then | echo "Waitng for file with $TO_SEARCH in $WAL_ARCHIVE ... " | else | break | fi | sleep 1 | done (optimization) You could simplify this significantly using the test builtin: ~ while [ ! -e ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2 ]; do ~ echo "Waiting for ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2" ~ sleep 1 ~ done ~ REF_FILE="`echo ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2`" | # REF_FILE is 000000010000000A00000068.001D1214.backup.bz2 | | # take only first 24 chars and store. | REF_FILE_NUM=${REF_FILE:0:24} | | # REF_FILE_NUM is 000000010000000A00000068 | | echo "REF_FILE_NUM=$REF_FILE_NUM" | | # iterate list of files in the WAL_ARCHIVE folder | for i in `ls -1 $WAL_ARCHIVE` ; | do | # $i is :000000010000000A0000005D.bz2 eg | # get first 24 chars in filename | FILE_NUM=${i:0:24} | | # compare if the number is less than the reference | # here string comparison is being used. | if [[ $FILE_NUM < $REF_FILE_NUM ]] | then | echo "$FILE_NUM [ $i ] removed" | rm -f $WAL_ARCHIVE/$i | else | echo "$FILE_NUM [ $i ] not removed" | fi | done (optimization) Perhaps using find -newer/-anewer/-cnewer could be used here to find files older than the reference file: ~ # "-not -newer" or "\! -newer" will also return REF_FILE ~ # so you have to grep it out and use xargs; otherwise you ~ # could also use the -delete action ~ find ${WAL_ARCHIVE} \! -newer ${REF_FILE} -type f | \ ~ grep -v "^${REF_FILE}$" | \ ~ xargs rm -f |>The WAL archive command can be set to use either cp or mv ? then why not |>have a regular cronjob that runs a shell script to add any new files to your |>tar archive every time a new file is detected in your WAL-archive directory? | | Hmmm you got me wrong, am talking about taking the base_backup not | archival of WAL segments Just a thought - it might be worth trying star (Schilly's tar); it won't bail out completely if a file changes during read, and I'm 100% sure that when it only changes in size, it is going to be stored nevertheless. It also has functionality for storing ACLs and xattrs in the tar header, works on numerous UNIX flavours and comes as a system package with most of the popular distributions. Hope this helped, - -- ~ Grega Bremec ~ gregab at p0f dot net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.0 (GNU/Linux) iD8DBQFEK5Pdfu4IwuB3+XoRA7KVAJ9wbK6RDBnA4s0cjbdR3FE6+oUvGwCfWeAO SecCgFnus70LKNYCDNeV3Z0= =oUrb -----END PGP SIGNATURE-----
On Wed, Mar 29, 2006 at 06:15:59PM +0530, Rajesh Kumar Mallah wrote: > > Also you wouldn't particular need any heavy programming ? I'm sure a simple > > shell script could be written in bash to pick out the correct files. > > OK i am posting my full script [ its not heavy programming i guess :) ] > shall be grateful if you/someone could review it . (its well commented i think) > script also carries sample data. Would you be willing to share this script at http://pgfoundry.org/projects/pgpitrha/ ? Or maybe another project would be worth starting... I know right now everyone is pretty much rolling their own when it comes to dealing with PITR, so it'd be good to at least get all the scripts in one place... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Wed, Mar 29, 2006 at 09:46:30AM -0500, Tom Lane wrote: > It definitely is a pain in the neck that GNU tar complains about files > changing underneath it --- I've looked for a way to disable that, or at > least reduce it to a warning instead of an error condition, but gtar > doesn't seem to have such a switch. You should try alternative backup > tools such as cpio or rsync. Or you might submit a patch to GNU tar. There's also a BSD version of tar, it's on at least FreeBSD (not sure if there's a stand-alone version avaiable). It might not suffer from the same problem, though I haven't tried it. I do have a machine I could try it with if that would help. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
We have high volume inserts happening with wal files being generated very rapidly. Will the fact that the files change as the tar happens have any effect at all on the recovery..assuming all the appropriate wal archive logs are available. -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Jim C. Nasby Sent: Friday, March 31, 2006 10:11 AM To: Tom Lane Cc: andy.shellam@mailnetwork.co.uk; 'Rajesh Kumar Mallah'; pgsql-admin@postgresql.org Subject: Re: FW: [ADMIN] Setting up of PITR system. On Wed, Mar 29, 2006 at 09:46:30AM -0500, Tom Lane wrote: > It definitely is a pain in the neck that GNU tar complains about files > changing underneath it --- I've looked for a way to disable that, or at > least reduce it to a warning instead of an error condition, but gtar > doesn't seem to have such a switch. You should try alternative backup > tools such as cpio or rsync. Or you might submit a patch to GNU tar. There's also a BSD version of tar, it's on at least FreeBSD (not sure if there's a stand-alone version avaiable). It might not suffer from the same problem, though I haven't tried it. I do have a machine I could try it with if that would help. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings
On 3/30/06, Grega Bremec <gregab@p0f.net> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: RIPEMD160 > > Rajesh Kumar Mallah wrote: > | > | OK i am posting my full script [ its not heavy programming i guess :) ] > | shall be grateful if you/someone could review it . (its well > commented i think) > | script also carries sample data. > | > | it does following > | 1. takes base backup to a destined folder by rsync > | 2. waits for .backup file to arrive in archive folder > | after pg_stop_bacup() > | 3. searches and removes unwanted archived log files. > | > | I have run it many times in my server and it seems to > | be working fine. > > Hello, Rajesh. > > Just a couple of comments on the script itself, not what it actually > does - I never tried WAL archiving before, so I can't comment on that. I > inserted the comments at relevant points in the script. I'm sorting them > into three categories, one is just improvements in style, the other is > optimization and the third is correction of an error. Dear Grega , Thanks for the useful tips and error spotting, i am incorporating some of them and testing the script in my server . I have concerns regarding some of your optimisations that makes the script less generic , below find my comments. > > | ------------------------------------ BEGIN > | -------------------------------------------- > | #!/bin/bash > | > | # folder where base_backup is put > | BACKUPFOLDER=/mnt/disk3/base_backups > | today=`date +%d-%m-%Y-%H-%M-%S` > | PSQL=/opt/usr/local/pgsql/bin/psql > | RSYNC="/usr/bin/rsync -a" > | PGDATADIR=/mnt/disk5/pgdatadir > | > | # two table spaces. > | > | TS1=/mnt/disk4/bigtables > | TS2=/mnt/disk3/indexspace > > (optimization) Since you're using bash, you can use arrays. This could > be better written as > > ~ TS[0]=/mnt/disk5/pgdatadir > ~ TS[1]=/mnt/disk4/bigtables > ~ TS[2]=/mnt/disk3/indexspace > > or even > > ~ TS=(/mnt/disk5/pgdatadir \ > ~ /mnt/disk4/bigtables \ > ~ /mnt/disk3/indexspace) agreed , already incorporated. > > That way, you can add tablespaces at will and just use a while loop to > back them up, which greatly simplifies adding new tablespaces or moving > the script somewhere else. See below for how to implement that. > > | # folder where *archived* logs are put. > | WAL_ARCHIVE=/mnt/wal_archive > | > | label=base_backup_${today} > | > | echo "Executing pg_start_backup with label $label in server ... " > | > | # get the checkpoint at which backup starts > | # the .backup files seems to be bearing this string in it. > | > | CP=`$PSQL -q -Upostgres -d template1 -c "SELECT > | pg_start_backup('$label');" -P tuples_only -P format=unaligned` > | > | echo "Begin CheckPoint is $CP" # this contain string like A/681D1214 > | > | if [ $? -ne 0 ] > | then > | echo "PSQL pg_start_backup failed" > | exit 1; > | fi > | echo "pg_start_backup executed successfully" > > (style) If you want to capture any error messages pg_start_backup may > have caused and store them into ${CP}, you should add 2>&1 at the end of > the psql invocation, see below snippet. Incorporated it. > > (error) Checking for exit status of pg_start_backup using $? at this > point will never report an error, as you've used echo prior to checking > what pg_start_backup returned. You should either move the echo below the > if statement (by adding an "else" clause) or store the exit status of > pg_start_backup into RVAL like this: > > ~ CP="`$PSQL ... 2>&1`" > ~ RVAL=$? > ~ echo "Begin CheckPoint says: ${CP}" > ~ if [ ${RVAL} -ne 0 ]; then > ~ ... > ~ fi > > | echo "RSYNC begins.." > | > | # rsync each of the folders to the backup folder. > | for i in $TS1 $TS2 $PGDATADIR ; > | do > | echo "Syncing $i .. " > | time $RSYNC $i $BACKUPFOLDER > | echo "Done" > | done > > (optimization) If you store locations into an array, you could rewrite > this as follows: > > ~ CTR=0 > ~ while [ -n "${TS[${CTR}]}" ]; do > ~ echo "Syncing ${TS[${CTR}]}..." > ~ time ${RSYNC} ${TS[${CTR}]} ${BACKUPFOLDER} > ~ RVAL=$? > ~ echo "Sync finished with exit status ${RVAL}" > ~ if [ ${RVAL} -ne 0 ]; then > ~ <handle errors> > ~ fi > ~ CTR=$((CTR + 1)) > ~ done > ~ unset CTR > > | # fortunately rsync does *not* seems to be exitting with non zero exit > code > | # for expected file disappearances and modifications. > | if [ $? -ne 0 ] > | then > | echo "RSYNC failed" > | exit 1; > | fi > > (error) Same error as above - what you're checking here is whether the > last command in the last for loop run was successful, and this is always > going to be true as echoing to stdout will never fail until stdout is > closed for some reason. Yes it was an error, i am doing repeat runs of the script to find the non zero exit codes which should be treated as normal in context of taking base backups. > > | echo "RSYNC Done successfully" > | > | echo "Executing pg_stop_backup in server ... " > | $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();" > | if [ $? -ne 0 ] > | then > | echo "PSQL pg_stop_backup failed" > | exit 1; > | fi > | echo "pg_stop_backup done successfully I think i should improve the style here. > | TO_SEARCH="00${CP:4}" # $TO_SEARCH contains 1D1214 > | > | # now remove the unneeded files. > | > | # strip off first 4 chars from CP and prefix 00 to the result. > | # search the file that has the Checkpoint in its filename. > | # it takes a while to come, so wait till it comes. > | > | while true; do > | REF_FILE=`ls -1 $WAL_ARCHIVE | grep $TO_SEARCH` > | if [ ! $REF_FILE ]; then > | echo "Waitng for file with $TO_SEARCH in $WAL_ARCHIVE > ... " > | else > | break > | fi > | sleep 1 > | done > > (optimization) You could simplify this significantly using the test builtin: > > ~ while [ ! -e ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2 ]; do > ~ echo "Waiting for ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2" > ~ sleep 1 > ~ done > ~ REF_FILE="`echo ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2`" I would not like to incorporate this becuase this code assumes WAL archives are being externally compressed to .bz2. Can you suggest an optimized but generic alternative ? > > | # REF_FILE is 000000010000000A00000068.001D1214.backup.bz2 > | > | # take only first 24 chars and store. > | REF_FILE_NUM=${REF_FILE:0:24} > | > | # REF_FILE_NUM is 000000010000000A00000068 > | > | echo "REF_FILE_NUM=$REF_FILE_NUM" > | > | # iterate list of files in the WAL_ARCHIVE folder > | for i in `ls -1 $WAL_ARCHIVE` ; > | do > | # $i is :000000010000000A0000005D.bz2 eg > | # get first 24 chars in filename > | FILE_NUM=${i:0:24} > | > | # compare if the number is less than the reference > | # here string comparison is being used. > | if [[ $FILE_NUM < $REF_FILE_NUM ]] > | then > | echo "$FILE_NUM [ $i ] removed" > | rm -f $WAL_ARCHIVE/$i > | else > | echo "$FILE_NUM [ $i ] not removed" > | fi > | done > > (optimization) Perhaps using find -newer/-anewer/-cnewer could be used > here to find files older than the reference file: > > ~ # "-not -newer" or "\! -newer" will also return REF_FILE > ~ # so you have to grep it out and use xargs; otherwise you > ~ # could also use the -delete action > ~ find ${WAL_ARCHIVE} \! -newer ${REF_FILE} -type f | \ > ~ grep -v "^${REF_FILE}$" | \ > ~ xargs rm -f Nopes , i have gone by the DOCS which tells to numerically compare the filenames i do not want to assume more recently created files are numerically more that later as i have not seen anything like that in the docs. I am concerned if the comparison below used in the script > | if [[ $FILE_NUM < $REF_FILE_NUM ]] is correct , as it compares strings not numbers , i am assuming that the results will be same as numerical comparison, as all the filenames are padded with '0' form the left. i thank you once again for your comments and shall post the improved version once my observations are complete. Regds Rajesh Kumar Mallah.
On 3/31/06, Sriram Dandapani <sdandapani@counterpane.com> wrote: > We have high volume inserts happening with wal files being generated > very rapidly. Will the fact that the files change as the tar happens > have any effect at all on the recovery..assuming all the appropriate wal > archive logs are available. Its absolutely safe. The documentation specifically and sufficiently dispels this doubt[ Refer Online Backup ] . All the changes to the data pages (tables/indexes) are logged to WAL before the data files are modified so the WAL replay during recovery changes the files again in the same manner. As long as wal is archived properly/securly there should not be any doubt. Regds Rajesh Kumar Mallah. > > -----Original Message----- > From: pgsql-admin-owner@postgresql.org > [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Jim C. Nasby > Sent: Friday, March 31, 2006 10:11 AM > To: Tom Lane > Cc: andy.shellam@mailnetwork.co.uk; 'Rajesh Kumar Mallah'; > pgsql-admin@postgresql.org > Subject: Re: FW: [ADMIN] Setting up of PITR system. > > On Wed, Mar 29, 2006 at 09:46:30AM -0500, Tom Lane wrote: > > It definitely is a pain in the neck that GNU tar complains about files > > changing underneath it --- I've looked for a way to disable that, or > at > > least reduce it to a warning instead of an error condition, but gtar > > doesn't seem to have such a switch. You should try alternative backup > > tools such as cpio or rsync. > > Or you might submit a patch to GNU tar. There's also a BSD version of > tar, it's on at least FreeBSD (not sure if there's a stand-alone version > avaiable). It might not suffer from the same problem, though I haven't > tried it. I do have a machine I could try it with if that would help. > -- > Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com > Pervasive Software http://pervasive.com work: 512-231-6117 > vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 Rajesh Kumar Mallah wrote: | Dear Grega , | | Thanks for the useful tips and error spotting, | i am incorporating some of them and testing | the script in my server . I have concerns regarding | some of your optimisations that makes the script less | generic , below find my comments. | Hello, Rajesh, I'm glad you found some use to the comments :). |>| |>| echo "Executing pg_stop_backup in server ... " |>| $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();" |>| if [ $? -ne 0 ] |>| then |>| echo "PSQL pg_stop_backup failed" |>| exit 1; |>| fi |>| echo "pg_stop_backup done successfully | | I think i should improve the style here. | Basically, since you're not storing anything into a variable, as was the case above, anything psql says will be written to stdout/stderr, whichever it is that psql writes it to, so the only "nasty" thing that could happen is that line buffering could cause the error message to appear a line or two lower than where it actually happened. If you want to capture psql output and manipulate it (for example, indenting it to make it more readable using sed 's/^.*$/ &/' or similar), sure - you can assign any output psql emits to a variable you did with pg_start_backup and redirect stderr to stdout (2>&1). That way, you can decide if the text is an error by checking $? and act accordingly: ~ SB="`${PSQL} ... 2>&1 | sed 's/^.*$/ &/'`" ~ if [ $? -ne 0 ]; then ~ echo "psql pg_stop_backup failed:" ~ echo "${SB}" ~ exit 1 ~ fi |> |>(optimization) You could simplify this significantly using the test builtin: |> |>~ while [ ! -e ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2 ]; do |>~ echo "Waiting for ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2" |>~ sleep 1 |>~ done |>~ REF_FILE="`echo ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2`" | | | I would not like to incorporate this becuase this code assumes WAL archives | are being externally compressed to .bz2. Can you suggest an optimized but | generic alternative ? | I understand your concern. I was a bit surprised you checked for .bz2 files in your original script, but I suppose that depends on WAL archive method in use, and I didn't want to make any wild guesses there. :) Assuming all the WAL files you deal with will either be compressed already or not compressed at all (that is, you are not going to catch any of them in the middle of compression taking place), you can simply replace ".bz2" in the above tests with a glob, "*". Also, you can define a ${MASK} variable at one place and just reuse that, if you find there are different archive methods on different hosts and want to keep the script configurable. ~ WAL_MASK="${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup*" ~ ... ~ echo -n "Waiting for files masked \"${WAL_MASK}\" to appear..." ~ while [ ! -e ${WAL_MASK} ]; do ~ echo -n "." ~ sleep 1 ~ done ~ echo " done." ~ REF_FILE="`echo ${WAL_MASK}`" This does the basic trick. If there are any spaces in directory names, make sure you store IFS into some variable, reset it to ^M (newline) and reassign the old value after the above loop: ~ oldIFS="${IFS}" ~ IFS=' ~ ' ~ while ...; do ~ ... ~ done ~ IFS="${oldIFS}" ~ unset oldIFS All this gets slightly more complicated if you drop the above assumption, and expect some files to be caught in the middle of the compression process, which means there will be more than one file matching the WAL_MASK: ~ 000000010000000A00000068.001D1214.backup (the original) ~ 000000010000000A00000068.001D1214.backup.bz2 (partially compressed) In this case, the test should be done using a different glob expansion method in the while loop and using find and wc to count the number of files found, then wait until there's only one file matching the mask left: ~ WAL_MASK="*.00${TO_SEARCH}.backup*" ~ WAL_FILE="${WAL_ARCHIVE}/${WAL_MASK}" ~ ... ~ echo -n "Waiting for files masked \"${WAL_MASK}\" to appear..." ~ # Until at least one file is found, the strings will both be equal. ~ # The echo will safely expand to more than one file, whereas the old ~ # test would complain a unary operator was expected. ~ while [ "${WAL_FILE}" = "`echo ${WAL_FILE}`" ] || ~ [ `find ${WAL_ARCHIVE} -name "${WAL_MASK}" | wc -l` -gt 1 ]; do ~ # Are we in the middle of compression? Let user know some ~ # progress is taking place. ~ if [ `find ${WAL_ARCHIVE} -name "${WAL_MASK}" | wc -l` -gt 1 ]; then ~ echo -n "c" # for "compressing" :) ~ else ~ echo -n "." ~ fi ~ sleep 1 ~ done ~ echo " done." ~ REF_FILE="`echo ${WAL_MASK}`" The last step could be to protect yourself from waiting indefinitely just in case something goes wrong (i.e. running out of disk space or kernel OOM killing the compression program) by defining a maximum amount of seconds you're willing to wait for a WAL: ~ MAX_SEC=600 # 10 minutes max ~ NSEC=0 ~ while ...; do ~ ... ~ sleep 1 ~ NSEC=$((NSEC + 1)) ~ if [ ${NSEC} -gt ${MAX_SEC} ]; then ~ echo "error: timeout while waiting for WAL." ~ break ~ fi ~ done ~ if [ ${NSEC} -le ${MAX_SEC} ]; then ~ echo "done." ~ REF_FILE="`echo ${WAL_MASK}`" ~ else ~ echo "Bailing out (or something else appropriate)." ~ exit 1 ~ fi ~ unset NSEC Feel free to use whatever is appropriate for your environment, of course; maybe implementing some of the above is simply more trouble than it's worth, but I included it anyway, for the sake of completeness. |>(optimization) Perhaps using find -newer/-anewer/-cnewer could be used |>here to find files older than the reference file: |> |>~ # "-not -newer" or "\! -newer" will also return REF_FILE |>~ # so you have to grep it out and use xargs; otherwise you |>~ # could also use the -delete action |>~ find ${WAL_ARCHIVE} \! -newer ${REF_FILE} -type f | \ |>~ grep -v "^${REF_FILE}$" | \ |>~ xargs rm -f | | Nopes , i have gone by the DOCS which tells to numerically compare the filenames | i do not want to assume more recently created files are numerically more that | later as i have not seen anything like that in the docs. I see. I've been skipping classes, it seems. :) | I am concerned if the comparison below used in the script | |>| if [[ $FILE_NUM < $REF_FILE_NUM ]] | | is correct , as it compares strings not numbers , i am assuming that | the results will be same as numerical comparison, as all the filenames | are padded with '0' form the left. This is definitely true in the "C" locale, where "9" comes before "A" and "Z" comes before "a". If you want to make sure it is so, run the script with LC_ALL set to "C" like this: ~ $ env LC_ALL="C" backup_script.sh However, you also have the option of using "sort -g", which compares according to general numeric value: ~ RM_LIST="" ~ find ${WAL_ARCHIVE} -type f | sort -g | while read archive; do ~ if [ ! "${archive}" = "${REF_FILE}" ]; then ~ RM_LIST="${RM_LIST:+${RM_LIST} }${archive}" ~ else ~ break ~ fi ~ done ~ rm -f ${RM_LIST} If you use this method, make sure your bash is newer than 2.05, and make it significantly newer just to be on the safe side, because there is a bug in old bash2 versions that will fail to propagate variables inside the loop back to the parent shell ("while" reading from a pipe executes in a subshell in those versions). Good luck with the script! :) Kind regards, - -- ~ Grega Bremec ~ gregab at p0f dot net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.0 (GNU/Linux) iD8DBQFELhlEfu4IwuB3+XoRA/X7AJ9N9luGEuLMlz+eUTeWHh86X0CnkwCbB0Wa WzjVwnefz5j4a1vwWdR5UzE= =FXy8 -----END PGP SIGNATURE-----
Are you talking about tar-ing your completed WAL logs, or the file-system database directory? 1. If you're talking about tar-ing archived WAL logs - the logs you're archiving shouldn't be changing, as they should be archived outside of your PG data directory structure (with the use of archive_command) - perhaps your archive command should move the files out of the pg_xlog directory BEFORE tar-ing. 2. If you're talking about logs changing while you're tar-ing your PG data directory, you shouldn't be including your pg_xlog directory in the archive, as long as you've got all the WAL logs available AFTER the dump is done, then it'll fix any inconsistencies in the data when you do the restore. Hope this helps Andy -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Sriram Dandapani Sent: Friday, 31 March, 2006 7:19 pm To: Jim C. Nasby; Tom Lane Cc: andy.shellam@mailnetwork.co.uk; Rajesh Kumar Mallah; pgsql-admin@postgresql.org Subject: Re: FW: [ADMIN] Setting up of PITR system. We have high volume inserts happening with wal files being generated very rapidly. Will the fact that the files change as the tar happens have any effect at all on the recovery..assuming all the appropriate wal archive logs are available. -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Jim C. Nasby Sent: Friday, March 31, 2006 10:11 AM To: Tom Lane Cc: andy.shellam@mailnetwork.co.uk; 'Rajesh Kumar Mallah'; pgsql-admin@postgresql.org Subject: Re: FW: [ADMIN] Setting up of PITR system. On Wed, Mar 29, 2006 at 09:46:30AM -0500, Tom Lane wrote: > It definitely is a pain in the neck that GNU tar complains about files > changing underneath it --- I've looked for a way to disable that, or at > least reduce it to a warning instead of an error condition, but gtar > doesn't seem to have such a switch. You should try alternative backup > tools such as cpio or rsync. Or you might submit a patch to GNU tar. There's also a BSD version of tar, it's on at least FreeBSD (not sure if there's a stand-alone version avaiable). It might not suffer from the same problem, though I haven't tried it. I do have a machine I could try it with if that would help. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match !DSPAM:14,442d72c535048270212230!
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 Grega Bremec wrote: | ~ find ${WAL_ARCHIVE} -type f | sort -g | while read archive; do | ~ if [ ! "${archive}" = "${REF_FILE}" ]; then Actually, this test is only possible since ${REF_FILE} was expanded using the ${WAL_FILE} mask and will contain full path to the log; if you used your old method to obtain ${REF_FILE}, it will NOT match here, as it only contains the filename. You should write the test like this: ~ if [ ! "`basename ${archive}`" = "${REF_FILE}" ]; then ~ ... Kind regards, - -- ~ Grega Bremec ~ gregab at p0f dot net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.0 (GNU/Linux) iD8DBQFELlmqfu4IwuB3+XoRA7PNAJ9Ks5+6l0dmAgeIixcofNTUjXy3FgCfT+D1 kNvLWQIlKuo+GzF05IfPu2U= =aiNg -----END PGP SIGNATURE-----
On 4/1/06, Grega Bremec <gregab@p0f.net> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: RIPEMD160 > > Rajesh Kumar Mallah wrote: > | Dear Grega , > | > | Thanks for the useful tips and error spotting, > | i am incorporating some of them and testing > | the script in my server . I have concerns regarding > | some of your optimisations that makes the script less > | generic , below find my comments. > | > > Hello, Rajesh, > > I'm glad you found some use to the comments :). Hmm i am glad , i have found one person whom i can ask any bash doubt ;-) i am discussing only the differentials and not posting the flab. As described in docs pg_start_backup() puts the file backup_label in the datadir which can has the info about the start log file. Instead of taking a round about method i am using the content of the file, (This was also suggested by Andy at somepoint) below is the code . # read the backup_label file in pgdatadir and get the name of start wal file # below is example content. ############################################################## #START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A) #CHECKPOINT LOCATION: E/A92939C #START TIME: 2006-04-01 14:36:48 IST #LABEL: base_backup_01-04-2006-14-36-45 ############################################################### BACKUP_LABEL=$DATADIR/backup_label # get the like containing line START WAL LOCATION START_LINE=`grep -i "START WAL LOCATION" $BACKUP_LABEL` # strip something like 'START WAL LOCATION: E/A9145E4 (file ' from begin. START_LINE=${START_LINE/#START*file /} # strip ')' from end. START_LINE=${START_LINE/%)/} # REF_FILE_NUM is something like 000000010000000A00000068 REF_FILE_NUM=$START_LINE ------------------ End of relevent portion --------------------------------------- please optimize the above regex capturing process if possible. lately i feel that we should not be discussing the topic over here becoz it has less to do with postgresql and more of bash. ~ RM_LIST="" ~ find ${WAL_ARCHIVE} -type f | sort -g | while read archive; do ~ if [ ! "${archive}" = "${REF_FILE}" ]; then i think you meant < instead of '=' in above line. ~ RM_LIST="${RM_LIST:+${RM_LIST} }${archive}" ~ else ~ break ~ fi ~ done ~ rm -f ${RM_LIST} last doubt: regarding $ env LC_ALL="C" backup_script.sh can i do export LC_ALL="C" or LC_ALL="C" inside the script itself to be sure ? Thanks for you help once again. Regds Rajesh Kumar Mallah.
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 Rajesh Kumar Mallah wrote: | | Instead of taking a round about method i am using the content of the | file, (This was also suggested by Andy at somepoint) After reading the docs again, that's what I would do as well, indeed. :) | lately i feel that we should not be discussing the topic over here | becoz it has less to do with postgresql and more of bash. I've been considering that seriously in the very first post I wrote, but since there seems to be a lot of people interested in a working, flexible WAL archiving script, I decided to keep it on the list. It is an administration issue, afterall. I will stand corrected if someone feels we're clogging their mailboxes. | ############################################################## | #START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A) | #CHECKPOINT LOCATION: E/A92939C | #START TIME: 2006-04-01 14:36:48 IST | #LABEL: base_backup_01-04-2006-14-36-45 | ############################################################### | | BACKUP_LABEL=$DATADIR/backup_label | # get the like containing line START WAL LOCATION | | START_LINE=`grep -i "START WAL LOCATION" $BACKUP_LABEL` | # strip something like 'START WAL LOCATION: E/A9145E4 (file ' from begin. | START_LINE=${START_LINE/#START*file /} | # strip ')' from end. | START_LINE=${START_LINE/%)/} | # REF_FILE_NUM is something like 000000010000000A00000068 | REF_FILE_NUM=$START_LINE Why not go for the entire filename? Record offset is never going to be more than eight characters, as include/access/xlogdefs.h states: ~ typedef struct XLogRecPtr ~ { ~ uint32 xlogid; /* log file #, 0 based */ ~ uint32 xrecoff; /* byte offset of location in log file */ ~ } XLogRecPtr; A 32 bit unsigned integer can always be represented in eight hexadecimal digits or less. ~ REF_FILE="`grep 'START WAL LOCATION' ${BACKUP_LABEL} | \ ~ awk '{ ~ sub(/)/, "", $6); ~ sub(/[0-9A-F]\//, "", $4); ~ printf("%s.%08s.backup", $6, $4); ~ }'`" This will remove the trailing paren from WAL filename (field 6), the leading xlogid and the slash from WAL location (field 4) and compose them into the full filename, zero-padding WAL location to eight characters and giving back something like this: ~ 000000010000000E0000000A.0A9145E4.backup What you need to do now is just appendd a glob (if your archive_method consists of gzip/bzip2/...) and prepend ${WAL_ARCHIVE}: ~ REF_FILE="${WAL_ARCHIVE}/${REF_FILE}*" | ~ RM_LIST="" | ~ find ${WAL_ARCHIVE} -type f | sort -g | while read archive; do | ~ if [ ! "${archive}" = "${REF_FILE}" ]; then | ~ RM_LIST="${RM_LIST:+${RM_LIST} }${archive}" | ~ else | ~ break | ~ fi | ~ done | ~ rm -f ${RM_LIST} | i think you meant < instead of '=' in above [comparison]. Absolutely not. :) What we're doing here is we're looking at all files in ${WAL_ARCHIVE} (find), sorting them according to their general numeric value (sort, lowest first) and adding them one-by-one to the list of WALs to remove (RM_LIST assignment) until we find REF_FILE (the "equals not" comparison). As soon as we find REF_FILE, we escape the while loop (break) and remove all the old log files (rm -f). Since WALs are numbered in a sequence, and location identifiers in a WAL which are also a part of the filename are sequential too, sorting will always produce a list of WAL segments in chronological order, oldest first, newest last. What is critical to the above piece of code is that BOTH ${archive} and ${REF_FILE} are either absolute filenames or relative ones, of course, otherwise they'll never match. | regarding | $ env LC_ALL="C" backup_script.sh If you do it inside the script, you shoud definitely export it to subshells since all the backtick commands execute in a subshell. Using "sort -g" to sort the listing according to general numeric value is the safest option though, and it is also the least disruptive one as it doesn't require any changes to the environment. Kind regards, - -- ~ Grega Bremec ~ gregab at p0f dot net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.0 (GNU/Linux) iD8DBQFEL9x/fu4IwuB3+XoRA3IgAJ9Qn7dYsNhv3e9f+P64mJoiz+s77gCeLELY 4xAxFb3Ncd8RHWkBbgyag7U= =7MXQ -----END PGP SIGNATURE-----
On 4/2/06, Grega Bremec <gregab@p0f.net> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: RIPEMD160 > > Rajesh Kumar Mallah wrote: > | > | Instead of taking a round about method i am using the content of the > | file, (This was also suggested by Andy at somepoint) > > After reading the docs again, that's what I would do as well, indeed. :) > > | lately i feel that we should not be discussing the topic over here > | becoz it has less to do with postgresql and more of bash. > > I've been considering that seriously in the very first post I wrote, but > since there seems to be a lot of people interested in a working, > flexible WAL archiving script, I decided to keep it on the list. It is > an administration issue, afterall. I will stand corrected if someone > feels we're clogging their mailboxes. > > | ############################################################## > | #START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A) > | #CHECKPOINT LOCATION: E/A92939C > | #START TIME: 2006-04-01 14:36:48 IST > | #LABEL: base_backup_01-04-2006-14-36-45 > | ############################################################### > | > | BACKUP_LABEL=$DATADIR/backup_label > | # get the like containing line START WAL LOCATION > | > | START_LINE=`grep -i "START WAL LOCATION" $BACKUP_LABEL` > | # strip something like 'START WAL LOCATION: E/A9145E4 (file ' from begin. > | START_LINE=${START_LINE/#START*file /} > | # strip ')' from end. > | START_LINE=${START_LINE/%)/} > | # REF_FILE_NUM is something like 000000010000000A00000068 > | REF_FILE_NUM=$START_LINE > > Why not go for the entire filename? it takes a while (i dont know how much) for the .backup file to get archived and appear in the wal archive area. thats why i prefer to use the wal log filename (000000010000000A00000068) instead of something like 000000010000000A00000068.0A348A45.backup. Do you see any problem in the current approach ? i have seen it working fine till now. Another area i was thinking to improve this script was to make it dig out all the tablespace folders to be archived by looking into PGDATADIR/pg_tblspc . This shall make the script more generic. pg_tblspc as contents like below: $ ls -l total 0 lrwxrwxrwx 1 postgres postgres 19 Mar 27 21:45 16391 -> /mnt/indexspace_new lrwxrwxrwx 1 postgres postgres 18 Mar 27 21:45 16392 -> /mnt/bigtables_new can you suggest the sane/recommended way to get the destination folders?( i was thinking find $PGDATADIR/pg_tblspc -type l -printf "%???????" ) or do i parse output of ls ! if above is done i see the script INPUT/OUTPUT'ACTIVITY as below INPUTS: PGDATADIR , WAL_ARCHIVE_DIR , LOCAL DUMP DIRECTORY OUTPUT: n/a ACTIVITY: 1. it shall take a base backup of PGDATADIR (minus pg_xlog) and all tablespaces into *LOCAL* DUMP Directory 2. Frees disk space by removing unwanted LOG files in WAL_ARCHIVE_DIR i am waiting for my another machine to get fitted with 2 more drives so that i can test/develop scripts for the restoration part and modify the script for remote base backups and archiving. that shall happen in a week or so. thanks for your support till now. Regds Rajesh Kumar Mallah. > > Kind regards, > - -- > ~ Grega Bremec > ~ gregab at p0f dot net > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.0 (GNU/Linux) > > iD8DBQFEL9x/fu4IwuB3+XoRA3IgAJ9Qn7dYsNhv3e9f+P64mJoiz+s77gCeLELY > 4xAxFb3Ncd8RHWkBbgyag7U= > =7MXQ > -----END PGP SIGNATURE----- >
Rajesh Kumar Mallah wrote: > Another area i was thinking to improve this script was to > make it dig out all the tablespace folders to be archived by looking > into PGDATADIR/pg_tblspc . This shall make the script more generic. > > pg_tblspc as contents like below: > > $ ls -l > total 0 > lrwxrwxrwx 1 postgres postgres 19 Mar 27 21:45 16391 -> /mnt/indexspace_new > lrwxrwxrwx 1 postgres postgres 18 Mar 27 21:45 16392 -> /mnt/bigtables_new > > can you suggest the sane/recommended way to get the destination > folders?( i was thinking find $PGDATADIR/pg_tblspc -type l -printf "%???????" ) > or do i parse output of ls ! Try the readlink utility (part of GNU coreutils, not sure about other Unixen). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Whenever you get a finalized script including all the enhancements you've been talking about, I would really love to have a copy as I'm sure lots of people here would. I think this is something that should be included in the standard distribution. Can I make a couple of suggestions? 1. Include a mail option to send the admin an email when the backup succeeds or fails. 2. Call pg_stop_backup() if the script fails for whatever reason. You can't run the script twice unless you stop the backup. I ran the script a few times to see if I can get a PITR backup system going and due to directory permissions and such and during basic testing, I had to manually call pg_stop_backup() after the script exits abnormally. Thanks very much for building this script and sharing it with us. Many of us don't have the skills to write such a script. I'm a Java programmer, but I have little bash scripting skills, so this is very much appreciated. ____________________________________________________________________ Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Apr 2, 2006, at 10:14 AM, Rajesh Kumar Mallah wrote: > On 4/2/06, Grega Bremec <gregab@p0f.net> wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: RIPEMD160 >> >> Rajesh Kumar Mallah wrote: >> | >> | Instead of taking a round about method i am using the content of >> the >> | file, (This was also suggested by Andy at somepoint) >> >> After reading the docs again, that's what I would do as well, >> indeed. :) >> >> | lately i feel that we should not be discussing the topic over here >> | becoz it has less to do with postgresql and more of bash. >> >> I've been considering that seriously in the very first post I >> wrote, but >> since there seems to be a lot of people interested in a working, >> flexible WAL archiving script, I decided to keep it on the list. >> It is >> an administration issue, afterall. I will stand corrected if someone >> feels we're clogging their mailboxes. >> >> | ############################################################## >> | #START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A) >> | #CHECKPOINT LOCATION: E/A92939C >> | #START TIME: 2006-04-01 14:36:48 IST >> | #LABEL: base_backup_01-04-2006-14-36-45 >> | ############################################################### >> | >> | BACKUP_LABEL=$DATADIR/backup_label >> | # get the like containing line START WAL LOCATION >> | >> | START_LINE=`grep -i "START WAL LOCATION" $BACKUP_LABEL` >> | # strip something like 'START WAL LOCATION: E/A9145E4 (file ' >> from begin. >> | START_LINE=${START_LINE/#START*file /} >> | # strip ')' from end. >> | START_LINE=${START_LINE/%)/} >> | # REF_FILE_NUM is something like 000000010000000A00000068 >> | REF_FILE_NUM=$START_LINE >> >> Why not go for the entire filename? > > > it takes a while (i dont know how much) for the .backup file to get > archived and appear in the wal archive area. thats why i prefer to > use the wal log filename (000000010000000A00000068) instead of > something like 000000010000000A00000068.0A348A45.backup. > > Do you see any problem in the current approach ? > i have seen it working fine till now. > > Another area i was thinking to improve this script was to > make it dig out all the tablespace folders to be archived by looking > into PGDATADIR/pg_tblspc . This shall make the script more generic. > > pg_tblspc as contents like below: > > $ ls -l > total 0 > lrwxrwxrwx 1 postgres postgres 19 Mar 27 21:45 16391 -> /mnt/ > indexspace_new > lrwxrwxrwx 1 postgres postgres 18 Mar 27 21:45 16392 -> /mnt/ > bigtables_new > > can you suggest the sane/recommended way to get the destination > folders?( i was thinking find $PGDATADIR/pg_tblspc -type l -printf > "%???????" ) > or do i parse output of ls ! > > if above is done i see the script INPUT/OUTPUT'ACTIVITY as below > > INPUTS: PGDATADIR , WAL_ARCHIVE_DIR , LOCAL DUMP DIRECTORY > OUTPUT: n/a > ACTIVITY: > > 1. it shall take a base backup of PGDATADIR (minus pg_xlog) and > all tablespaces into *LOCAL* DUMP Directory > 2. Frees disk space by removing unwanted LOG files in WAL_ARCHIVE_DIR > > i am waiting for my another machine to get fitted with 2 more drives > so that i can test/develop scripts for the restoration part and modify > the script for remote base backups and archiving. that shall > happen in a week or so. > > thanks for your support till now. > > Regds > Rajesh Kumar Mallah. > > >> >> Kind regards, >> - -- >> ~ Grega Bremec >> ~ gregab at p0f dot net >> -----BEGIN PGP SIGNATURE----- >> Version: GnuPG v1.4.0 (GNU/Linux) >> >> iD8DBQFEL9x/fu4IwuB3+XoRA3IgAJ9Qn7dYsNhv3e9f+P64mJoiz+s77gCeLELY >> 4xAxFb3Ncd8RHWkBbgyag7U= >> =7MXQ >> -----END PGP SIGNATURE----- >> > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
Вложения
On 4/3/06, Brendan Duddridge <brendan@clickspace.com> wrote: > Whenever you get a finalized script including all the enhancements > you've been talking about, > I would really love to have a copy as I'm sure lots of people here > would. I think this is > something that should be included in the standard distribution. > > Can I make a couple of suggestions? > > 1. Include a mail option to send the admin an email when the backup > succeeds or fails. Considering the fact that the script is hugely unix oriented currently, such a facility is anyway avialble in crontab using MAILTO variable. > 2. Call pg_stop_backup() if the script fails for whatever reason. You > can't run the script > twice unless you stop the backup. I ran the script a few times to > see if I can get a > PITR backup system going and due to directory permissions and > such and during basic > testing, I had to manually call pg_stop_backup() after the script > exits abnormally. yes this is a problem , i do not know if calling pg_stop_backup() is the end to the damage control steps, but i shall surely add it for the time being. > > Thanks very much for building this script and sharing it with us. > Many of us don't have the skills to > write such a script. I'm a Java programmer, but I have little bash > scripting skills, so this > is very much appreciated. Thanks to prying eyes of the bash gurus who are supervising it :) i only have the machines and situation to run it and seek suggestions. Regds Rajesh Kumar Mallah. > > ____________________________________________________________________ > Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com > > ClickSpace Interactive Inc. > Suite L100, 239 - 10th Ave. SE > Calgary, AB T2G 0V9 > > http://www.clickspace.com > > On Apr 2, 2006, at 10:14 AM, Rajesh Kumar Mallah wrote: > > > On 4/2/06, Grega Bremec <gregab@p0f.net> wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- > >> Hash: RIPEMD160 > >> > >> Rajesh Kumar Mallah wrote: > >> | > >> | Instead of taking a round about method i am using the content of > >> the > >> | file, (This was also suggested by Andy at somepoint) > >> > >> After reading the docs again, that's what I would do as well, > >> indeed. :) > >> > >> | lately i feel that we should not be discussing the topic over here > >> | becoz it has less to do with postgresql and more of bash. > >> > >> I've been considering that seriously in the very first post I > >> wrote, but > >> since there seems to be a lot of people interested in a working, > >> flexible WAL archiving script, I decided to keep it on the list. > >> It is > >> an administration issue, afterall. I will stand corrected if someone > >> feels we're clogging their mailboxes. > >> > >> | ############################################################## > >> | #START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A) > >> | #CHECKPOINT LOCATION: E/A92939C > >> | #START TIME: 2006-04-01 14:36:48 IST > >> | #LABEL: base_backup_01-04-2006-14-36-45 > >> | ############################################################### > >> | > >> | BACKUP_LABEL=$DATADIR/backup_label > >> | # get the like containing line START WAL LOCATION > >> | > >> | START_LINE=`grep -i "START WAL LOCATION" $BACKUP_LABEL` > >> | # strip something like 'START WAL LOCATION: E/A9145E4 (file ' > >> from begin. > >> | START_LINE=${START_LINE/#START*file /} > >> | # strip ')' from end. > >> | START_LINE=${START_LINE/%)/} > >> | # REF_FILE_NUM is something like 000000010000000A00000068 > >> | REF_FILE_NUM=$START_LINE > >> > >> Why not go for the entire filename? > > > > > > it takes a while (i dont know how much) for the .backup file to get > > archived and appear in the wal archive area. thats why i prefer to > > use the wal log filename (000000010000000A00000068) instead of > > something like 000000010000000A00000068.0A348A45.backup. > > > > Do you see any problem in the current approach ? > > i have seen it working fine till now. > > > > Another area i was thinking to improve this script was to > > make it dig out all the tablespace folders to be archived by looking > > into PGDATADIR/pg_tblspc . This shall make the script more generic. > > > > pg_tblspc as contents like below: > > > > $ ls -l > > total 0 > > lrwxrwxrwx 1 postgres postgres 19 Mar 27 21:45 16391 -> /mnt/ > > indexspace_new > > lrwxrwxrwx 1 postgres postgres 18 Mar 27 21:45 16392 -> /mnt/ > > bigtables_new > > > > can you suggest the sane/recommended way to get the destination > > folders?( i was thinking find $PGDATADIR/pg_tblspc -type l -printf > > "%???????" ) > > or do i parse output of ls ! > > > > if above is done i see the script INPUT/OUTPUT'ACTIVITY as below > > > > INPUTS: PGDATADIR , WAL_ARCHIVE_DIR , LOCAL DUMP DIRECTORY > > OUTPUT: n/a > > ACTIVITY: > > > > 1. it shall take a base backup of PGDATADIR (minus pg_xlog) and > > all tablespaces into *LOCAL* DUMP Directory > > 2. Frees disk space by removing unwanted LOG files in WAL_ARCHIVE_DIR > > > > i am waiting for my another machine to get fitted with 2 more drives > > so that i can test/develop scripts for the restoration part and modify > > the script for remote base backups and archiving. that shall > > happen in a week or so. > > > > thanks for your support till now. > > > > Regds > > Rajesh Kumar Mallah. > > > > > >> > >> Kind regards, > >> - -- > >> ~ Grega Bremec > >> ~ gregab at p0f dot net > >> -----BEGIN PGP SIGNATURE----- > >> Version: GnuPG v1.4.0 (GNU/Linux) > >> > >> iD8DBQFEL9x/fu4IwuB3+XoRA3IgAJ9Qn7dYsNhv3e9f+P64mJoiz+s77gCeLELY > >> 4xAxFb3Ncd8RHWkBbgyag7U= > >> =7MXQ > >> -----END PGP SIGNATURE----- > >> > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 5: don't forget to increase your free space map settings > > > > > >
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 Rajesh Kumar Mallah wrote: | it takes a while (i dont know how much) for the .backup file to get | archived and appear in the wal archive area. thats why i prefer to | use the wal log filename (000000010000000A00000068) instead of | something like 000000010000000A00000068.0A348A45.backup. | | Do you see any problem in the current approach ? | i have seen it working fine till now. I do, to be honest. The WAL location counter accounts for 4294967295 positions and while I'm certain that's WAY more than the average number of transactions that go into a WAL, quite a number of small ones can certainly happen before a WAL is rolled over, and until then, you're dealing with the same log file. If two backups happen in that period of time for whatever reason, you're going to have a false positive by looking into ${WAL_ARCHIVE} and searching just for the WAL name, so including the location in the search of a WAL fragment is certainly necessary. Infact, going purely by chance, the probability of hitting the same location in two different log files in two subsequent backups is much lower than hitting the same WAL twice. | can you suggest the sane/recommended way to get the destination | folders?( i was thinking find $PGDATADIR/pg_tblspc -type l -printf | "%???????" ) or do i parse output of ls ! Well, one excellent idea is certainly Alvaro's - the readlink utility can do that for you in no time: ~ $ find ${PGDATADIR}/pg_tblspc/ -type l -exec readlink {} \; ~ /export/works/space/zlxmon/tblspc ~ ... The other option you have is piping '\db' into psql and have it display the list of tablespaces on standard output (you can use ~/.pgpass to supress password prompting): ~ $ echo '\db' | psql -A -F: -t template1 ~ pg_default:postgres: ~ pg_global:postgres: ~ zlxmon_ts:gregab:/export/work/space/zlxmon/tblspc ~ ... By reformatting this slightly you can also get the desired result and most importantly, it doesn't rely on how postgres' objects are organized in the main database cluster directory: ~ $ echo '\db' | psql -A -F: -t template1 | cut -f3 -d: | grep -v '^$' ~ /export/work/space/zlxmon/tblspc ~ ... It's all options for you, of course. Use whichever you prefer. | if above is done i see the script INPUT/OUTPUT'ACTIVITY as below | | INPUTS: PGDATADIR , WAL_ARCHIVE_DIR , LOCAL DUMP DIRECTORY | OUTPUT: n/a | ACTIVITY: | | 1. it shall take a base backup of PGDATADIR (minus pg_xlog) and | all tablespaces into *LOCAL* DUMP Directory I think you can list that as an output. :) | 2. Frees disk space by removing unwanted LOG files in WAL_ARCHIVE_DIR Perhaps moving the old log files into a father backup directory and having them stick around for a period of time before removing them isn't a bad idea either, just in case something goes wrong with your latest backup. You could go about that using find as well; see the -ctime predicate in find(1). Kind regards, - -- ~ Grega Bremec ~ gregab at p0f dot net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.0 (GNU/Linux) iD8DBQFEMLxYfu4IwuB3+XoRA1zZAJ9y7HquxYLH5l0rYff6+cvpU+9lVACfVNG0 PPJZu8IPMBR5j/xPy1+CFDs= =G3Bt -----END PGP SIGNATURE-----
> | Do you see any problem in the current approach ? > | i have seen it working fine till now. > > I do, to be honest. The WAL location counter accounts for 4294967295 > positions and while I'm certain that's WAY more than the average number > of transactions that go into a WAL, quite a number of small ones can > certainly happen before a WAL is rolled over, and until then, you're > dealing with the same log file. > > If two backups happen in that period of time for whatever reason, you're > going to have a false positive by looking into ${WAL_ARCHIVE} and > searching just for the WAL name, so including the location in the search > of a WAL fragment is certainly necessary. Infact, going purely by > chance, the probability of hitting the same location in two different > log files in two subsequent backups is much lower than hitting the same > WAL twice. Dear Grega, sincere thanks for your time, The current wal log is not being removed from the wal archive area in any case. The files less than the current ones are being rm'ed. I am sorry i am not able to get your apprehension. But i shall surely try harder to understand your point. anyways have a look at the current script with following improvements. 1. Do some sanity checks about folder existance and permissions 2. accepts 3 mandatory args now , PGDATADIR , BACKUP DUMP FOLDER and WAL ARCHIVE AREA 3. use readlink -f to probe all the directories to be included in basebackup 4. Attempt to probe psql and rsync in system and bail out if not found. Regarding : > | 2. Frees disk space by removing unwanted LOG files in WAL_ARCHIVE_DIR > > Perhaps moving the old log files into a father backup directory and > having them stick around for a period of time before removing them isn't > a bad idea either, just in case something goes wrong with your latest > backup. You could go about that using find as well; see the -ctime > predicate in find(1). the old log files without the base backup are not useful. since rsync is being used to optimise the copying by overwriting the base backup everytime, i dont thing preserving the old files makes sense. Had it been and non overwritng backup the files would have made sense. ---------------- BEGIN ------------------------------------------------- #!/bin/bash ################################################## # it does following # 1. checks existance and permission of imp folders. # 2. takes base backup to a destined folder by rsync # 3. removes unwanted archived log files. ################################################## if [ $# -ne 3 ] then echo "Usage: $0 <DATADIR> <BACKUP DIRECTORY> <WAL ARCHIVE DIRECTORY>" exit 1 fi DATADIR_IN=$1 BACKUPFOLDER=$2 WAL_ARCHIVE=$3 if [ -z $BACKUPFOLDER ] || [ ! -d $BACKUPFOLDER ] || [ ! -w $BACKUPFOLDER ] then echo "Sorry base backup folder $BACKUPFOLDER does not exists or is not writable or is not specified!" exit 1 fi if [ -z $WAL_ARCHIVE ] || [ ! -d $WAL_ARCHIVE ] || [ ! -w $WAL_ARCHIVE ] then echo "Sorry WAL archive folder $WAL_ARCHIVE does not exists or is not writable or is not specified!" exit 1 fi if [ -L $DATADIR_IN ] then DATADIR=`readlink -f $DATADIR_IN` echo "Using $DATADIR instead of $DATADIR_IN as $DATADIR_IN is a link" else DATADIR=$DATADIR_IN fi # get all tablespaces from $DATADIR/pg_tblspc DIRS=(`find $DATADIR/pg_tblspc -type l -exec readlink -f {} \;`) # append DATADIR to it DIRS=( "${DIRS[@]}" $DATADIR) CTR=0 echo "Script shall backup following folders" while [ -n "${DIRS[${CTR}]}" ]; do echo "${DIRS[${CTR}]}" CTR=$((CTR + 1)) done unset CTR PSQL_BIN=`which psql` || /usr/local/pgsql/bin/psql RSYNC_BIN=`which rsync` || /usr/bin/rsync for PROG in $PSQL_BIN $RSYNC_BIN ; do if [ ! -f $PROG ] || [ ! -x $PROG ] then echo "Sorry $PROG does not exists or is not executable by you" echo "Please set env variable PATH to include psql and rsync" exit 1 else echo "Using $PROG" fi done RSYNC_OPTS="--delete-after -a --exclude pg_xlog" RSYNC="$RSYNC_BIN $RSYNC_OPTS" PSQL=$PSQL_BIN today=`date +%d-%m-%Y-%H-%M-%S` label=base_backup_${today} echo "Executing pg_start_backup with label $label in server ... " # get the checkpoint at which backup starts # the .backup files seems to be bearing this string in it. CP=`$PSQL -q -Upostgres -d template1 -c "SELECT pg_start_backup('$label');" -P tuples_only -P format=unaligned` RVAL=$? if [ $RVAL -ne 0 ] then echo "PSQL pg_start_backup failed:$CP" exit 1; fi echo "pg_start_backup executed successfully" # read the backup_label file in pgdatadir and get the name of start wal file # below is example content. #START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A) #CHECKPOINT LOCATION: E/A92939C #START TIME: 2006-04-01 14:36:48 IST #LABEL: base_backup_01-04-2006-14-36-45 BACKUP_LABEL=$DATADIR/backup_label # assuming pg_start_backup immediate puts backup_label in # pgdatadir on finish. START_LINE=`grep -i "START WAL LOCATION" $BACKUP_LABEL` # get the like containing START WAL LOCATION START_LINE=${START_LINE/#START*file /} # strip something like 'START WAL LOCATION: E/A9145E4 (file ' from begin. START_LINE=${START_LINE/%)/} # strip ')' from end. # REF_FILE_NUM is something like 000000010000000A00000068 REF_FILE_NUM=$START_LINE echo "Content of $BACKUP_LABEL" echo "------------- begin -----------" cat $BACKUP_LABEL echo "------------- end -----------" echo "Read Start Wal as : $REF_FILE_NUM" echo "RSYNC begins.." # rsync each of the folders to the backup folder. CTR=0 while [ -n "${DIRS[${CTR}]}" ]; do echo "Syncing ${DIRS[${CTR}]}..." echo "Executing:${RSYNC} ${DIRS[${CTR}]} ${BACKUPFOLDER}" time ${RSYNC} ${DIRS[${CTR}]} ${BACKUPFOLDER} RVAL=$? echo "Sync finished with exit status ${RVAL}" if [[ ${RVAL} -eq 0 || ${RVAL} -eq 23 ]]; then echo "Rsync success" else echo "Rsync failed" $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();" exit 1 fi CTR=$((CTR + 1)) done unset CTR echo "Executing pg_stop_backup in server ... " $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();" if [ $? -ne 0 ] then echo "PSQL pg_stop_backup failed" exit 1; fi echo "pg_stop_backup done successfully" echo "REF_FILE_NUM=$REF_FILE_NUM" # iterate list of files in the WAL_ARCHIVE folder for i in `ls -1 $WAL_ARCHIVE` ; do # $i is :000000010000000A0000005D.bz2 eg # get first 24 chars in filename FILE_NUM=${i:0:24} # compare if the number is less than the reference # here string comparison is being used. if [[ $FILE_NUM < $REF_FILE_NUM ]] then echo "$FILE_NUM [ $i ] removed" rm -f $WAL_ARCHIVE/$i else echo "$FILE_NUM [ $i ] not removed" fi done ------------------ END -----------------------------------------------------
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 Rajesh Kumar Mallah wrote: |>| Do you see any problem in the current approach ? |>| i have seen it working fine till now. |> |>I do, to be honest. The WAL location counter accounts for 4294967295 |>positions and while I'm certain that's WAY more than the average number |>of transactions that go into a WAL, quite a number of small ones can |>certainly happen before a WAL is rolled over, and until then, you're |>dealing with the same log file. |> |>If two backups happen in that period of time for whatever reason, you're |>going to have a false positive by looking into ${WAL_ARCHIVE} and |>searching just for the WAL name, so including the location in the search |>of a WAL fragment is certainly necessary. Infact, going purely by |>chance, the probability of hitting the same location in two different |>log files in two subsequent backups is much lower than hitting the same |>WAL twice. | | The current wal log is not being removed from the wal archive area | in any case. The files less than the current ones are being rm'ed. | | I am sorry i am not able to get your apprehension. But i shall | surely try harder to understand your point. Hi Rajesh, list. I'm sorry I didn't get back to you earlier, I was at an IBM business conference for a couple of days; not to say it rendered me incapable of communicating via e-mail, but it did bring along certain social responsibilities which caused me to both stay up and sleep late, if you know what I mean. :) Let me explain the above predicament in more practical terms. Let us say you started a backup very soon after a WAL had been rolled over. Current WAL at that time was called, for example, ${PGDATA}/pg_xlog/000000010000000E0000000A. The location at that time was 000F594A (iow, early in the WAL cycle). [disclaimer: all events in this story are entirely fictional, any similarity to actual persons and events is purely coincidental :) ] pg_start_backup() will create a WAL backup: ~ ${PGDATA}/pg_xlog/000000010000000E0000000A.000F594A.backup which will be archived to ${WAL_ARCHIVE} under the same name, or possibly given a different extension, depending on archive_method. Let us assume for the purpose of this explanation, that archive_method consists only of cp -i </dev/null, although the problem would have been identical if one used gzip -c, for example. Now, this backup fails for whatever reason (rsync trouble, etc.). You abort it and leave WAL archive as it was. You diagnose the problem that caused the backup to fail and repeat the procedure. And since your diagnostic skills are so good it took you almost no time to fix it, the database engine is now at location 002D94AF in that _same_ WAL. Once you restart the backup script, pg_start_backup() is called and ${PGDATA}/pg_xlog/000000010000000E0000000A.002D94AF.backup is created and archived to ${WAL_ARCHIVE} under that same name. Your method of discovering logs to delete will now match _two_ "current" log file archives instead of one, because they both come from the same WAL, fail to actually delete the stale one (the one from position 000F594A) and thus clutter your backup with irrelevant WAL fragments. The second part of the second paragraph was only to expose that, following the same logic as outlined above, if you take WAL locations as the criterion of removing stale WAL fragments instead of WAL names, it is far less likely to hit a false positive, because you would have to pg_start_backup() _exactly_ 4294967296 locations after the first one. Of course, you want to be unambiguous in your search of the perfect WAL archive, so you want to use _both_ WAL name and location as the criterion. | the old log files without the base backup are not useful. since | rsync is being used to optimise the copying by overwriting the | base backup everytime, i dont thing preserving the old files | makes sense. Had it been and non overwritng backup the files | would have made sense. I see. I was assuming you used rsync to copy the database cluster somewhere then tar it there, while it was lying still ("Fell, Destroyed" of Fugazi comes to mind :) ). I will get back to you with the review of your script later. A quick scan reveals there is not much left to be improved, though. Kind regards, - -- ~ Grega Bremec ~ gregab at p0f dot net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.0 (GNU/Linux) iD8DBQFEN68Vfu4IwuB3+XoRAxbjAJ49Hg6d9J0RZLywEZLr3WeD1sHevQCfSOm8 Pr3jEYb/+viTp/OAoJ6a5/M= =9e0i -----END PGP SIGNATURE-----
> Rajesh Kumar Mallah wrote: > |>| Do you see any problem in the current approach ? > |>| i have seen it working fine till now. > |> > |>I do, to be honest. The WAL location counter accounts for 4294967295 > |>positions and while I'm certain that's WAY more than the average number > |>of transactions that go into a WAL, quite a number of small ones can > |>certainly happen before a WAL is rolled over, and until then, you're > |>dealing with the same log file. > |> > |>If two backups happen in that period of time for whatever reason, you're > |>going to have a false positive by looking into ${WAL_ARCHIVE} and > |>searching just for the WAL name, so including the location in the search > |>of a WAL fragment is certainly necessary. Infact, going purely by > |>chance, the probability of hitting the same location in two different > |>log files in two subsequent backups is much lower than hitting the same > |>WAL twice. > | > | The current wal log is not being removed from the wal archive area > | in any case. The files less than the current ones are being rm'ed. > | > | I am sorry i am not able to get your apprehension. But i shall > | surely try harder to understand your point. > > Hi Rajesh, list. > > I'm sorry I didn't get back to you earlier, I was at an IBM business > conference for a couple of days; not to say it rendered me incapable of > communicating via e-mail, but it did bring along certain social > responsibilities which caused me to both stay up and sleep late, if you > know what I mean. :) > > Let me explain the above predicament in more practical terms. > > Let us say you started a backup very soon after a WAL had been rolled > over. Current WAL at that time was called, for example, > ${PGDATA}/pg_xlog/000000010000000E0000000A. The location at that time > was 000F594A (iow, early in the WAL cycle). [disclaimer: all events in > this story are entirely fictional, any similarity to actual persons and > events is purely coincidental :) ] > > pg_start_backup() will create a WAL backup: > ~ ${PGDATA}/pg_xlog/000000010000000E0000000A.000F594A.backup > > which will be archived to ${WAL_ARCHIVE} under the same name, or > possibly given a different extension, depending on archive_method. Let > us assume for the purpose of this explanation, that archive_method > consists only of cp -i </dev/null, although the problem would have been > identical if one used gzip -c, for example. > > Now, this backup fails for whatever reason (rsync trouble, etc.). You > abort it and leave WAL archive as it was. You diagnose the problem that > caused the backup to fail and repeat the procedure. And since your > diagnostic skills are so good it took you almost no time to fix it, the > database engine is now at location 002D94AF in that _same_ WAL. > > Once you restart the backup script, pg_start_backup() is called and > ${PGDATA}/pg_xlog/000000010000000E0000000A.002D94AF.backup is created > and archived to ${WAL_ARCHIVE} under that same name. > > Your method of discovering logs to delete will now match _two_ "current" > log file archives instead of one, because they both come from the same > WAL, fail to actually delete the stale one (the one from position > 000F594A) and thus clutter your backup with irrelevant WAL fragments. Dear Grega, Thanks for the reply. now i have started understanding ! Is cluttering of the wal archive area in cases where that backup had to be re-started for whatever reasons is the *only* concern ? if its so , we should not be too much bothered becoz in the next successfull backup the extra clutter will get deleted. if there are other concern please lemme know. > > The second part of the second paragraph was only to expose that, > following the same logic as outlined above, if you take WAL locations as > the criterion of removing stale WAL fragments instead of WAL names, it > is far less likely to hit a false positive, because you would have to > pg_start_backup() _exactly_ 4294967296 locations after the first one. > > Of course, you want to be unambiguous in your search of the perfect WAL > archive, so you want to use _both_ WAL name and location as the criterion. > > | the old log files without the base backup are not useful. since > | rsync is being used to optimise the copying by overwriting the > | base backup everytime, i dont thing preserving the old files > | makes sense. Had it been and non overwritng backup the files > | would have made sense. > > I see. I was assuming you used rsync to copy the database cluster > somewhere then tar it there, while it was lying still ("Fell, Destroyed" > of Fugazi comes to mind :) ). > > I will get back to you with the review of your script later. A quick > scan reveals there is not much left to be improved, though. Please do not put too much effort, as i the drives in my other server has got installed and i am adapting the script for doing remote backup ( which is a more common senerio). Thank You Regds Rajesh Kumar Mallah. >
Tom Lane wrote: > It definitely is a pain in the neck that GNU tar complains about files > changing underneath it --- I've looked for a way to disable that, or at > least reduce it to a warning instead of an error condition, but gtar > doesn't seem to have such a switch. You should try alternative backup > tools such as cpio or rsync. You mean the tar exits or that it just returns an error code on completion? -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> It definitely is a pain in the neck that GNU tar complains about files >> changing underneath it --- I've looked for a way to disable that, or at >> least reduce it to a warning instead of an error condition, but gtar >> doesn't seem to have such a switch. You should try alternative backup >> tools such as cpio or rsync. > You mean the tar exits or that it just returns an error code on > completion? I don't recall whether it finishes making the tarball, but it definitely returns nonzero exit status, which makes it effectively unusable in any automated script (you certainly don't want to ignore exit status in a backup script ...) regards, tom lane
From: pgsql-admin-owner@postgresql.org on behalf of Tom Lane
Sent: Mon 4/10/2006 7:19 PM
To: Bruce Momjian
Cc: andy.shellam@mailnetwork.co.uk; 'Rajesh Kumar Mallah'; pgsql-admin@postgresql.org
Subject: Re: FW: [ADMIN] Setting up of PITR system.
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> It definitely is a pain in the neck that GNU tar complains about files
>> changing underneath it --- I've looked for a way to disable that, or at
>> least reduce it to a warning instead of an error condition, but gtar
>> doesn't seem to have such a switch. You should try alternative backup
>> tools such as cpio or rsync.
> You mean the tar exits or that it just returns an error code on
> completion?
I don't recall whether it finishes making the tarball, but it definitely
returns nonzero exit status, which makes it effectively unusable in any
automated script (you certainly don't want to ignore exit status in a
backup script ...)
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
Hi, On Tue, 11 Apr 2006, Spiegelberg, Greg wrote: > You should implement filesystem or volume level snapshots. LVM, > Veritas and the like all have the functionality and it gets the > database out of "backup" mode quickly. from what I understand there is no need to keep the time between pg_start_backup and pg_stop_backup small. The datebase is in no way restrcited in its operation or performance during this time. The only magic seems to be the marker file containing the information which transaction were commited at the time the backup started. As we run freebsd and our tar does not complain of files changing underneath we prefer to avoid the additional complexity and of filesystem snapshots and just plain tar the data directory without the pg_xlog subdirectory. Greetings Christian -- Christian Kratzer ck@cksoft.de CK Software GmbH http://www.cksoft.de/ Phone: +49 7452 889 135 Fax: +49 7452 889 136
We already recommend file system snapshots as a backup method in our documentation. --------------------------------------------------------------------------- Spiegelberg, Greg wrote: > You should implement filesystem or volume level snapshots. LVM, Veritas and the like all have the functionality and itgets the database out of "backup" mode quickly. > > http://www.tldp.org/HOWTO/LVM-HOWTO/snapshots_backup.html > > Before the lvcreate -s command issue the pg_start_backup() and after pg_stop_backup(). Then do what you will with thesnapshot volume. Just in case a fsck may be necessary I'd recommend mounting read-only. > > Caveat emptor, I believe certain filesystems take issue with this, like xfs, however we have not seen anything unusualwith ext3. Also note that you must have some space unallocated in the proper LVM disk group for the snapshot volume. > > Greg > > ________________________________ > > From: pgsql-admin-owner@postgresql.org on behalf of Tom Lane > Sent: Mon 4/10/2006 7:19 PM > To: Bruce Momjian > Cc: andy.shellam@mailnetwork.co.uk; 'Rajesh Kumar Mallah'; pgsql-admin@postgresql.org > Subject: Re: FW: [ADMIN] Setting up of PITR system. > > > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> It definitely is a pain in the neck that GNU tar complains about files > >> changing underneath it --- I've looked for a way to disable that, or at > >> least reduce it to a warning instead of an error condition, but gtar > >> doesn't seem to have such a switch. You should try alternative backup > >> tools such as cpio or rsync. > > > You mean the tar exits or that it just returns an error code on > > completion? > > I don't recall whether it finishes making the tarball, but it definitely > returns nonzero exit status, which makes it effectively unusable in any > automated script (you certainly don't want to ignore exit status in a > backup script ...) > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > > -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 Rajesh Kumar Mallah wrote: > Is cluttering of the wal archive area in cases where that backup > had to be re-started for whatever reasons is the *only* concern ? Well, yes, to be honest. But it may in consequence cause problems of another kind, which I don't feel qualified to reassure you on: I am not positive about what happens if you try to replay an old WAL on a current database backup. If nothing else, it is going to make the person restoring the backup rather unnerved about success of the operation they are currently performing, which is not a good thing, IMHO. They are restoring a backup afterall, which means they'd already undergone a fair amount of stress as it is. :) End all, it is your choice to decide which is more trouble and which is worth more: fixing the script to produce clean backups or informing your backup operators about the extra care they need to take when restoring backups. > Please do not put too much effort, as i the drives in my other > server has got installed and i am adapting the script for doing > remote backup ( which is a more common senerio). Very nice! How is it going? And how are you copying the WALs? scp? rsync? Kind regards, - -- Grega Bremec gregab at p0f dot net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.0 (GNU/Linux) iD8DBQFEPgU9fu4IwuB3+XoRA+MeAJ0dbbfcgBqP9SCYq0VICN8xrtGN0wCffE6i kq1LlDwlJwmfrOtwRBwGqFg= =olf3 -----END PGP SIGNATURE-----
On 4/13/06, Grega Bremec <gregab@p0f.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
>
> Rajesh Kumar Mallah wrote:
> > Is cluttering of the wal archive area in cases where that backup
> > had to be re-started for whatever reasons is the *only* concern ?
>
> Well, yes, to be honest. But it may in consequence cause problems of
> another kind, which I don't feel qualified to reassure you on: I am not
> positive about what happens if you try to replay an old WAL on a current
> database backup.
>
> If nothing else, it is going to make the person restoring the backup
> rather unnerved about success of the operation they are currently
> performing, which is not a good thing, IMHO. They are restoring a backup
> afterall, which means they'd already undergone a fair amount of stress
> as it is. :)
>
> End all, it is your choice to decide which is more trouble and which is
> worth more: fixing the script to produce clean backups or informing your
> backup operators about the extra care they need to take when restoring
> backups.
Dear Grega,
gald to see the clarification.
The concern you have is valid only if something goes wrong
after pg_start_backup() . In such case the backup admin
can get notified in advance by using the MAILTO env variable
in crontab. so its not that we are stressing him in already stressed
situation.
the reason of my aversion in incorporating you suggestion is however
different. There does not seem to be any realiable way of getting the
name of .backup file from the contents in backup_label. consider the
method you gave.
REF_FILE="`grep 'START WAL LOCATION' ${BACKUP_LABEL} | \
awk '{
sub(/)/, "", $6);
sub(/[0-9A-F]\//, "", $4);
printf("%s.%08s.backup", $6, $4);
}'`"
firstly i feel there should be a + after [0-9A-F] , when i run the command on
my data following is the outcome.
-bash-2.05b$ cat /mnt/disk2/base_backups/pgdatadir/backup_label
START WAL LOCATION: 1C/4C7E5E90 (file 000000010000001C0000004C)
CHECKPOINT LOCATION: 1C/4C824A44
START TIME: 2006-04-12 21:32:43 IST
LABEL: base_backup_12-04-2006-21-32-52
-bash-2.05b$ grep 'START WAL' /mnt/disk2/base_backups/pgdatadir/backup_label | awk '{ sub (/)/, "", $6); sub(/[0-9A-F]+\//, "", $4); printf("%s.%08s.backup\n", $6, $4 ); }'
000000010000001C0000004C.4C7E5E90.backup
-bash-2.05b$ ls -l /mnt/disk2/wal_archive/*.backup
-rw------- 1 postgres postgres 270 Apr 12 21:53 /mnt/disk2/wal_archive/000000010000001C0000004C.007E5E90.backup
was you can see the predicted name
000000010000001C0000004C.4C7E5E90.backup is not same as
000000010000001C0000004C.007E5E90.backup
one may argue that replacing first 2 charachers of 4C7E5E90 with
'0's may yeild the correct result, but it is not so , i have observed
a case which runs against this hypothesis.
therefore unless someone tells the correct method of derieving
the filename , the current approach is the best that can be done.
I really hope some guru to throw some light here. i tried digging
xlog.c and xlogutils.c without any success.
>
> > Please do not put too much effort, as i the drives in my other
> > server has got installed and i am adapting the script for doing
> > remote backup ( which is a more common senerio).
>
> Very nice! How is it going? And how are you copying the WALs? scp? rsync?
I finished it long back but i was waiting for comments from the list
regarding the concern to which you replied. I think we discuss the
new version once this issue is resolved.
Regds
Rajesh Kumar Mallah.
>
> Kind regards,
> - --
> Grega Bremec
> gregab at p0f dot net
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.0 (GNU/Linux)
>
> iD8DBQFEPgU9fu4IwuB3+XoRA+MeAJ0dbbfcgBqP9SCYq0VICN8xrtGN0wCffE6i
> kq1LlDwlJwmfrOtwRBwGqFg=
> =olf3
> -----END PGP SIGNATURE-----
>
please find below a
1. A script that takes remote base backups of a postgresql database.
it assumes a password less communication between the backup
server and the database server. After taking a base backup it removes
that un neccesary WAL log files from the archive folder. The script has been
running in my server for past few weeks without any problems. It takes
apprx 40 mins to backup around 40 GB data.
sample execution on the backup server
$ ./pg_rsync.sh 216.247.238.130 /mnt/disk2/base_backups /mnt/disk2/wal_archive
or in crontab as
30 20 * * * /usr/bin/time /home/postgres/pg_rsync.sh 216.247.238.130 /mnt/disk2/base_backups /mnt/disk2/wal_archive
Note: script also assumes that you have setup remote wal archiving using a suitable
archive_command in postgresql.conf , these scripts are *not* posted here.
any suggestions welcome.
------------------------ script begins (pg_rsync.sh)------------------------------------------------
#!/bin/bash
##################################################
# it does following
# 1. checks existance and permission of imp folders.
# 2. takes base backup to a destined folder by rsync
# 3. removes unwanted archived log files.
# Contributions: Grega Bremec (gregab@p0f.net) ,
# Alvaro Herrera (alvherre@commandprompt.com)
# Preliminary Version: Mallah (mallah.rajesh@gmail.com)
##################################################
then
echo "Usage: $0 <HOSTNAME> <BACKUP DIRECTORY> <WAL ARCHIVE DIRECTORY>"
exit 1
fi
HOSTNAME=$1
BACKUPFOLDER=$2
WAL_ARCHIVE=$3
echo "HOSTNAME=$HOSTNAME"
echo "BACKUPFOLDER=$BACKUPFOLDER"
echo "WAL_ARCHIVE=$WAL_ARCHIVE"
if [ -z $BACKUPFOLDER ] || [ ! -d $BACKUPFOLDER ] || [ ! -w $BACKUPFOLDER ]
then
echo "Sorry base backup folder $BACKUPFOLDER does not exists or is not writable or is not specified!"
exit 1
fi
if [ -z $WAL_ARCHIVE ] || [ ! -d $WAL_ARCHIVE ] || [ ! -w $WAL_ARCHIVE ]
then
echo "Sorry WAL archive folder $WAL_ARCHIVE does not exists or is not writable or is not specified!"
exit 1
fi
PSQL_BIN=`which psql` || /usr/local/pgsql/bin/psql
RSYNC_BIN=`which rsync` || /usr/bin/rsync
SSH_BIN=`which ssh` || /usr/bin/ssh
for PROG in $PSQL_BIN $RSYNC_BIN $SSH_BIN ; do
if [ ! -f $PROG ] || [ ! -x $PROG ]
then
echo "Sorry $PROG does not exists or is not executable by you"
echo "Please set env variable PATH to include psql and rsync"
exit 1
else
echo "Using $PROG"
fi
done
RSYNC_OPTS="--delete-after -a -e $SSH_BIN --exclude pg_xlog"
RSYNC="$RSYNC_BIN $RSYNC_OPTS"
PSQL=$PSQL_BIN
today=`date +%d-%m-%Y-%H-%M-%S`
label=base_backup_${today}
echo "Querying data_directory and tablespace folders from server $HOSTNAME"
DATA_DIR=`$PSQL -q -Upostgres -d template1 -h $HOSTNAME -c "show data_directory;" -P tuples_only -P format=unaligned 2>&1`
RVAL=$?
if [ $RVAL -ne 0 ]
then
echo "Some error in getting data_directory:$DATA_DIR"
exit 1;
fi
echo "DATA_DIR:$DATA_DIR"
TBL_SPCS=(`$PSQL -q -Upostgres -d template1 -h $HOSTNAME -c "SELECT spclocation from pg_catalog.pg_tablespace where length(spclocation)>0;" -P tuples_only -P format=unaligned`)
RVAL=$?
if [ $RVAL -ne 0 ]
then
echo "There is some problem in getting table spaces"
exit 1;
fi
DIRS=( "${TBL_SPCS[@]}" $DATA_DIR)
echo "Folders for Backup:"
CTR=0
while [ -n "${DIRS[${CTR}]}" ]; do
echo -n "${DIRS[${CTR}]} --> "
DIRS[${CTR}]=`ssh $HOSTNAME readlink -f ${DIRS[${CTR}]}`
echo "${DIRS[${CTR}]} (after symlink resolution)"
CTR=$((CTR + 1))
done
unset CTR
CP=`$PSQL -q -Upostgres -d template1 -h$HOSTNAME -c "SELECT pg_start_backup('$label');" -P tuples_only -P format=unaligned 2>&1`
RVAL=$?
if [ $RVAL -ne 0 ]
then
echo "PSQL pg_start_backup failed:$CP"
exit 1;
fi
echo "pg_start_backup executed successfully: $CP"
echo "RSYNC begins.."
# rsync each of the folders to the backup folder.
CTR=0
while [ -n "${DIRS[${CTR}]}" ]; do
echo "Syncing ${DIRS[${CTR}]}..."
time ${RSYNC} $HOSTNAME:${DIRS[${CTR}]} ${BACKUPFOLDER}
RVAL=$?
echo "Sync finished with exit status ${RVAL}"
if [[ ${RVAL} -eq 0 || ${RVAL} -eq 23 ]]; then
echo "Rsync success"
else
echo "Rsync failed"
$PSQL -Upostgres -h$HOSTNAME template1 -c "SELECT pg_stop_backup();"
exit 1
fi
CTR=$((CTR + 1))
done
unset CTR
echo "Executing pg_stop_backup in server ... "
$PSQL -Upostgres -h$HOSTNAME template1 -c "SELECT pg_stop_backup();"
if [ $? -ne 0 ]
then
echo "PSQL pg_stop_backup failed"
exit 1;
fi
echo "pg_stop_backup done successfully"
# read the backup_label file in pgdatadir and get the name of start wal file
# below is example content.
#START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A)
#CHECKPOINT LOCATION: E/A92939C
#START TIME: 2006-04-01 14:36:48 IST
#LABEL: base_backup_01-04-2006-14-36-45
DATA_DIR_NAME=`basename $DATA_DIR`
BACKUP_LABEL=$BACKUPFOLDER/$DATA_DIR_NAME/backup_label
echo "BACKUP_LABEL: $BACKUP_LABEL"
START_LINE=`grep -i "START WAL LOCATION" $BACKUP_LABEL` # get the like containing START WAL LOCATION
START_LINE=${START_LINE/#START*file /} # strip something like 'START WAL LOCATION: E/A9145E4 (file ' from begin.
START_LINE=${START_LINE/%)/} # strip ')' from end.
# REF_FILE_NUM is something like 000000010000000A00000068
REF_FILE_NUM=$START_LINE
echo "Content of $BACKUP_LABEL"
echo "------------- begin -----------"
cat $BACKUP_LABEL
echo "------------- end -----------"
echo "Read Start Wal as : $REF_FILE_NUM"
echo "REF_FILE_NUM=$REF_FILE_NUM"
# iterate list of files in the WAL_ARCHIVE folder
for i in `ls -1 $WAL_ARCHIVE` ;
do
# $i is :000000010000000A0000005D.bz2 eg
# get first 24 chars in filename
FILE_NUM=${i:0:24}
# compare if the number is less than the reference
# here string comparison is being used.
if [[ $FILE_NUM < $REF_FILE_NUM ]]
then
echo "$FILE_NUM [ $i ] removed"
rm -f $WAL_ARCHIVE/$i
else
echo "$FILE_NUM [ $i ] not removed"
fi
done
#------------------------- script ends -----------------------------------------------------------------