Discussion: WAL rotation question version 8.3.0
I am setting up a PostgreSQL server (duh) and am using archive_mode=on.

The archive command that I am using sends data to an enterprise backup server across the network, and I must be able to handle outages of that server without taking down the PostgreSQL server.

Short outages are fine, because the archive_command will return a nonzero result to PostgreSQL and will be retried every minute until it succeeds. If the backup server is out for a longer time, PostgreSQL keeps creating new WAL files. This will eventually fill the pg_xlog filesystem, and bad things happen :-(

To protect the production database, when the pg_xlog filesystem reaches some percentage full (we chose 90%), the archive_command starts reporting success (a return of zero) even though it is not able to archive the xlog files. I understand that this prevents me from doing a disaster recovery AND prevents me from doing a point-in-time restore, but in our opinion it is better than letting the database crash.

Now to the question.

Once the archive_command starts lying about its success, PostgreSQL deletes a number of the xlog files that it has been told have been successfully archived.
Why does it do this? Can I control it? Can I turn it off?

--
Evan Rempel, Senior Systems Administrator
University of Victoria
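The wrapper described above might look roughly like the following. This is only a sketch under assumptions, not the script actually in use: `backup-client` is a placeholder for whatever tool ships files to the backup server, the function names are made up for illustration, and the 90% threshold is the figure from the post. The three arguments are assumed to be PostgreSQL's `%p` and `%f` substitutions plus the pg_xlog directory.

```python
import shutil
import subprocess

FULL_THRESHOLD = 0.90  # the 90% figure chosen in the post


def fraction_used(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def exit_code_for(archive_ok, used_fraction, threshold=FULL_THRESHOLD):
    """Decide what to report back to PostgreSQL.

    0 tells PostgreSQL the segment is archived; nonzero makes it retry later.
    """
    if archive_ok:
        return 0
    if used_fraction >= threshold:
        # Deliberately report success so pg_xlog cannot fill up.
        # The segment is NOT archived, so point-in-time recovery is lost.
        return 0
    return 1


def main(argv):
    # argv: script name, %p (path to the segment), %f (file name), pg_xlog dir
    wal_path, wal_name, xlog_dir = argv[1], argv[2], argv[3]
    try:
        # "backup-client" is a hypothetical command, not a real tool.
        result = subprocess.run(["backup-client", "store", wal_path, wal_name])
        archive_ok = result.returncode == 0
    except OSError:
        # The command could not be started at all; treat it as a failure.
        archive_ok = False
    return exit_code_for(archive_ok, fraction_used(xlog_dir))
```

To use it as an archive_command, the script would end with `import sys` and `sys.exit(main(sys.argv))`, and be configured along the lines of `archive_command = 'python3 /path/to/wrapper.py %p %f /path/to/pg_xlog'` (both paths are placeholders). Keeping the decision in `exit_code_for` separates the "lie" policy from the disk and network calls, which makes it easy to check on its own.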
Evan Rempel wrote:
> Now to the question.
>
> Once the archive_command starts lying about its success, postgresql deletes
> a number of the xlog files that it has been told have been successfully archived.
> Why does it do this? Can I control it? Can I turn it off?

Because they're no longer needed.

If you want to keep those files, make the archive_command not lie.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
> Evan Rempel wrote:
>
>> Now to the question.
>>
>> Once the archive_command starts lying about its success, postgresql deletes
>> a number of the xlog files that it has been told have been successfully archived.
>> Why does it do this? Can I control it? Can I turn it off?
>
> Because they're no longer needed.
>
> If you want to keep those files, make the archive_command not lie.

Normally postgres will rename the old WAL files that have been archived and are no longer needed, keeping the number of WAL files constant. In this case, it actually deletes them.
Why is the behaviour different?

--
Evan Rempel
Evan Rempel wrote:
> Alvaro Herrera wrote:
>> Evan Rempel wrote:
>>
>>> Now to the question.
>>>
>>> Once the archive_command starts lying about its success, postgresql deletes
>>> a number of the xlog files that it has been told have been successfully archived.
>>> Why does it do this? Can I control it? Can I turn it off?
>>
>> Because they're no longer needed.
>>
>> If you want to keep those files, make the archive_command not lie.
>
> Normally postgres will rename the old WAL files that have been archived and are no longer needed,
> keeping the number of WAL files constant. In this case, it actually deletes them.
> Why is the behaviour different?

Renaming files is done because the files will be reused in the future under the new name. However, after a long archiver failure, new files need to be created to hold the extra data. When the archiver is restored, those excess files can be deleted because they're not needed for recycling. (The number of files to keep for recycling is a function of checkpoint_segments.)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
>>>> Now to the question.
>>>>
>>>> Once the archive_command starts lying about its success, postgresql deletes
>>>> a number of the xlog files that it has been told have been successfully archived.
>>>> Why does it do this? Can I control it? Can I turn it off?
>>> Because they're no longer needed.
>>>
>>> If you want to keep those files, make the archive_command not lie.
>>
>> Normally postgres will rename the old WAL files that have been archived and are no longer needed,
>> keeping the number of WAL files constant. In this case, it actually deletes them.
>> Why is the behaviour different?
>
> Renaming files is done because the files will be reused in the future
> under the new name. However, after a long archiver failure, new files
> need to be created to hold the extra data. When the archiver is
> restored, those excess files can be deleted because they're not needed
> for recycling. (The number of files to keep for recycling is a function
> of checkpoint_segments.)

So it looks like PostgreSQL will try to keep 2.5 * checkpoint_segments files, and if it has more than that which have been reported as archived, it will start removing them.

Does this sound correct?

--
Evan Rempel
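As a rough check on the 2.5 figure: the 8.3 documentation estimates that pg_xlog will normally hold no more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 segment files, and with the default checkpoint_completion_target of 0.5 that is where a factor of 2.5 would come from. The sketch below only evaluates that documented estimate; the function names are made up for illustration, and it does not model the server's actual recycle-or-delete decision.

```python
SEGMENT_SIZE_MB = 16  # default WAL segment size


def normal_max_wal_segments(checkpoint_segments, checkpoint_completion_target=0.5):
    """Documented estimate of the usual upper bound on files in pg_xlog."""
    return (2 + checkpoint_completion_target) * checkpoint_segments + 1


def normal_max_wal_size_mb(checkpoint_segments, checkpoint_completion_target=0.5):
    """The same estimate expressed in megabytes."""
    segments = normal_max_wal_segments(checkpoint_segments, checkpoint_completion_target)
    return segments * SEGMENT_SIZE_MB


# With the default checkpoint_segments = 3:
print(normal_max_wal_segments(3))   # 8.5
print(normal_max_wal_size_mb(3))    # 136.0
```

Segments beyond this estimate that have been reported as archived are the "excess" files that, per the explanation above, are removed rather than renamed for reuse.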