Discussion: PostgreSQL Dump rate is too slow
Hi all,
We have a customer with a 1TB database on a production server. They are attempting a dump-based backup of this large database. The following dump command is being used.
The dump rate is around 12 GB/hr, so the backup will take a long time to complete. This is affecting their production server.
Is there a way to increase the dump data rate?
pg_dump -U <User> -Fc -b --port=<Port> '<Db-Name>'
PostgreSQL version : 9.2.4
Platform : Linux
Thanks
Girish
Use a 9.3+ version of pg_dump with the --jobs option and directory output.

David J.
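For reference, a parallel directory-format dump of this kind might look roughly like the following; the output directory and job count are illustrative only, and -Fd/-j require a 9.3+ pg_dump client:

pg_dump -U <User> --port=<Port> -Fd -j 4 -b -f /backups/<Db-Name>.dir '<Db-Name>'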
You could reduce the compression ratio. Gzip defaults to -Z6, but -Z3 is quite a bit quicker with a not much larger dump file. I did a few tests here: http://frosty-postgres.blogspot.com/2011/12/pgdump-compression-settings.html

Also, if it's impacting production, you could set up a streaming replica and take the dump from the replica.
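As a concrete illustration of that suggestion, the original command with a lower compression level would look something like this (-Z 3 is just a starting point to experiment with):

pg_dump -U <User> -Fc -Z 3 -b --port=<Port> '<Db-Name>'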
Thanks David and Jeff.
@David: In our case we don't want to write the dump data to local disk. We redirect the dump to a FIFO file (pipe), read the dump data from the FIFO, and push it to the backup media (tape, SAN, etc.). So we can't use directory output.
@Jeff: Sure, I will reduce the compression ratio and update the thread with the outcome. I'll also recommend setting up a standby with streaming replication.
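For context, the FIFO-based flow described above is roughly the following sketch; the FIFO path and the tape device are placeholders, and the dd reader stands in for whatever actually pushes the stream to the backup media:

mkfifo /tmp/pgdump.fifo
pg_dump -U <User> -Fc -b --port=<Port> '<Db-Name>' > /tmp/pgdump.fifo &
# reader side: consume the stream from the FIFO and write it to the backup media
dd if=/tmp/pgdump.fifo of=/dev/nst0 bs=1M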
Hi Girish,
Since the database is so large, tweaking a few database parameters will improve pg_dump performance only to some extent. The pg_dump -j option would have helped if the DB version were PostgreSQL 9.3, but unfortunately your DB version is PostgreSQL 9.2.
Another option to speed things up would be to run pg_dump in parallel at the database or schema level, although that would increase the load on the server.
Streaming replication is the best solution for a database this large: it reduces downtime, data loss, and recovery time, and it also lets you take physical and logical backups from the replica instead of the primary, avoiding impact on the primary server.
Hope this helps.
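If it helps, a minimal sketch of the standby side of a 9.2 streaming-replication setup is shown below; it assumes a base backup has already been taken and that a replication role and a matching pg_hba.conf entry exist on the primary (names in angle brackets are placeholders):

# create recovery.conf in the standby's data directory (PostgreSQL 9.2)
cat > "$PGDATA/recovery.conf" <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=<Primary-Host> port=<Port> user=<Replication-User>'
EOF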
To elaborate, I have found that the rsync used in my (most?) streaming replication setups is 10x or more faster than pg_dump. If your job specs require a pg_dump output file, it would be fairly simple to do the following (more or less):
1. set up a replicant (even on the same machine, but on a different drive, if you wanted)
2. make sure the replicant is off
3. rsync the data dir to the replicant
4. turn on the replicant
5. execute pg_dump on the replicant
regards,
/r/b
Hi Girish,

The custom format is compressed by default, and your dump rate is hamstrung by the slow compression algorithm being used. In our case, for a smaller DB, turning off compression completely (-Z 0) dropped our time to dump the DB from 6 hours to about 20 minutes. You can push the uncompressed dump through a faster compression engine such as lzo, snappy or lz4 if you need compressed output due to space limitations.

Regards,
Ken
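As a rough illustration of that idea (assuming the lz4 command-line tool is installed; the output path is a placeholder), the uncompressed custom-format dump can be piped through a faster compressor:

pg_dump -U <User> -Fc -Z 0 -b --port=<Port> '<Db-Name>' | lz4 > /backups/<Db-Name>.dump.lz4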
Don't forget to rsync between pg_start_backup & pg_stop_backup.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
If the original cluster is running, that rsync could generate a corrupted database unless the steps for a PITR backup are followed. The corruption may not be immediately apparent, and might not happen every time, but the above is not safe without a few more steps.

http://www.postgresql.org/docs/9.2/interactive/continuous-archiving.html

This is not intended as a complete list, but among other things: archiving should be working first, the rsync should exclude the postmaster.pid file and the files under the pg_xlog directory, the rsync should be preceded by pg_start_backup() and followed by pg_stop_backup(), and a recovery.conf file is needed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
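Pulling those cautions together, a rough sketch of the safer sequence might look like the following; hostnames, paths, and the rsync options are illustrative, and this is not a complete recipe (see the continuous-archiving documentation linked above):

# on the primary: mark the start of the base backup
psql -U <User> --port=<Port> -d '<Db-Name>' -c "SELECT pg_start_backup('base_backup');"

# copy the data directory, skipping the files mentioned above
rsync -a --exclude postmaster.pid --exclude 'pg_xlog/*' \
    /path/to/data/ replica-host:/path/to/data/

# on the primary: mark the end of the base backup
psql -U <User> --port=<Port> -d '<Db-Name>' -c "SELECT pg_stop_backup();"

# then create a recovery.conf on the copy before starting it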
On Thu, Jan 8, 2015 at 12:12 AM, Adalkonda Harshad wrote:
> The pg_dump -j option would have helped if the DB version were PostgreSQL 9.3, but unfortunately your DB version is PostgreSQL 9.2.
pg_dump 9.3+ can be used against the 9.2 server without problems. Read the documentation for the --jobs option for details.
David J.
+1

This is a nice option for allowing you to leverage more than a single CPU core for the dump, provided you have the CPU and I/O cycles to support it. As others have suggested, offloading to a replica can help with that.

Regards,
Ken
Keep in mind, though, that while 9.3's pg_dump can read from a 9.2 server, there is no guarantee that what it writes will be readable by the other 9.2 tools (pg_restore or psql). Maybe it is currently, but there are no promises.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Do not use pg_dump compression; pipe the output to xz instead:

% pg_dump -U <User> -Fc -Z 0 -b --port=<Port> '<Db-Name>' | xz -3 > dump.xz

or pipe the xz output to another program.

When I looked into the same problem on the 8.3-8.4 versions, the bottleneck was in accessing TOAST tables: their content was decompressed, dumped and recompressed again. I don't know if that has changed in current versions.

---
Eduardo Morras <emorrasg@yahoo.es>
Don't do this. You are still looking at an extremely slow dump. Instead, set up a warm or hot standby or use pg_basebackup.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
"If we send our children to Caesar for their education, we should
not be surprised when they come back as Romans."
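For reference, a minimal pg_basebackup invocation against a 9.2 server could look like the sketch below; it assumes a replication entry in pg_hba.conf and wal_level/max_wal_senders configured on the primary, and the target directory is a placeholder:

pg_basebackup -h <Host> --port=<Port> -U <User> -D /backups/base -P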
Hi all,
We tried pg_dump with the compression level set to zero on the 1TB database. The dump data rate started at 250 GB/hr and gradually dropped to 30 GB/hr over a span of 2 hours. We might see this behavior on a standby server too, which would be undesirable.

Any explanation on why we see this behavior?

Dump command: pg_dump -Fc -Z 0
Thanks
Girish
Because you have a long running transaction that is causing bloat to pile up. Using pg_dump on a production database that size is a non-starter. You need a warm/hot standby or a snapshot to do this properly.

JD
Thanks Joshua. Even if we have a long transaction running on the database, pg_dump shouldn't be affected, right? As it doesn't block readers or writers.
Before getting resources to set up a standby server, I just want to make sure that we don't hit this issue on the standby too.
I believe he meant that pg_dump itself is the long-running transaction. Even though read-only transactions block neither readers nor writers, they still prevent cleanup of blocks because they have to be able to see the state of the db as of the beginning of the transaction.

--
Scott Ribe
Thanks Scott, Joshua. Got it.
On 01/15/2015 09:57 AM, girish R G peetle wrote:
> Even if we have a long transaction running on the database, pg_dump shouldn't be affected, right? As it doesn't block readers or writers.

Correct. It does not block readers or writers; it does, however, block routine maintenance most of the time, which will cause bloat to pile up, making your reads (and writes) go slower as the pages become more fragmented.

JD