Discussion: Dump size bigger than pgdata size?
[sorry if this was previously asked: list searches seem to be down]
I'm using pg_dump to take a full backup of my database using a compressed format:
$ pg_dump -Fc my_db > /backup/my_db.dmp
It produces a 6 GB file whereas the pgdata uses only 5 GB of disk space:
$ ls -l /backup
-rw-r--r-- 6592715242 my_db.dmp
$ du -b /data
5372269196 /data
How could it be?
As far as I know, dumps should be smaller than the on-disk data files, since they do not store indexes, etc.
The database contains about one hundred thousand binary images, some of which may already be compressed. So I tried the --compress=0 option, but that produces a dump that does not fit on my disk (more than 11 GB).
I'm using postgres 8.1.2 on RHEL4.
So, what can I do to diagnose the problem?
Thanks in advance,
Nicola
You may try dumping the tables individually with
--table=<table>
Personally I don't use the built-in compression in pg_dump but pipe the output to gzip instead (not sure if it makes any difference). See http://manual.intl.indoglobal.com/ch06s07.html for details.
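For example, something along these lines should work (the paths and the table name are just placeholders):

$ pg_dump my_db | gzip > /backup/my_db.sql.gz
$ pg_dump --table=images my_db | gzip > /backup/images.sql.gz

Dumping table by table also lets you see which table is responsible for most of the dump size.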
-Aaron
It might happen because of the type of data you have (binary images). Compression is notoriously poor on binary files, since there is only a small chance of repeated character sequences. In other words, it is possible, because compression also adds extra bytes for checksums and redundancy, so data that is already compressed can even grow slightly.
This would normally only be noticeable on small binary files, though.
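You can see the effect for yourself by running an already-compressed file through gzip (the file name below is just an example):

$ stat -c %s some_image.jpg
$ gzip -c some_image.jpg | wc -c

If the second number is not much smaller than the first -- or is even slightly larger -- the data simply cannot be compressed any further.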
"Nicola Mauri" <Nicola.Mauri@saga.it> writes: > I'm using pg_dump to take a full backup of my database using a compressed > format: > $ pg_dump -Fc my_db > /backup/my_db.dmp > It produces a 6 GB file whereas the pgdata uses only 5 GB of disk space: > ... > Database contains about one-hundred-thousands binary images, some of which > may be already compressed. Are those stored as bytea fields, or large objects, or what? I can easily imagine bytea inflating to be larger in a dump than it is on disk, since the text format for non-ASCII byte values looks like "\\nnn" ie 5 bytes for only one byte on disk. Assuming that ASCII bytes make up about half of the data, the average expansion factor would be about 2.5x. Compression of the dump text would buy back some of this bloat but probably not all. This could be avoided by using COPY BINARY format, but I don't see any very nice way to do that in the context of pg_dump --- it needs to intermix COPY data with SQL commands ... regards, tom lane
On Jun 21, 2006, at 12:00 PM, Tom Lane wrote:
> This could be avoided by using COPY BINARY format, but I don't see any
> very nice way to do that in the context of pg_dump --- it needs to
> intermix COPY data with SQL commands ...

Would the tar or custom format allow for this? IIRC, at least tar puts
all the copied data into separate files...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf      cell: 512-569-9461
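For what it's worth, the layout Jim mentions is easy to inspect, since a tar-format dump is an ordinary tar archive (the file name is just an example):

$ pg_dump -Ft my_db > /backup/my_db.tar
$ tar tf /backup/my_db.tar

The listing should show a toc.dat member plus a numbered .dat member for each table's data.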
Jim Nasby <jnasby@pervasive.com> writes:
> On Jun 21, 2006, at 12:00 PM, Tom Lane wrote:
>> This could be avoided by using COPY BINARY format, but I don't see any
>> very nice way to do that in the context of pg_dump --- it needs to
>> intermix COPY data with SQL commands ...

> Would the tar or custom format allow for this? IIRC, at least tar
> puts all the copied data into separate files...

Well, you could sorta do that, but the case that would stop working is
pg_restore output to a plain text SQL script (and related issues such as
the ability to use the feature in the context of pg_dumpall).  Having
just gotten done fixing similar inconsistencies in pg_dump/pg_restore
for BLOBs, I'm not eager to re-introduce 'em for COPY BINARY ...

			regards, tom lane
On Jun 22, 2006, at 9:39 PM, Tom Lane wrote:
> Jim Nasby <jnasby@pervasive.com> writes:
>> On Jun 21, 2006, at 12:00 PM, Tom Lane wrote:
>>> This could be avoided by using COPY BINARY format, but I don't see any
>>> very nice way to do that in the context of pg_dump --- it needs to
>>> intermix COPY data with SQL commands ...
>
>> Would the tar or custom format allow for this? IIRC, at least tar
>> puts all the copied data into separate files...
>
> Well, you could sorta do that, but the case that would stop working is
> pg_restore output to a plain text SQL script (and related issues such as
> the ability to use the feature in the context of pg_dumpall).  Having
> just gotten done fixing similar inconsistencies in pg_dump/pg_restore
> for BLOBs, I'm not eager to re-introduce 'em for COPY BINARY ...

Yeah, but how many people actually do that anyway? I can't really come up
with a use-case for it, and I'm pretty sure there's other gains to be had
by turning custom or tar format into more of a 'binary dump'. For one
thing, that should ease the need to run the newer version of pg_dump when
upgrading (if we put the requisite brains into pg_restore).

I suppose we could put support in pg_restore to convert between BINARY
and escaped as needed; or just disallow pg_restore from dumping SQL if
there's binary data (maybe have it include copy statements that reference
the specific files).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf      cell: 512-569-9461