Discussion: Does pg_dump have a 2GB file-size limit?
I develop a piece of software that uses PostgreSQL (8.3.5) as a back-end database. The software can, optionally, use pg_dump to create snapshots of the database. One user has run into a problem where pg_dump dumps 2GB, then claims that the archive is too large. I haven't yet found documentation for pg_dump's internal limits, and was wondering if anyone could shed light on this. I'm not an expert on Linux filesystems, so perhaps I'm just missing something obvious, but I was under the impression that default ext3, with a 4K block size, supported files up to 2TB in size.
Command-line used to invoke pg_dump:
pg_dump -v -F c -x -O -f "path/to/file" -h db_host -p db_port -U user_name database_name
A temporary file is created in a secure location and passed via PGPASSFILE to avoid the password prompt; a sketch of that setup appears after the log excerpt below. Here is the error found in the pg_dump log:
pg_dump: dumping contents of table XYZ (edited name)
pg_dump: [custom archiver] could not write to output file: File too large
pg_dump: *** aborted because of error
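For completeness, the password handling mentioned above looks roughly like this; the paths and file names here are illustrative, not the ones actually used:

# Sketch of the PGPASSFILE setup (hypothetical paths; the real wrapper differs).
# The password-file format is host:port:database:user:password, and libpq
# ignores the file unless its permissions are 0600 or stricter.
umask 077
echo "db_host:db_port:database_name:user_name:the_password" > /secure/location/pgpass.tmp
chmod 600 /secure/location/pgpass.tmp
PGPASSFILE=/secure/location/pgpass.tmp pg_dump -v -F c -x -O -f "path/to/file" -h db_host -p db_port -U user_name database_name
rm -f /secure/location/pgpass.tmp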
The left-over, partially-created file is a suspicious 2147483647 bytes in size, which is exactly 2^31 - 1, the largest offset representable in a signed 32-bit integer. The DB is running on 32-bit RHEL 5.3 (Linux-2.6.18-128.1.16). I don't know the libc version at the moment, but it would be whatever was shipped with the distro. Here is the output of ulimit on that system:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 77823
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 77823
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
The pg_dump binary was compiled on a different system, 32-bit RHEL4 (Linux-2.6.9-78.0.1) against glibc 2.3.4.
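For what it's worth, on a 32-bit toolchain off_t is only 32 bits unless the largefile flags are defined at compile time, which would cap a single file at 2^31 - 1 bytes. A quick sanity check that could be run on the build machine (file names here are arbitrary):

cat > off_t_check.c <<'EOF'
#include <stdio.h>
#include <sys/types.h>
int main(void) { printf("sizeof(off_t) = %d\n", (int) sizeof(off_t)); return 0; }
EOF
getconf LFS_CFLAGS                                                        # on 32-bit glibc, typically prints -D_FILE_OFFSET_BITS=64
gcc off_t_check.c -o off_t_check && ./off_t_check                         # expect 4 without largefile flags
gcc $(getconf LFS_CFLAGS) off_t_check.c -o off_t_check && ./off_t_check   # expect 8 with them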
I don't have exact numbers at the moment, but the database is approximately 150 GB in size. Roughly 2/3rds of that is occupied by one table with approximately 2 billion rows, which was dumped successfully according to the pg_dump log. Most of the remainder is occupied by table XYZ, with approximately 200 million rows.
Since the software is cross-platform, I'd prefer to avoid work-arounds such as "pipe it through split" unless tracking down the problem itself fails. Also, since the machine running the database belongs to a user, I'm limited in what I can "try out" on it. If changes are needed for a solution, that's fine, but changes purely for diagnosis, i.e. "try X and see if that works", are very difficult. I'm aware that the kernel/libc that we build against is rather old, but it shouldn't be so old as to be a problem. Although I have more control over that machine, updating it unnecessarily is undesirable, and I can only do it if it would contribute to a solution. Finally, migrating to PG 8.4 is not an option in the short-term, unless it would definitely fix the problem.
I know these restrictions may make it much harder to diagnose the issue, so I apologize in advance. Thank you for any help you can provide,
David
David Schnur <dnschnur@gmail.com> writes:
> I develop a piece of software that uses PostgreSQL (8.3.5) as a back-end
> database. The software can, optionally, use pg_dump to create snapshots of
> the database. One user has run into a problem where pg_dump dumps 2GB, then
> claims that the archive is too large.

Hmm, shouldn't be happening on a reasonably modern Linux system. You might want to check the pg_config.h that was generated while building pg_dump, to ensure that _LARGEFILE_SOURCE is defined and SIZEOF_OFF_T is 8. Another possibility is that pg_dump is fine but you're trying to write on a filesystem without largefile support. Yes, ext3 is okay, but are you sure the customer is dumping onto ext3?

regards, tom lane
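For anyone hitting the same thing, those checks look roughly like this; the paths are illustrative, since pg_config.h lives in the build tree (src/include) or under the installed include directory, and the dump target varies:

# Was pg_dump built with largefile support? (path to pg_config.h is illustrative)
grep -E '_LARGEFILE_SOURCE|SIZEOF_OFF_T|FILE_OFFSET_BITS' /path/to/src/include/pg_config.h

# Does the target filesystem allow files larger than 2GB?
df -T /path/to/dump/dir                  # filesystem type (ext3, nfs, vfat, ...)
getconf FILESIZEBITS /path/to/dump/dir   # 64 means large files are supported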
Thank you for the fast response! Your question prompted me to check our configure options (something I should have done originally). For reasons unknown to me, we have been building with --disable-largefile on some systems, including RHEL4. That obviously goes a long way towards explaining the behavior. Sorry for wasting your time; I imagine this thread will soon be added to some kind of highlight reel.
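For the record, a quick way to confirm which options a given build recorded, assuming the pg_config from that same build is on the PATH:

pg_config --configure   # the reported options should not include --disable-largefile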
David