Discussion: Database size with large objects
I'm running postgresql 8.1.8 on RHEL4.
I have a database with a single table that includes an oid reference to a
large object. After loading 100GB of large objects using lo_import(), I
find that my total database size has grown by about 270GB. What is the
reason for the difference in space usage?
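For reference, the 8.1 size functions give a rough breakdown of where the
space is going. This is only a sketch; the database and table names below
('image' for both) are taken from later messages in the thread:

    -- Compare the overall database size against the user table and pg_largeobject
    -- (pg_total_relation_size includes indexes and toast data).
    SELECT pg_size_pretty(pg_database_size('image'))                AS database_total,
           pg_size_pretty(pg_total_relation_size('image'))          AS image_table_total,
           pg_size_pretty(pg_total_relation_size('pg_largeobject')) AS pg_largeobject_total;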
There are about 400,000 records in the table, and the large objects are
roughly the same size on average. I have autovacuum running, and I ran a
vacuum --full --analyze prior to the load.
Max_fsm_pages is very high (1,500,000), and the vacuum --analyze confirms
that it is sufficient.
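For reference, the current free space map limits can be checked directly;
the tail end of a database-wide VACUUM VERBOSE then reports how many page
slots are actually needed against those limits:

    -- Server-wide FSM settings in 8.1 (changing them requires a restart).
    SHOW max_fsm_pages;
    SHOW max_fsm_relations;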
Is this just something that I have to live with?
Thanks,
Mike
"Michael Goldner" <MGoldner@agmednet.com> writes: > I have a database with a single table that includes an oid reference to a > large object. After loading 100GB of large objects using lo_import(), I > find that my total database size has grown by about 270GB. What is the > reason for the difference in space usage? Try VACUUM VERBOSE on your table and also on pg_largeobject to get a sense of where the space went. I'm wondering whether you allowed for indexes and for other data in your table. regards, tom lane
On 11/4/07 8:26 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> "Michael Goldner" <MGoldner@agmednet.com> writes:
>> I have a database with a single table that includes an oid reference to a
>> large object. After loading 100GB of large objects using lo_import(), I
>> find that my total database size has grown by about 270GB. What is the
>> reason for the difference in space usage?
>
> Try VACUUM VERBOSE on your table and also on pg_largeobject to get a
> sense of where the space went. I'm wondering whether you allowed for
> indexes and for other data in your table.
>
> regards, tom lane

OK, I vacuumed both tables.

The single table vacuumed in about 2 seconds and the stats look normal:

INFO: vacuuming "public.image"
INFO: index "image_pkey" now contains 386749 row versions in 1691 pages
DETAIL: 21799 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.10u sec elapsed 0.14 sec.
INFO: "image": removed 21799 row versions in 716 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "image": found 21799 removable, 386749 nonremovable row versions in 19589 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 183153 unused item pointers.
0 pages are entirely empty.
CPU 0.09s/0.19u sec elapsed 0.32 sec.
INFO: vacuuming "pg_toast.pg_toast_176283637"
INFO: index "pg_toast_176283637_index" now contains 0 row versions in 1 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_176283637": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

The pg_largeobject table, however, seems a bit odd:

INFO: vacuuming "pg_catalog.pg_largeobject"
INFO: index "pg_largeobject_loid_pn_index" now contains 105110204 row versions in 404151 pages
DETAIL: 778599 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 21.24s/48.07u sec elapsed 273.15 sec.
INFO: "pg_largeobject": removed 778599 row versions in 775264 pages
DETAIL: CPU 54.73s/29.70u sec elapsed 2203.32 sec.
INFO: "pg_largeobject": found 778599 removable, 105094846 nonremovable row versions in 34803136 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 70 unused item pointers.
0 pages are entirely empty.
CPU 1031.40s/339.21u sec elapsed 10875.66 sec.
VACUUM

In particular, "105094846 nonremovable row versions in 34803136 pages" seems
really high given that there are only about 400,000 large objects.

The majority of the lobs are less than 1MB. However, a small subset can
range as high as 1GB.

I see that a single object is spread across multiple pages (loid, pageno)
in the pg_largeobject table. Is it necessary or possible to adjust the
page size for greater efficiency?

Thanks,

Mike
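A sketch of one way to see how individual large objects map onto the 2KB
chunk rows in pg_largeobject (note that this scans the whole catalog, so it
is not cheap on a table of this size):

    -- Chunk count and payload bytes per large object. Each (loid, pageno)
    -- row holds at most 2KB of data, so a 1MB object needs roughly 512 rows.
    SELECT loid,
           count(*)          AS chunks,
           sum(length(data)) AS payload_bytes
    FROM pg_largeobject
    GROUP BY loid
    ORDER BY payload_bytes DESC
    LIMIT 10;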
Michael Goldner <mgoldner@agmednet.com> writes:
> The pg_largeobject table, however, seems a bit odd:

> INFO: vacuuming "pg_catalog.pg_largeobject"
> INFO: index "pg_largeobject_loid_pn_index" now contains 105110204 row
> versions in 404151 pages
> DETAIL: 778599 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 21.24s/48.07u sec elapsed 273.15 sec.
> INFO: "pg_largeobject": removed 778599 row versions in 775264 pages
> DETAIL: CPU 54.73s/29.70u sec elapsed 2203.32 sec.
> INFO: "pg_largeobject": found 778599 removable, 105094846 nonremovable row
> versions in 34803136 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 70 unused item pointers.
> 0 pages are entirely empty.
> CPU 1031.40s/339.21u sec elapsed 10875.66 sec.
> VACUUM

> In particular, "105094846 nonremovable row versions in 34803136 pages" seems
> really high given that there are only about 400,000 large objects.

Well, each row in pg_largeobject is a 2K (at most) chunk of a large
object. There is something funny here because if there's only 100GB
of LO data, that would average out to less than 1K per row, which is
half what I'd expect. Do you have another source of large objects
that are not-so-large-as-all-that and might be dragging down the
average?

It might be interesting to look at stats such as
	select sum(length(data)) from pg_largeobject;
to confirm that your 100GB estimate for the data payload is accurate.

			regards, tom lane
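The same kind of full scan can also report the average bytes stored per
row, which makes the per-row arithmetic explicit:

    -- With full, uncompressed chunks the average approaches 2048 bytes;
    -- a much lower figure points to many small objects or heavy compression.
    SELECT count(*)               AS chunk_rows,
           sum(length(data))      AS payload_bytes,
           avg(length(data))::int AS avg_bytes_per_row
    FROM pg_largeobject;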
On 11/5/07 12:19 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Michael Goldner <mgoldner@agmednet.com> writes:
>> The pg_largeobject table, however, seems a bit odd:
>
>> INFO: vacuuming "pg_catalog.pg_largeobject"
>> INFO: index "pg_largeobject_loid_pn_index" now contains 105110204 row
>> versions in 404151 pages
>> DETAIL: 778599 index row versions were removed.
>> 0 index pages have been deleted, 0 are currently reusable.
>> CPU 21.24s/48.07u sec elapsed 273.15 sec.
>> INFO: "pg_largeobject": removed 778599 row versions in 775264 pages
>> DETAIL: CPU 54.73s/29.70u sec elapsed 2203.32 sec.
>> INFO: "pg_largeobject": found 778599 removable, 105094846 nonremovable row
>> versions in 34803136 pages
>> DETAIL: 0 dead row versions cannot be removed yet.
>> There were 70 unused item pointers.
>> 0 pages are entirely empty.
>> CPU 1031.40s/339.21u sec elapsed 10875.66 sec.
>> VACUUM
>
>> In particular, "105094846 nonremovable row versions in 34803136 pages" seems
>> really high given that there are only about 400,000 large objects.
>
> Well, each row in pg_largeobject is a 2K (at most) chunk of a large
> object. There is something funny here because if there's only 100GB
> of LO data, that would average out to less than 1K per row, which is
> half what I'd expect. Do you have another source of large objects
> that are not-so-large-as-all-that and might be dragging down the
> average?
>
> It might be interesting to look at stats such as
> 	select sum(length(data)) from pg_largeobject;
> to confirm that your 100GB estimate for the data payload is accurate.
>
> 			regards, tom lane

That select returns the following:

image=# select sum(length(data)) from pg_largeobject;
     sum
--------------
 215040008847
(1 row)

The actual on-disk data before the import was about half (this is a
production environment so data has been added over the last 2 days):

# du -sh /pglog/image_lo/
102G    /pglog/image_lo/

--
Mike Goldner
Vice President Networks and Technology
AG Mednet, Inc.
The Pilot House
Lewis Wharf
Boston, MA 02110
617.854.3225 (office)
617.909.3009 (mobile)
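For scale, that sum is roughly 200GB of payload, about twice the original
100GB estimate:

    -- Human-readable rendering of the payload total (binary units).
    SELECT pg_size_pretty(215040008847::bigint);   -- ~200 GB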
Michael Goldner <mgoldner@agmednet.com> writes:
> On 11/5/07 12:19 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> It might be interesting to look at stats such as
>> 	select sum(length(data)) from pg_largeobject;
>> to confirm that your 100GB estimate for the data payload is accurate.

> That select returns the following:

> image=# select sum(length(data)) from pg_largeobject;
>      sum
> --------------
>  215040008847
> (1 row)

Hmm, so given that you had 34803136 pages in pg_largeobject, that works
out to just about 75% fill factor. That is to say, you're only getting
3 2K rows per page and not 4. If the rows were full-size then 4 would
obviously not fit (there is some overhead...) but the normal expectation
in pg_largeobject is that tuple compression will shave enough space to
make up for the overhead and let you get 4 rows per page. Are your
large objects mostly pre-compressed data?

			regards, tom lane
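The 75% figure can be checked directly from the two numbers in this thread:

    -- 34803136 heap pages * 8192 bytes/page versus 215040008847 payload bytes.
    SELECT 215040008847::numeric
           / (34803136::numeric * 8192) AS payload_fill_fraction;   -- ~0.75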