Discussion: deb package sizes
Hello,

I hope I found a good mailing list for this topic?

Recently, I've been spending some time looking at the official Postgres docker images. https://hub.docker.com/_/postgres/

I think there are a lot of people using these to quickly spin up a postgres database for testing on their local dev machine. Right now, they are also the base image used for building CloudNativePG production postgres images.

These official docker images are a repackaging of the PGDG debian packages, combined with a minimal set of debian OS packages. Docker images are built using both debian stable and debian oldstable branches (with tags like "17.1-bookworm" and "17.1-bullseye").

With docker images, we like to get the container images to be as minimal and small as possible. I have spent a little time looking at the make-up of the official docker images from a size perspective, which is driven by debian package sizes.

Before adding any PGDG postgres packages or dependencies, our base OS container image is 74MB and includes about 88 debian packages. We install only 5 PGDG postgres packages: postgresql, postgresql-client, postgresql-client-common, postgresql-common and libpq5. The "common" packages are tiny, libpq is 1MB, client is 10MB and the postgresql package itself is 60MB.

What's more interesting is all of the additional dependencies that the postgresql package pulls in: an extra 53 debian packages that are over 250MB in total size. The biggest size contributors are libllvm & libz3 (143MB), libperl & perl-modules (45MB total) and libicu (36MB). These three things alone make up 64% of the total postgres-specific bytes.

I'm wondering if there might be any support for providing a "postgresql-slim" package on PGDG which excludes llvm and python? I think this might almost cut the total install size in half, and I think there might be many users who would value having the option.

Even though ICU is a larger package, I would argue for still including it in a "slim" build.
Because of the drama around glibc collation I view ICU as especially important to make available.

Interested to know others' thoughts about having a slimmer package.

Thanks,
Jeremy Schneider

PS. here are the commands I used to get the sizes (apologies that the formatting isn't great) and the full list of postgresql-specific packages

docker run --rm debian:bookworm-slim dpkg-query --show --showformat='${Package}\t${Installed-Size} KB\n' > base-pkgs
docker run --rm postgres:17-bookworm dpkg-query --show --showformat='${Package}\t${Installed-Size} KB\n' > pg-pkgs

docker run --rm postgres:17-bookworm apt rdepends libz3-4
libz3-4
Reverse Depends:
  Depends: libllvm16 (>= 4.8.12)

diff -b base-pkgs pg-pkgs |grep '^>'|sort -k3 -n | awk '{total+=$3;printf "%-30s %s",$0, "| running total size: "total " KB | running total percentage: "total*100/355572"%\n"}'

netbase 36 KB | running total size: 36 KB | running total percentage: 0.0101245%
libkeyutils1 40 KB | running total size: 76 KB | running total percentage: 0.021374%
libnpth0 50 KB | running total size: 126 KB | running total percentage: 0.0354359%
sensible-utils 56 KB | running total size: 182 KB | running total percentage: 0.0511851%
ssl-cert 64 KB | running total size: 246 KB | running total percentage: 0.0691843%
libgdbm-compat4 70 KB | running total size: 316 KB | running total percentage: 0.0888709%
libsasl2-modules-db 77 KB | running total size: 393 KB | running total percentage: 0.110526%
readline-common 89 KB | running total size: 482 KB | running total percentage: 0.135556%
libnss-wrapper 99 KB | running total size: 581 KB | running total percentage: 0.163399%
libio-pty-perl 103 KB | running total size: 684 KB | running total percentage: 0.192366%
libassuan0 117 KB | running total size: 801 KB | running total percentage: 0.225271%
libgdbm6 129 KB | running total size: 930 KB | running total percentage: 0.26155%
libkrb5support0 133 KB | running total size: 1063 KB | running total percentage: 0.298955%
postgresql-client-common 133 KB | running total size: 1196 KB | running total percentage: 0.336359%
pinentry-curses 140 KB | running total size: 1336 KB | running total percentage: 0.375733%
libsasl2-2 167 KB | running total size: 1503 KB | running total percentage: 0.422699%
libbsd0 202 KB | running total size: 1705 KB | running total percentage: 0.479509%
ucf 214 KB | running total size: 1919 KB | running total percentage: 0.539694%
libjson-perl 244 KB | running total size: 2163 KB | running total percentage: 0.608316%
libedit2 258 KB | running total size: 2421 KB | running total percentage: 0.680875%
libk5crypto3 260 KB | running total size: 2681 KB | running total percentage: 0.753996%
libipc-run-perl 267 KB | running total size: 2948 KB | running total percentage: 0.829087%
less 313 KB | running total size: 3261 KB | running total percentage: 0.917114%
libksba8 316 KB | running total size: 3577 KB | running total percentage: 1.00598%
libncursesw6 412 KB | running total size: 3989 KB | running total percentage: 1.12185%
libgssapi-krb5-2 424 KB | running total size: 4413 KB | running total percentage: 1.2411%
libreadline8 475 KB | running total size: 4888 KB | running total percentage: 1.37469%
libxslt1.1 504 KB | running total size: 5392 KB | running total percentage: 1.51643%
libldap-2.5-0 553 KB | running total size: 5945 KB | running total percentage: 1.67195%
gpg-wks-server 657 KB | running total size: 6602 KB | running total percentage: 1.85673%
postgresql-common 667 KB | running total size: 7269 KB | running total percentage: 2.04431%
perl 669 KB | running total size: 7938 KB | running total percentage: 2.23246%
gpg-wks-client 682 KB | running total size: 8620 KB | running total percentage: 2.42426%
gpgconf 803 KB | running total size: 9423 KB | running total percentage: 2.6501%
gnupg 885 KB | running total size: 10308 KB | running total percentage: 2.89899%
gpgsm 992 KB | running total size: 11300 KB | running total percentage: 3.17798%
libpq5 1068 KB | running total size: 12368 KB | running total percentage: 3.47834%
libkrb5-3 1076 KB | running total size: 13444 KB | running total percentage: 3.78095%
xz-utils 1226 KB | running total size: 14670 KB | running total percentage: 4.12575%
dirmngr 1328 KB | running total size: 15998 KB | running total percentage: 4.49923%
gpg-agent 1348 KB | running total size: 17346 KB | running total percentage: 4.87834%
gpg 1581 KB | running total size: 18927 KB | running total percentage: 5.32297%
libsqlite3-0 1682 KB | running total size: 20609 KB | running total percentage: 5.79601%
gnupg-utils 1836 KB | running total size: 22445 KB | running total percentage: 6.31236%
libxml2 1866 KB | running total size: 24311 KB | running total percentage: 6.83715%
zstd 2102 KB | running total size: 26413 KB | running total percentage: 7.42831%
openssl 2296 KB | running total size: 28709 KB | running total percentage: 8.07403%
libc-l10n 4348 KB | running total size: 33057 KB | running total percentage: 9.29685%
gnupg-l10n 4874 KB | running total size: 37931 KB | running total percentage: 10.6676%
libssl3 6021 KB | running total size: 43952 KB | running total percentage: 12.3609%
postgresql-client-17 9947 KB | running total size: 53899 KB | running total percentage: 15.1584%
locales 15845 KB | running total size: 69744 KB | running total percentage: 19.6146%
perl-modules-5.36 17816 KB | running total size: 87560 KB | running total percentage: 24.6251%
libz3-4 22767 KB | running total size: 110327 KB | running total percentage: 31.028%
libperl5.36 28862 KB | running total size: 139189 KB | running total percentage: 39.1451%
libicu72 36170 KB | running total size: 175359 KB | running total percentage: 49.3174%
postgresql-17 59671 KB | running total size: 235030 KB | running total percentage: 66.0991%
libllvm16 120542 KB | running total size: 355572 KB | running total percentage: 100%
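The size arithmetic in the PS can also be done without diff and sort. The following is only a minimal sketch, assuming two listings in the same "<package>\t<size> KB" format that the dpkg-query commands above produce; the function names `added_size_kb` and `share_pct` are made up for illustration and are not part of any package.

```shell
#!/bin/sh
# Sketch: sum the installed size of packages that exist only in the second
# listing. Both files are assumed to hold "<package>\t<size> KB" lines.
added_size_kb() {
    awk -F'\t' '
        NR == FNR     { base[$1] = 1; next }                # first file: base image packages
        !($1 in base) { split($2, f, " "); total += f[1] }  # second file: new packages only
        END           { print total + 0 }
    ' "$1" "$2"
}

# Percentage that a part contributes to a total, to one decimal place.
share_pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", part * 100 / total }'
}

# libllvm16 + libz3-4 against the 355572 KB total from the listing above.
share_pct $((120542 + 22767)) 355572
```

With the base-pkgs and pg-pkgs files from the commands above, `added_size_kb base-pkgs pg-pkgs` should give the postgres-specific total, and `share_pct` can then be applied to any group of packages from the list.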
Re: Jeremy Schneider
> I'm wondering if there might be any support for providing a
> "postgresql-slim" package on PGDG which excludes llvm and python? I
> think this might almost cut the total install size in half, and I think
> there might be many users who would value having the option.

Hi,

could you explain why 250 MB is too much? Disk space these days is ultra cheap and removing functionality (query JITing) does have cost as well.

> Even though ICU is a larger package, I would argue for still
> including it in a "slim" build. Because of the drama around glibc
> collation I view ICU as especially important to make available.

Note that ICU does not fix the collation drama either, you will have to reindex on ICU upgrades as well.

Christoph
On 9/1/25 10:07, Christoph Berg wrote:
> Re: Jeremy Schneider
>> I'm wondering if there might be any support for providing a
>> "postgresql-slim" package on PGDG which excludes llvm and python? I
>> think this might almost cut the total install size in half, and I think
>> there might be many users who would value having the option.
> Hi,
>
> could you explain why 250 MB is too much? Disk space these days is
> ultra cheap

Hi Christoph.

Container images are meant to contain only the files needed to run the process that will be run when the image is run. As such, any additional file poses two main problems:

* Disk space is cheap. Bandwidth not so much. Time to start a container may have a notable cost. Making container images slimmer helps in all these dimensions. When you run the same container image in many places, with high frequency, and end up pulling it multiple times, all of that has a cost. In particular for Postgres, time spent pulling and running an image may affect uptime. So it can become quite important.

* Security analysis. Unneeded files (especially binaries, but not only) may lead to container images having more security vulnerabilities than they otherwise would. For many, container images must pass vulnerability analysis scans, and the more unneeded packages present, the bigger the chances that they contain vulnerabilities. It is in any case a basic security principle to include only the files needed to run the process, and no more.

> and removing functionality (query JITing) does have cost
> as well.

If it can be made optional, then users can decide whether they want container images with this functionality or not.

>> Even though ICU is a larger package, I would argue for still
>> including it in a "slim" build. Because of the drama around glibc
>> collation I view ICU as especially important to make available.
> Note that ICU does not fix the collation drama either, you will have
> to reindex on ICU upgrades as well.

Agreed that it doesn't solve the whole drama, but reindexes are not needed if container images for upgrades are provided while keeping the ICU version constant (which is doable).

Álvaro
On Thu, 9 Jan 2025 17:06:57 +0100 Álvaro Hernández <aht@ongres.com> wrote:
> On 9/1/25 10:07, Christoph Berg wrote:
> > Re: Jeremy Schneider
> >> I'm wondering if there might be any support for providing a
> >> "postgresql-slim" package on PGDG which excludes llvm and python? I
> >> think this might almost cut the total install size in half, and I
> >> think there might be many users who would value having the option.
> >>
> > Hi,
> >
> > could you explain why 250 MB is too much? Disk space these days is
> > ultra cheap
>
> Hi Christoph.
>
> Container images allow (are meant to) contain only the necessary
> files needed to run the process that will be run when the image is
> run. As such, any additional file poses two main problems:
>
> * Disk space is cheap. Bandwidth not so much. Time to start a
>
> * Security analysis. Unneeded files (specially binaries, but not

Another concern is the impact of image rebuilds as dependencies are updated. Tianon (a primary maintainer of the docker images) has noted that they limit frequency of the debian base containers, because every rebuild of the base container triggers an avalanche of downstream rebuilds. CNPG was doing daily rebuilds for a while, and every time any python dependency was updated you'd get a new image - boto3 was notorious for very frequent updates. So with a different image version for every day, a single server running multiple copies of postgres might easily end up with multiple image versions on the server as copies are slowly updated.

> > and removing functionality (query JITing) does have cost
> > as well.
>
> If it can be made optional, then users can decide whether they
> want container images with this functionality or not.

To be clear, I definitely don't want the "default" postgres packages to not have JIT. I was just suggesting a non-default "slim" alternative.

Honestly I don't know if this is going to introduce a bunch of complexity in dependency management between debian packages, and how feasible it would be to actually do it... but wanted to ask the question and raise the topic.

> >> Even though ICU is a larger package, I would argue for still
> >> including it in a "slim" build. Because of the drama around glibc
> >> collation I view ICU as especially important to make available.
> > Note that ICU does not fix the collation drama either, you will have
> > to reindex on ICU upgrades as well.
>
> Agreed that it doesn't solve the whole drama, but reindexes are
> not needed if container images for upgrades are provided while
> keeping the ICU version constant (which is doable).

Yes, I'm definitely well aware of how ICU isn't really changing anything about the rebuild requirement - I've said many times that people should default to builtin C collation starting with pg17, and set linguistic collation at a table or query level. The big advantage of this is that it's much easier to know everything that needs rebuilding, since postgres does good dependency tracking of objects using nondefault collation.

But with ICU there is at least the option that someone could rebuild an old version and run it on the new debian release. That's nearly impossible with glibc.

-Jeremy
On 9/1/25 18:08, Jeremy Schneider wrote:
> Another concern is the impact of image rebuilds as dependencies are updated. Tianon (a primary maintainer of the docker images) has noted that they limit frequency of the debian base containers, because every rebuild of the base container triggers an avalanche of downstream rebuilds. CNPG was doing daily rebuilds for a while, and every time any python dependency was updated you'd get a new image - boto3 was notorious for very frequent updates. So with a different image version for every day, a single server running multiple copies of postgres might easily end up with multiple image versions on the server as copies are slowly updated.
I see this as a symptom of a different, bigger issue: that package versions, and all transitive dependencies, should be version pinned when building container images. I haven't seen too many examples of taking the effort to do this. But it's the only way to have a way to re-run building images and guarantee outputs that are reproducible. Once you have this in place, you can decide how and when you upgrade which versions.
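As an illustration of the pinning idea, here is a minimal sketch that turns a package listing into explicit `package=version` arguments for apt-get. It assumes a listing in "<package>\t<version>" form (dpkg-query can print this with the `${Package}` and `${Version}` fields); the function name `to_apt_pins` and the version strings in the demo are made up.

```shell
#!/bin/sh
# Sketch: turn "<package>\t<version>" lines on stdin into "<package>=<version>"
# pins that apt-get install accepts. Lines without a version are skipped.
to_apt_pins() {
    awk -F'\t' 'NF >= 2 && $2 != "" { print $1 "=" $2 }'
}

# Hypothetical usage inside an image build (not executed here):
#   dpkg-query --show --showformat='${Package}\t${Version}\n' | to_apt_pins > pins.txt
#   xargs -a pins.txt apt-get install -y --no-install-recommends

# Demo with made-up version strings:
printf 'libpq5\t17.2-1.example\npostgresql-17\t17.2-1.example\n' | to_apt_pins
```

Keeping the generated pin list under version control would make every dependency change an explicit, reviewable diff, at the cost of having to refresh the list deliberately.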
Actually, even version pinning is not enough, unless the package system guarantees that a version of a package is strictly immutable (and AFAIK this is usually not the case). So digest pinning is essentially required.
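Digest pinning can be sketched as a check that a downloaded file matches a recorded SHA-256 before it is installed. This is only an illustration: the function name `verify_sha256` is hypothetical, and the demo uses an empty file because its digest is a well-known constant rather than a real package digest.

```shell
#!/bin/sh
# Sketch: succeed only if the file's SHA-256 matches the pinned digest.
# $1: path to a downloaded file (for example a .deb), $2: expected hex digest.
verify_sha256() {
    actual=$(sha256sum "$1" | awk '{ print $1 }')
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# Demo on an empty file, whose SHA-256 digest is a well-known constant.
demo=$(mktemp)
if verify_sha256 "$demo" e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855; then
    echo "digest matches"
else
    echo "digest mismatch" >&2
fi
rm -f "$demo"
```

The same idea applies one level up: a base image can be referenced by digest (`image@sha256:<digest>`) instead of by a mutable tag.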
> But with ICU there is at least the option that someone could rebuild an old version and run it on the new debian release. That's nearly impossible with glibc.
Exactly, and this is doable.
Álvaro
-- Alvaro Hernandez ----------- OnGres
On Thu, Jan 9, 2025 at 11:40 PM Álvaro Hernández <aht@ongres.com> wrote:
> On 9/1/25 18:08, Jeremy Schneider wrote:
>> Another concern is the impact of image rebuilds as dependencies are updated. Tianon (a primary maintainer of the docker images) has noted that they limit frequency of the debian base containers, because every rebuild of the base container triggers an avalanche of downstream rebuilds. CNPG was doing daily rebuilds for a while, and every time any python dependency was updated you'd get a new image - boto3 was notorious for very frequent updates. So with a different image version for every day, a single server running multiple copies of postgres might easily end up with multiple image versions on the server as copies are slowly updated.
> I see this as a symptom of a different, bigger issue: that package versions, and all transitive dependencies, should be version pinned when building container images. I haven't seen too many examples of taking the effort to do this. But it's the only way to have a way to re-run building images and guarantee outputs that are reproducible. Once you have this in place, you can decide how and when you upgrade which versions.
I'm guessing most container builders are just not interested in doing that much work. It's easier to just "always upgrade", but as noted that comes with a whole different set of problems. It's only really feasible if you manage to first reduce the set of dependencies substantially.
> Actually, even version pinning is not enough, unless the package system guarantees that a version of a package is strictly immutable (and AFAIK this is usually not the case). So digest pinning is essentially required.

Debian (as this was talking about it) is actually doing a very good job at that these days, though they're not there all the way. But https://tests.reproducible-builds.org/debian/reproducible.html shows they're doing really well.
On 10/1/25 10:52, Magnus Hagander wrote:
> On Thu, Jan 9, 2025 at 11:40 PM Álvaro Hernández <aht@ongres.com> wrote:
>> On 9/1/25 18:08, Jeremy Schneider wrote:
>>> Another concern is the impact of image rebuilds as dependencies are updated. Tianon (a primary maintainer of the docker images) has noted that they limit frequency of the debian base containers, because every rebuild of the base container triggers an avalanche of downstream rebuilds. CNPG was doing daily rebuilds for a while, and every time any python dependency was updated you'd get a new image - boto3 was notorious for very frequent updates. So with a different image version for every day, a single server running multiple copies of postgres might easily end up with multiple image versions on the server as copies are slowly updated.
>> I see this as a symptom of a different, bigger issue: that package versions, and all transitive dependencies, should be version pinned when building container images. I haven't seen too many examples of taking the effort to do this. But it's the only way to have a way to re-run building images and guarantee outputs that are reproducible. Once you have this in place, you can decide how and when you upgrade which versions.
> I'm guessing most container builders are just not interested in doing that much work. It's easier to just "always upgrade", but as noted that comes with a whole different set of problems. It's only really feasible if you manage to first reduce the set of dependencies substantially.
Yes, it comes with a whole set of problems. The main one, other than upgrades, is that you may end up with inconsistent environments: cases where not all images deployed are the same because some dependencies have different versions. This may also lead to different CVEs present on different servers. This is far from ideal and a problem that is starting to be more and more visible.
While container builders may not be interested in doing all this work, I think that it should be done regardless. And over time, it will be done more and more. When security and supply-chain attacks are a serious concern, precise knowledge of your dependencies is key.
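One way to make such inconsistencies visible is to compare the package listings of two deployed images. The sketch below assumes two files in "<package>\t<version>" form, one per image; the function name `version_drift` is made up for illustration.

```shell
#!/bin/sh
# Sketch: print packages whose version differs between two
# "<package>\t<version>" listings, or which appear in only one of them.
# Output columns: package, version in first file, version in second file.
version_drift() {
    awk -F'\t' '
        FNR == 1  { file++ }               # count which input file we are in
        file == 1 { a[$1] = $2; next }     # first listing
                  { b[$1] = $2 }           # second listing
        END {
            for (p in a) {
                if (!(p in b)) { print p "\t" a[p] "\t-" }
                else if (a[p] != b[p]) { print p "\t" a[p] "\t" b[p] }
            }
            for (p in b) {
                if (!(p in a)) { print p "\t-\t" b[p] }
            }
        }
    ' "$1" "$2" | sort
}
```

An empty result would mean both images carry the same package set at the same versions; any output line points at a dependency that drifted between builds.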
>> Actually, even version pinning is not enough, unless the package system guarantees that a version of a package is strictly immutable (and AFAIK this is usually not the case). So digest pinning is essentially required.
> Debian (as this was talking about it) is actually doing a very good job at that these days, though they're not there all the way. But https://tests.reproducible-builds.org/debian/reproducible.html shows they're doing really well.
Debian is doing a great job towards reproducibility of the build efforts of their packages. However, AFAIK a given package version can be updated with different content, and that's why a service like https://snapshot.debian.org exists.
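Whatever the answer on immutability, snapshot.debian.org lets a build fix the archive state at a point in time. A minimal sketch of generating the apt source line for a given snapshot follows; the function name `snapshot_source` and the timestamp are only examples, and the line format follows the usage notes published on snapshot.debian.org.

```shell
#!/bin/sh
# Sketch: print an apt source line for a snapshot.debian.org archive state.
# $1: snapshot timestamp (YYYYMMDDThhmmssZ), $2: suite, e.g. bookworm
snapshot_source() {
    printf 'deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/%s/ %s main\n' "$1" "$2"
}

# Example timestamp only; pick the snapshot that matches the intended build date.
snapshot_source 20250110T000000Z bookworm
```

The `check-valid-until=no` option is there because the Release files of old snapshots have expired by the time they are used.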
Álvaro
-- Alvaro Hernandez ----------- OnGres
Re: Álvaro Hernández
> Debian is doing a great job towards reproducibility of the build efforts
> of their packages. However, AFAIK a given package version can be updated
> with a different content --and that's why a service like
> https://snapshot.debian.org exists.

That will never happen, new packages always have new version/revision numbers. Same on apt.postgresql.org.

Christoph
On 10/01/2025 10:52, Magnus Hagander wrote:
> On Thu, Jan 9, 2025 at 11:40 PM Álvaro Hernández <aht@ongres.com> wrote:
>> On 9/1/25 18:08, Jeremy Schneider wrote:
>>> Another concern is the impact of image rebuilds as dependencies are updated. Tianon (a primary maintainer of the docker images) has noted that they limit frequency of the debian base containers, because every rebuild of the base container triggers an avalanche of downstream rebuilds. CNPG was doing daily rebuilds for a while, and every time any python dependency was updated you'd get a new image - boto3 was notorious for very frequent updates. So with a different image version for every day, a single server running multiple copies of postgres might easily end up with multiple image versions on the server as copies are slowly updated.
>> I see this as a symptom of a different, bigger issue: that package versions, and all transitive dependencies, should be version pinned when building container images. I haven't seen too many examples of taking the effort to do this. But it's the only way to have a way to re-run building images and guarantee outputs that are reproducible. Once you have this in place, you can decide how and when you upgrade which versions.
> I'm guessing most container builders are just not interested in doing that much work. It's easier to just "always upgrade", but as noted that comes with a whole different set of problems. It's only really feasible if you manage to first reduce the set of dependencies substantially.

>> Actually, even version pinning is not enough, unless the package system guarantees that a version of a package is strictly immutable (and AFAIK this is usually not the case). So digest pinning is essentially required.
> Debian (as this was talking about it) is actually doing a very good job at that these days, though they're not there all the way. But https://tests.reproducible-builds.org/debian/reproducible.html shows they're doing really well.
Also on debian.net: https://amd64.reproduce.debian.net/#postgresql-17 for a "non fancy" webpage.
There was a talk on this very topic, at minidebconf recently (by kpcyrd):
https://toulouse2024.mini.debconf.org/talks/4-reproducible-builds-rebuilding-what-is-distributed-from-ftpdebianorg/
"Since about a month we’ve also been rebuilding trying to exactly match the builds being distributed via ftp.d.o - this talk will describe the setup and the lessons learned so far, and why the results currently are what they are (spoiler: less <30% reproducible) and what we can do to fix that."
And rebuilderd is surely of interest for people willing to work on reproducible builds: https://github.com/kpcyrd/rebuilderd
--- Cédric Villemain +33 6 20 30 22 52 https://www.Data-Bene.io PostgreSQL Support, Expertise, Training, R&D