Thread: deb package sizes

deb package sizes

From: Jeremy Schneider

Hello, I hope I found a good mailing list for this topic?

Recently, I've been spending some time looking at the official Postgres
docker images. https://hub.docker.com/_/postgres/

I think there are a lot of people using these to quickly spin up a
postgres database for testing on their local dev machine. Right now,
they are also the base image used for building CloudNativePG production
postgres images.

These official docker images are a repackaging of the PGDG debian
packages, combined with a minimal set of debian OS packages. Docker
images are built using both debian stable and debian oldstable branches
(with tags like "17.1-bookworm" and "17.1-bullseye").

With docker images, we like to get the container images to be as
minimal and small as possible. I have spent a little time looking at the
make-up of the official docker images from a size perspective, which is
driven by debian package sizes.

Before adding any PGDG postgres packages or dependencies, our base OS
container image is 74MB and includes about 88 debian packages.

We install only 5 PGDG postgres packages: postgresql,
postgresql-client, postgresql-client-common, postgresql-common and
libpq5. The "common" packages are tiny, libpq is 1MB, client is 10MB
and the postgresql package itself is 60MB.

What's more interesting is all of the additional dependencies that the
postgresql package pulls in: an extra 53 debian packages that are over
250MB in total size.

The biggest size contributors are libllvm & libz3 (143MB), libperl &
perl-modules (45MB total) and libicu (36MB). These three things alone
make up 64% of the total postgres-specific bytes.
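
As a rough cross-check of the 64% figure, the shares can be recomputed from the installed sizes in the listing at the end of this message. This is only a sketch: the grouping and the `share` helper are illustrative and were not part of the original commands.

```shell
#!/bin/sh
# Installed sizes in KB, copied from the dpkg-query listing in this message.
total=355572                      # all postgres-specific packages
llvm_z3=$((120542 + 22767))       # libllvm16 + libz3-4
perl=$((28862 + 17816))           # libperl5.36 + perl-modules-5.36
icu=36170                         # libicu72

# share SIZE_KB: print SIZE_KB as a percentage of the total, one decimal place.
share() {
  awk -v part="$1" -v total="$total" 'BEGIN { printf "%.1f\n", part * 100 / total }'
}

echo "llvm + z3: $(share "$llvm_z3")%"
echo "perl:      $(share "$perl")%"
echo "icu:       $(share "$icu")%"
echo "combined:  $(share $((llvm_z3 + perl + icu)))%"
```

With these figures the three groups together come to roughly 63.6% of the total, and libllvm16 plus libz3-4 alone account for about 40%.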

I'm wondering if there might be any support for providing a
"postgresql-slim" package on PGDG which excludes llvm and python? I
think this might almost cut the total install size in half, and I think
there might be many users who would value having the option.

Even though ICU is a larger package, I would argue for still
including it in a "slim" build. Because of the drama around glibc
collation I view ICU as especially important to make available.

Interested to know others' thoughts about having a slimmer package.

Thanks,
Jeremy Schneider



PS. here are the commands I used to get the sizes (apologies that the
formatting isn't great) and the full list of postgresql-specific
packages


docker run --rm debian:bookworm-slim dpkg-query --show \
  --showformat='${Package}\t${Installed-Size} KB\n' > base-pkgs


docker run --rm postgres:17-bookworm dpkg-query --show \
  --showformat='${Package}\t${Installed-Size} KB\n' > pg-pkgs


docker run --rm postgres:17-bookworm apt rdepends libz3-4

libz3-4
Reverse Depends:
  Depends: libllvm16 (>= 4.8.12)


diff -b base-pkgs pg-pkgs |grep '^>'|sort -k3 -n |
 awk '{total+=$3;printf "%-30s %s",$0,
   "| running total size: "total
   " KB | running total percentage: "total*100/355572"%\n"}'

netbase                       36 KB | running total size:     36 KB | running total percentage: 0.0101245%
libkeyutils1                  40 KB | running total size:     76 KB | running total percentage: 0.021374%
libnpth0                      50 KB | running total size:    126 KB | running total percentage: 0.0354359%
sensible-utils                56 KB | running total size:    182 KB | running total percentage: 0.0511851%
ssl-cert                      64 KB | running total size:    246 KB | running total percentage: 0.0691843%
libgdbm-compat4               70 KB | running total size:    316 KB | running total percentage: 0.0888709%
libsasl2-modules-db           77 KB | running total size:    393 KB | running total percentage: 0.110526%
readline-common               89 KB | running total size:    482 KB | running total percentage: 0.135556%
libnss-wrapper                99 KB | running total size:    581 KB | running total percentage: 0.163399%
libio-pty-perl               103 KB | running total size:    684 KB | running total percentage: 0.192366%
libassuan0                   117 KB | running total size:    801 KB | running total percentage: 0.225271%
libgdbm6                     129 KB | running total size:    930 KB | running total percentage: 0.26155%
libkrb5support0              133 KB | running total size:   1063 KB | running total percentage: 0.298955%
postgresql-client-common     133 KB | running total size:   1196 KB | running total percentage: 0.336359%
pinentry-curses              140 KB | running total size:   1336 KB | running total percentage: 0.375733%
libsasl2-2                   167 KB | running total size:   1503 KB | running total percentage: 0.422699%
libbsd0                      202 KB | running total size:   1705 KB | running total percentage: 0.479509%
ucf                          214 KB | running total size:   1919 KB | running total percentage: 0.539694%
libjson-perl                 244 KB | running total size:   2163 KB | running total percentage: 0.608316%
libedit2                     258 KB | running total size:   2421 KB | running total percentage: 0.680875%
libk5crypto3                 260 KB | running total size:   2681 KB | running total percentage: 0.753996%
libipc-run-perl              267 KB | running total size:   2948 KB | running total percentage: 0.829087%
less                         313 KB | running total size:   3261 KB | running total percentage: 0.917114%
libksba8                     316 KB | running total size:   3577 KB | running total percentage: 1.00598%
libncursesw6                 412 KB | running total size:   3989 KB | running total percentage: 1.12185%
libgssapi-krb5-2             424 KB | running total size:   4413 KB | running total percentage: 1.2411%
libreadline8                 475 KB | running total size:   4888 KB | running total percentage: 1.37469%
libxslt1.1                   504 KB | running total size:   5392 KB | running total percentage: 1.51643%
libldap-2.5-0                553 KB | running total size:   5945 KB | running total percentage: 1.67195%
gpg-wks-server               657 KB | running total size:   6602 KB | running total percentage: 1.85673%
postgresql-common            667 KB | running total size:   7269 KB | running total percentage: 2.04431%
perl                         669 KB | running total size:   7938 KB | running total percentage: 2.23246%
gpg-wks-client               682 KB | running total size:   8620 KB | running total percentage: 2.42426%
gpgconf                      803 KB | running total size:   9423 KB | running total percentage: 2.6501%
gnupg                        885 KB | running total size:  10308 KB | running total percentage: 2.89899%
gpgsm                        992 KB | running total size:  11300 KB | running total percentage: 3.17798%
libpq5                      1068 KB | running total size:  12368 KB | running total percentage: 3.47834%
libkrb5-3                   1076 KB | running total size:  13444 KB | running total percentage: 3.78095%
xz-utils                    1226 KB | running total size:  14670 KB | running total percentage: 4.12575%
dirmngr                     1328 KB | running total size:  15998 KB | running total percentage: 4.49923%
gpg-agent                   1348 KB | running total size:  17346 KB | running total percentage: 4.87834%
gpg                         1581 KB | running total size:  18927 KB | running total percentage: 5.32297%
libsqlite3-0                1682 KB | running total size:  20609 KB | running total percentage: 5.79601%
gnupg-utils                 1836 KB | running total size:  22445 KB | running total percentage: 6.31236%
libxml2                     1866 KB | running total size:  24311 KB | running total percentage: 6.83715%
zstd                        2102 KB | running total size:  26413 KB | running total percentage: 7.42831%
openssl                     2296 KB | running total size:  28709 KB | running total percentage: 8.07403%
libc-l10n                   4348 KB | running total size:  33057 KB | running total percentage: 9.29685%
gnupg-l10n                  4874 KB | running total size:  37931 KB | running total percentage: 10.6676%
libssl3                     6021 KB | running total size:  43952 KB | running total percentage: 12.3609%
postgresql-client-17        9947 KB | running total size:  53899 KB | running total percentage: 15.1584%
locales                    15845 KB | running total size:  69744 KB | running total percentage: 19.6146%
perl-modules-5.36          17816 KB | running total size:  87560 KB | running total percentage: 24.6251%
libz3-4                    22767 KB | running total size: 110327 KB | running total percentage: 31.028%
libperl5.36                28862 KB | running total size: 139189 KB | running total percentage: 39.1451%
libicu72                   36170 KB | running total size: 175359 KB | running total percentage: 49.3174%
postgresql-17              59671 KB | running total size: 235030 KB | running total percentage: 66.0991%
libllvm16                 120542 KB | running total size: 355572 KB | running total percentage: 100%




Re: deb package sizes

From: Christoph Berg

Re: Jeremy Schneider
> I'm wondering if there might be any support for providing a
> "postgresql-slim" package on PGDG which excludes llvm and python? I
> think this might almost cut the total install size in half, and I think
> there might be many users who would value having the option.

Hi,

could you explain why 250 MB is too much? Disk space these days is
ultra cheap and removing functionality (query JITing) does have cost
as well.

> Even though ICU is a larger package, I would argue for still
> including it in a "slim" build. Because of the drama around glibc
> collation I view ICU as especially important to make available.

Note that ICU does not fix the collation drama either, you will have
to reindex on ICU upgrades as well.

Christoph



Re: deb package sizes

From: Álvaro Hernández

On 9/1/25 10:07, Christoph Berg wrote:
> Re: Jeremy Schneider
>> I'm wondering if there might be any support for providing a
>> "postgresql-slim" package on PGDG which excludes llvm and python? I
>> think this might almost cut the total install size in half, and I think
>> there might be many users who would value having the option.
> Hi,
>
> could you explain why 250 MB is too much? Disk space these days is
> ultra cheap

     Hi Christoph.

     Container images are meant to contain only the files needed to run
the process that the image exists to run. As such, any additional file
poses two main problems:

* Disk space is cheap. Bandwidth not so much. Time to start a container
may have a notable cost. Making container images slimmer helps in all
these dimensions. When you run the same container image in many places,
with high frequency, and end up pulling it multiple times, all of that
has a cost. In particular for Postgres, time spent pulling and running an
image may affect uptime. So it can become quite important.

* Security analysis. Unneeded files (especially binaries, but not only)
may lead to container images having more security vulnerabilities than
necessary. For many, container images must pass vulnerability analysis
scans, and the more unneeded packages present, the bigger the chances
are that they contain vulnerabilities. It's a basic security principle
anyway to only include the files needed to run the process, and no more.

>   and removing functionality (query JITing) does have cost
> as well.

     If it can be made optional, then users can decide whether they want 
container images with this functionality or not.

>> Even though ICU is a larger package, I would argue for still
>> including it in a "slim" build. Because of the drama around glibc
>> collation I view ICU as especially important to make available.
> Note that ICU does not fix the collation drama either, you will have
> to reindex on ICU upgrades as well.

     Agreed that it doesn't solve the whole drama, but reindexes are not 
needed if container images for upgrades are provided while keeping the 
ICU version constant (which is doable).

     Álvaro




Re: deb package sizes

From: Jeremy Schneider

On Thu, 9 Jan 2025 17:06:57 +0100
Álvaro Hernández <aht@ongres.com> wrote:

> On 9/1/25 10:07, Christoph Berg wrote:
> > Re: Jeremy Schneider
> >> I'm wondering if there might be any support for providing a
> >> "postgresql-slim" package on PGDG which excludes llvm and python? I
> >> think this might almost cut the total install size in half, and I
> >> think there might be many users who would value having the option.
> >>
> > Hi,
> >
> > could you explain why 250 MB is too much? Disk space these days is
> > ultra cheap
>
>      Hi Christoph.
>
>      Container images allow (are meant to) contain only the necessary
> files needed to run the process that will be run when the image is
> run. As such, any additional file poses two main problems:
>
> * Disk space is cheap. Bandwidth not so much. Time to start a
>
> * Security analysis. Unneeded files (specially binaries, but not

Another concern is the impact of image rebuilds as dependencies are
updated. Tianon (a primary maintainer of the docker images) has noted
that they limit how often the debian base containers are rebuilt,
because every rebuild of the base container triggers an avalanche of
downstream rebuilds. CNPG was doing daily rebuilds for a while, and
every time any python dependency was updated you'd get a new image -
boto3 was notorious for very frequent updates. So with a different
image version for every day, a single server running multiple copies of
postgres might easily end up with multiple image versions on the server
as copies are slowly updated.


>
> >   and removing functionality (query JITing) does have cost
> > as well.
>
>      If it can be made optional, then users can decide whether they
> want container images with this functionality or not.

To be clear, I definitely don't want the "default" postgres packages to
not have JIT. I was just suggesting a non-default "slim" alternative.

Honestly I don't know if this is going to introduce a bunch of
complexity in dependency management between debian packages, and how
feasible it would be to actually do it... but I wanted to ask the
question and raise the topic.

> >> Even though ICU is a larger package, I would argue for still
> >> including it in a "slim" build. Because of the drama around glibc
> >> collation I view ICU as especially important to make available.
> > Note that ICU does not fix the collation drama either, you will have
> > to reindex on ICU upgrades as well.
>
>      Agreed that it doesn't solve the whole drama, but reindexes are
> not needed if container images for upgrades are provided while
> keeping the ICU version constant (which is doable).

Yes, I'm definitely well aware that ICU doesn't really change anything
about the rebuild requirement - I've said many times that people
should default to builtin C collation starting with pg17, and set
linguistic collation at a table or query level. The big advantage of
this is that it's much easier to know everything that needs rebuilding,
since postgres does good dependency tracking of objects using nondefault
collation.

But with ICU there is at least the option that someone could rebuild an
old version and run it on the new debian release. That's nearly
impossible with glibc.

-Jeremy



Re: deb package sizes

From: Álvaro Hernández


On 9/1/25 18:08, Jeremy Schneider wrote:
> On Thu, 9 Jan 2025 17:06:57 +0100
> Álvaro Hernández <aht@ongres.com> wrote:
>
>> On 9/1/25 10:07, Christoph Berg wrote:
>>> Re: Jeremy Schneider
>>>> I'm wondering if there might be any support for providing a
>>>> "postgresql-slim" package on PGDG which excludes llvm and python? I
>>>> think this might almost cut the total install size in half, and I
>>>> think there might be many users who would value having the option.
>>> Hi,
>>>
>>> could you explain why 250 MB is too much? Disk space these days is
>>> ultra cheap
>>      Hi Christoph.
>>      Container images allow (are meant to) contain only the necessary
>> files needed to run the process that will be run when the image is
>> run. As such, any additional file poses two main problems:
>>
>> * Disk space is cheap. Bandwidth not so much. Time to start a
>>
>> * Security analysis. Unneeded files (specially binaries, but not
>
> Another concern is the impact of image rebuilds as dependencies are
> updated. Tianon (a primary maintainer of the docker images) has noted
> that they limit how often the debian base containers are rebuilt,
> because every rebuild of the base container triggers an avalanche of
> downstream rebuilds. CNPG was doing daily rebuilds for a while, and
> every time any python dependency was updated you'd get a new image -
> boto3 was notorious for very frequent updates. So with a different
> image version for every day, a single server running multiple copies of
> postgres might easily end up with multiple image versions on the server
> as copies are slowly updated.

    I see this as a symptom of a different, bigger issue: that package versions, and all transitive dependencies, should be version pinned when building container images. I haven't seen too many examples of taking the effort to do this. But it's the only way to re-run image builds and guarantee reproducible outputs. Once you have this in place, you can decide how and when you upgrade which versions.

    Actually, even version pinning is not enough, unless the package system guarantees that a version of a package is strictly immutable (and AFAIK this is usually not the case). So digest pinning is essentially required.
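
To illustrate what digest pinning could look like in practice, here is a minimal shell sketch that rejects a file whose SHA-256 does not match a recorded value. The `verify_digest` helper and the throwaway file are hypothetical stand-ins. A real build would record the digest of each downloaded package or base image at the time the pin is made.

```shell
#!/bin/sh
# verify_digest FILE EXPECTED_SHA256
# Succeeds only when the file's SHA-256 equals the pinned digest.
# (Hypothetical helper for illustration; relies on coreutils sha256sum.)
verify_digest() {
  actual=$(sha256sum "$1" | awk '{ print $1 }')
  [ "$actual" = "$2" ]
}

# Demo: a temporary file stands in for a downloaded package.
pkg=$(mktemp)
printf 'example package contents\n' > "$pkg"

# In a real build this value would be recorded once and committed with the
# build definition; here it is computed on the spot for the demo.
pinned=$(sha256sum "$pkg" | awk '{ print $1 }')

if verify_digest "$pkg" "$pinned"; then
  echo "digest matches, safe to install"
fi

# Any change to the content, even under the same name, breaks the match.
printf 'changed\n' >> "$pkg"
if ! verify_digest "$pkg" "$pinned"; then
  echo "digest mismatch, refusing to install"
fi

rm -f "$pkg"
```

The same idea applies to base images, where an image can be referenced by its content digest rather than by a mutable tag.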

> But with ICU there is at least the option that someone could rebuild an
> old version and run it on the new debian release. That's nearly
> impossible with glibc.


    Exactly, and this is doable.


    Álvaro


-- 

Alvaro Hernandez


-----------
OnGres

Re: deb package sizes

From: Magnus Hagander

On Thu, Jan 9, 2025 at 11:40 PM Álvaro Hernández <aht@ongres.com> wrote:


> On 9/1/25 18:08, Jeremy Schneider wrote:
>> On Thu, 9 Jan 2025 17:06:57 +0100
>> Álvaro Hernández <aht@ongres.com> wrote:
>>
>>> On 9/1/25 10:07, Christoph Berg wrote:
>>>> Re: Jeremy Schneider
>>>>> I'm wondering if there might be any support for providing a
>>>>> "postgresql-slim" package on PGDG which excludes llvm and python? I
>>>>> think this might almost cut the total install size in half, and I
>>>>> think there might be many users who would value having the option.
>>>> Hi,
>>>>
>>>> could you explain why 250 MB is too much? Disk space these days is
>>>> ultra cheap
>>>      Hi Christoph.
>>>      Container images allow (are meant to) contain only the necessary
>>> files needed to run the process that will be run when the image is
>>> run. As such, any additional file poses two main problems:
>>>
>>> * Disk space is cheap. Bandwidth not so much. Time to start a
>>>
>>> * Security analysis. Unneeded files (specially binaries, but not
>>
>> Another concern is the impact of image rebuilds as dependencies are
>> updated. Tianon (a primary maintainer of the docker images) has noted
>> that they limit how often the debian base containers are rebuilt,
>> because every rebuild of the base container triggers an avalanche of
>> downstream rebuilds. CNPG was doing daily rebuilds for a while, and
>> every time any python dependency was updated you'd get a new image -
>> boto3 was notorious for very frequent updates. So with a different
>> image version for every day, a single server running multiple copies of
>> postgres might easily end up with multiple image versions on the server
>> as copies are slowly updated.
>
> I see this as a symptom of a different, bigger issue: that package versions, and all transitive dependencies, should be version pinned when building container images. I haven't seen too many examples of taking the effort to do this. But it's the only way to re-run image builds and guarantee reproducible outputs. Once you have this in place, you can decide how and when you upgrade which versions.

I'm guessing most container builders are just not interested in doing that much work. It's easier to just "always upgrade", but as noted that comes with a whole different set of problems. It's only really feasible if you manage to first reduce the set of dependencies substantially.

 

> Actually, even version pinning is not enough, unless the package system guarantees that a version of a package is strictly immutable (and AFAIK this is usually not the case). So digest pinning is essentially required.

Debian (as this was talking about it) is actually doing a very good job at that these days, though they're not there all the way. But https://tests.reproducible-builds.org/debian/reproducible.html shows they're doing really well.



Re: deb package sizes

From: Álvaro Hernández


On 10/1/25 10:52, Magnus Hagander wrote:
> On Thu, Jan 9, 2025 at 11:40 PM Álvaro Hernández <aht@ongres.com> wrote:
>
>> On 9/1/25 18:08, Jeremy Schneider wrote:
>>> On Thu, 9 Jan 2025 17:06:57 +0100
>>> Álvaro Hernández <aht@ongres.com> wrote:
>>>
>>>> On 9/1/25 10:07, Christoph Berg wrote:
>>>>> Re: Jeremy Schneider
>>>>>> I'm wondering if there might be any support for providing a
>>>>>> "postgresql-slim" package on PGDG which excludes llvm and python? I
>>>>>> think this might almost cut the total install size in half, and I
>>>>>> think there might be many users who would value having the option.
>>>>> Hi,
>>>>>
>>>>> could you explain why 250 MB is too much? Disk space these days is
>>>>> ultra cheap
>>>>      Hi Christoph.
>>>>      Container images allow (are meant to) contain only the necessary
>>>> files needed to run the process that will be run when the image is
>>>> run. As such, any additional file poses two main problems:
>>>>
>>>> * Disk space is cheap. Bandwidth not so much. Time to start a
>>>>
>>>> * Security analysis. Unneeded files (specially binaries, but not
>>>
>>> Another concern is the impact of image rebuilds as dependencies are
>>> updated. Tianon (a primary maintainer of the docker images) has noted
>>> that they limit how often the debian base containers are rebuilt,
>>> because every rebuild of the base container triggers an avalanche of
>>> downstream rebuilds. CNPG was doing daily rebuilds for a while, and
>>> every time any python dependency was updated you'd get a new image -
>>> boto3 was notorious for very frequent updates. So with a different
>>> image version for every day, a single server running multiple copies of
>>> postgres might easily end up with multiple image versions on the server
>>> as copies are slowly updated.
>>
>> I see this as a symptom of a different, bigger issue: that package versions, and all transitive dependencies, should be version pinned when building container images. I haven't seen too many examples of taking the effort to do this. But it's the only way to re-run image builds and guarantee reproducible outputs. Once you have this in place, you can decide how and when you upgrade which versions.
>
> I'm guessing most container builders are just not interested in doing that much work. It's easier to just "always upgrade", but as noted that comes with a whole different set of problems. It's only really feasible if you manage to first reduce the set of dependencies substantially.

    Yes, it comes with a whole set of problems. The main one, other than upgrades, is that you may end up with inconsistent environments: cases where not all images deployed are the same because some dependencies have different versions. This may also lead to different CVEs present on different servers. This is far from ideal and a problem that is becoming more and more visible.

    While container builders may not be interested in doing all this work, I think that it should be done regardless. And over time, it will be done more and more. When security and supply-chain attacks are a serious concern, precise knowledge of your dependencies is key.


 

>> Actually, even version pinning is not enough, unless the package system guarantees that a version of a package is strictly immutable (and AFAIK this is usually not the case). So digest pinning is essentially required.
>
> Debian (as this was talking about it) is actually doing a very good job at that these days, though they're not there all the way. But https://tests.reproducible-builds.org/debian/reproducible.html shows they're doing really well.

    Debian is doing a great job on the reproducibility of its package builds. However, AFAIK a given package version can be updated with different content, which is why a service like https://snapshot.debian.org exists.


    Álvaro

-- 

Alvaro Hernandez


-----------
OnGres

Re: deb package sizes

From: Christoph Berg

Re: Álvaro Hernández
>     Debian is doing a great job towards reproducibility of the build efforts
> of their packages. However, AFAIK a given package version can be updated
> with a different content --and that's why a service like
> https://snapshot.debian.org exists.

That will never happen: new packages always have new version/revision numbers.
The same is true on apt.postgresql.org.

Christoph



Re: deb package sizes

From: Cédric Villemain


On 10/01/2025 10:52, Magnus Hagander wrote:
> On Thu, Jan 9, 2025 at 11:40 PM Álvaro Hernández <aht@ongres.com> wrote:
>
>> On 9/1/25 18:08, Jeremy Schneider wrote:
>>> On Thu, 9 Jan 2025 17:06:57 +0100
>>> Álvaro Hernández <aht@ongres.com> wrote:
>>>
>>>> On 9/1/25 10:07, Christoph Berg wrote:
>>>>> Re: Jeremy Schneider
>>>>>> I'm wondering if there might be any support for providing a
>>>>>> "postgresql-slim" package on PGDG which excludes llvm and python? I
>>>>>> think this might almost cut the total install size in half, and I
>>>>>> think there might be many users who would value having the option.
>>>>> Hi,
>>>>>
>>>>> could you explain why 250 MB is too much? Disk space these days is
>>>>> ultra cheap
>>>>      Hi Christoph.
>>>>      Container images allow (are meant to) contain only the necessary
>>>> files needed to run the process that will be run when the image is
>>>> run. As such, any additional file poses two main problems:
>>>>
>>>> * Disk space is cheap. Bandwidth not so much. Time to start a
>>>>
>>>> * Security analysis. Unneeded files (specially binaries, but not
>>>
>>> Another concern is the impact of image rebuilds as dependencies are
>>> updated. Tianon (a primary maintainer of the docker images) has noted
>>> that they limit how often the debian base containers are rebuilt,
>>> because every rebuild of the base container triggers an avalanche of
>>> downstream rebuilds. CNPG was doing daily rebuilds for a while, and
>>> every time any python dependency was updated you'd get a new image -
>>> boto3 was notorious for very frequent updates. So with a different
>>> image version for every day, a single server running multiple copies of
>>> postgres might easily end up with multiple image versions on the server
>>> as copies are slowly updated.
>>
>> I see this as a symptom of a different, bigger issue: that package versions, and all transitive dependencies, should be version pinned when building container images. I haven't seen too many examples of taking the effort to do this. But it's the only way to re-run image builds and guarantee reproducible outputs. Once you have this in place, you can decide how and when you upgrade which versions.
>
> I'm guessing most container builders are just not interested in doing that much work. It's easier to just "always upgrade", but as noted that comes with a whole different set of problems. It's only really feasible if you manage to first reduce the set of dependencies substantially.

 

>> Actually, even version pinning is not enough, unless the package system guarantees that a version of a package is strictly immutable (and AFAIK this is usually not the case). So digest pinning is essentially required.
>
> Debian (as this was talking about it) is actually doing a very good job at that these days, though they're not there all the way. But https://tests.reproducible-builds.org/debian/reproducible.html shows they're doing really well.


Also on debian.net: https://amd64.reproduce.debian.net/#postgresql-17 for a "non fancy" web page.


There was a talk on this very topic, at minidebconf recently (by kpcyrd):

 https://toulouse2024.mini.debconf.org/talks/4-reproducible-builds-rebuilding-what-is-distributed-from-ftpdebianorg/
"Since about a month we’ve also been rebuilding trying to exactly match the builds being distributed via ftp.d.o - this talk will describe the setup and the lessons learned so far, and why the results currently are what they are (spoiler: less <30% reproducible) and what we can do to fix that."

And rebuilderd is surely of interest for people willing to work on reproducible builds: https://github.com/kpcyrd/rebuilderd

 

---
Cédric Villemain +33 6 20 30 22 52
https://www.Data-Bene.io
PostgreSQL Support, Expertise, Training, R&D