Обсуждение: make dist using git archive

Поиск
Список
Период
Сортировка

make dist using git archive

От
Peter Eisentraut
Дата:
One of the goals is to make the creation of the distribution tarball 
more directly traceable to the git repository.  That is why we removed 
the "make distprep" step.

Here I want to take another step in that direction, by changing "make 
dist" to directly use "git archive", rather than the custom shell script 
it currently runs.

The simple summary is that it would run

git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o 
postgresql-17devel.tar.gz

(with appropriate version numbers, of course), and that's the tarball we 
would ship.

There are analogous commands for other compression formats.

The actual command gets subtly more complicated if you need to run this 
in a separate build directory.  In my attached patch, the make version 
doesn't support vpath at the moment, just so that it's easier to 
understand for now.  The meson version is a bit hairy.

I have studied and tested this quite a bit, and I have found that the 
archives produced this way are deterministic and reproducible, meaning 
for a given commit the result file should always be bit-for-bit identical.

The exception is that if you use a git version older than 2.38.0, gzip 
records the platform in the archive, so you'd get a different output on 
Windows vs. macOS vs. "UNIX" (everything else).  In git 2.38.0, this was 
changed so that everything is recorded as "UNIX" now.  This is just 
something to keep in mind.  This issue is specific to the gzip format, 
it does not affect other compression formats.

Meson has its own distribution building command (meson dist), but opted 
against using that.  The main problem is that the way they have 
implemented it, it is not deterministic in the above sense.  (Another 
point is of course that we probably want a "make" version for the time 
being.)

But the target name "dist" in meson is reserved for that reason, so I 
needed to call the custom target "pgdist".

I did take one idea from meson: It runs a check before git archive that 
the checkout is clean.  That way, you avoid mistakes because of 
uncommitted changes.  This works well in my "make" implementation.  In 
the meson implementation, I had to find a workaround, because a 
custom_target cannot have a dependency on a run_target.  As also 
mentioned above, the whole meson implementation is a bit ugly.

Anyway,  with the attached patch you can do

     make dist

or

     meson compile -C build pgdist

and it produces the same set of tarballs as before, except it's done 
differently.

The actual build scripts need some fine-tuning, but the general idea is 
correct, I think.
Вложения

Re: make dist using git archive

От
Junwang Zhao
Дата:
Hi,

On Mon, Jan 22, 2024 at 3:32 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> One of the goals is to make the creation of the distribution tarball
> more directly traceable to the git repository.  That is why we removed
> the "make distprep" step.
>
> Here I want to take another step in that direction, by changing "make
> dist" to directly use "git archive", rather than the custom shell script
> it currently runs.
>
> The simple summary is that it would run
>
> git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
> postgresql-17devel.tar.gz
>
> (with appropriate version numbers, of course), and that's the tarball we
> would ship.
>
> There are analogous commands for other compression formats.
>
> The actual command gets subtly more complicated if you need to run this
> in a separate build directory.  In my attached patch, the make version
> doesn't support vpath at the moment, just so that it's easier to
> understand for now.  The meson version is a bit hairy.
>
> I have studied and tested this quite a bit, and I have found that the
> archives produced this way are deterministic and reproducible, meaning
> for a given commit the result file should always be bit-for-bit identical.
>
> The exception is that if you use a git version older than 2.38.0, gzip
> records the platform in the archive, so you'd get a different output on
> Windows vs. macOS vs. "UNIX" (everything else).  In git 2.38.0, this was
> changed so that everything is recorded as "UNIX" now.  This is just
> something to keep in mind.  This issue is specific to the gzip format,
> it does not affect other compression formats.
>
> Meson has its own distribution building command (meson dist), but opted
> against using that.  The main problem is that the way they have
> implemented it, it is not deterministic in the above sense.  (Another
> point is of course that we probably want a "make" version for the time
> being.)
>
> But the target name "dist" in meson is reserved for that reason, so I
> needed to call the custom target "pgdist".
>
> I did take one idea from meson: It runs a check before git archive that
> the checkout is clean.  That way, you avoid mistakes because of
> uncommitted changes.  This works well in my "make" implementation.  In
> the meson implementation, I had to find a workaround, because a
> custom_target cannot have a dependency on a run_target.  As also
> mentioned above, the whole meson implementation is a bit ugly.
>
> Anyway,  with the attached patch you can do
>
>      make dist
>
> or
>
>      meson compile -C build pgdist

I played this with meson build on macOS, the packages are generated
in source root but not build root, I'm sure if this is by design but I think
polluting *working directory* is not good.

Another thing I'd like to point out is, should we also introduce *git commit*
or maybe *git tag* to package name, something like:

git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
postgresql-17devel-`git rev-parse --short HEAD`.tar.gz
git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
postgresql-`git describe --tags`.tar.gz

>
> and it produces the same set of tarballs as before, except it's done
> differently.
>
> The actual build scripts need some fine-tuning, but the general idea is
> correct, I think.

I think this is a good idea, thanks for working on this.


--
Regards
Junwang Zhao



Re: make dist using git archive

От
Peter Eisentraut
Дата:
On 22.01.24 13:10, Junwang Zhao wrote:
> I played this with meson build on macOS, the packages are generated
> in source root but not build root, I'm sure if this is by design but I think
> polluting *working directory* is not good.

Yes, it's not good, but I couldn't find a way to make it work.

This is part of the complications with meson I referred to.  The 
@BUILD_ROOT@ placeholder in custom_target() is apparently always a 
relative path, but it doesn't know that git -C changes the current 
directory.

> Another thing I'd like to point out is, should we also introduce *git commit*
> or maybe *git tag* to package name, something like:
> 
> git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
> postgresql-17devel-`git rev-parse --short HEAD`.tar.gz
> git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
> postgresql-`git describe --tags`.tar.gz

I'm not sure why we would need it built-in.  It can be done by hand, of 
course.




Re: make dist using git archive

От
"Tristan Partin"
Дата:
On Mon Jan 22, 2024 at 1:31 AM CST, Peter Eisentraut wrote:
> From 4b128faca90238d0a0bb6949a8050c2501d1bd67 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter@eisentraut.org>
> Date: Sat, 20 Jan 2024 21:54:36 +0100
> Subject: [PATCH v0] make dist uses git archive
>
> ---
>  GNUmakefile.in | 34 ++++++++++++----------------------
>  meson.build    | 38 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+), 22 deletions(-)
>
> diff --git a/GNUmakefile.in b/GNUmakefile.in
> index eba569e930e..3e04785ada2 100644
> --- a/GNUmakefile.in
> +++ b/GNUmakefile.in
> @@ -87,29 +87,19 @@ update-unicode: | submake-generated-headers submake-libpgport
>  distdir    = postgresql-$(VERSION)
>  dummy    = =install=
>
> +GIT = git
> +
>  dist: $(distdir).tar.gz $(distdir).tar.bz2
> -    rm -rf $(distdir)
> -
> -$(distdir).tar: distdir
> -    $(TAR) chf $@ $(distdir)
> -
> -.INTERMEDIATE: $(distdir).tar
> -
> -distdir-location:
> -    @echo $(distdir)
> -
> -distdir:
> -    rm -rf $(distdir)* $(dummy)
> -    for x in `cd $(top_srcdir) && find . \( -name CVS -prune \) -o \( -name .git -prune \) -o -print`; do \
> -      file=`expr X$$x : 'X\./\(.*\)'`; \
> -      if test -d "$(top_srcdir)/$$file" ; then \
> -        mkdir "$(distdir)/$$file" && chmod 777 "$(distdir)/$$file";    \
> -      else \
> -        ln "$(top_srcdir)/$$file" "$(distdir)/$$file" >/dev/null 2>&1 \
> -          || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \
> -      fi || exit; \
> -    done
> -    $(MAKE) -C $(distdir) distclean
> +
> +.PHONY: check-dirty-index
> +check-dirty-index:
> +    $(GIT) diff-index --quiet HEAD
> +
> +$(distdir).tar.gz: check-dirty-index
> +    $(GIT) archive --format tar.gz --prefix $(distdir)/ HEAD -o $@
> +
> +$(distdir).tar.bz2: check-dirty-index
> +    $(GIT) -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ HEAD -o $@
>
>  distcheck: dist
>      rm -rf $(dummy)
> diff --git a/meson.build b/meson.build
> index c317144b6bc..f0d870c5192 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -3347,6 +3347,44 @@ run_target('help',
>
>
>
> +###############################################################
> +# Distribution archive
> +###############################################################
> +
> +git = find_program('git', required: false, native: true, disabler: true)
> +bzip2 = find_program('bzip2', required: false, native: true, disabler: true)

This doesn't need to be a disabler. git is fine as-is. See later
comment. Disablers only work like you are expecting when they are used
like how git is used. Once you call a method like .path(), all bets are
off.

> +distdir = meson.project_name() + '-' + meson.project_version()
> +
> +check_dirty_index = run_target('check-dirty-index',
> +                               command: [git, 'diff-index', '--quiet', 'HEAD'])

Seems like you might want to add -C here too?

> +
> +tar_gz = custom_target('tar.gz',
> +  build_always_stale: true,
> +  command: [git, '-C', '@SOURCE_ROOT@', 'archive',
> +            '--format', 'tar.gz',
> +            '--prefix', distdir + '/',
> +            '-o', '@BUILD_ROOT@/@OUTPUT@',
> +            'HEAD', '.'],
> +  install: false,
> +  output: distdir + '.tar.gz',
> +)
> +
> +tar_bz2 = custom_target('tar.bz2',
> +  build_always_stale: true,
> +  command: [git, '-C', '@SOURCE_ROOT@', '-c', 'tar.tar.bz2.command=' + bzip2.path() + ' -c', 'archive',
> +            '--format', 'tar.bz2',
> +            '--prefix', distdir + '/',

-            '-o', '@BUILD_ROOT@/@OUTPUT@',
+            '-o', join_paths(meson.build_root(), '@OUTPUT@'),

This will generate the tarballs in the build directory. Do the same for
the previous target. Tested locally.

> +            'HEAD', '.'],
> +  install: false,
> +  output: distdir + '.tar.bz2',
> +)

The bz2 target should be wrapped in an `if bzip2.found()`. It is
possible for git to be found, but not bzip2. I might also define the bz2
command out of line. Also, you may want to add
these programs to meson_options.txt for overriding, even though the
"meson-ic" way is to use a machine file.

> +
> +alias_target('pgdist', [check_dirty_index, tar_gz, tar_bz2])

Are you intending for the check_dirty_index target to prohibit the other
two targets from running? Currently that is not the case. If it is what
you intend, use a stamp file or something to indicate a relationship.
Alternatively, inline the git diff-index into the other commands. These
might also do better as external scripts. It would reduce duplication
between the autotools and Meson builds.

> +
> +
> +
>  ###############################################################
>  # The End, The End, My Friend
>  ###############################################################

I am not really following why we can't use the builtin Meson dist
command. The only difference from my testing is it doesn't use
a --prefix argument.

--
Tristan Partin
Neon (https://neon.tech)



Re: make dist using git archive

От
Junwang Zhao
Дата:
On Tue, Jan 23, 2024 at 2:36 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 22.01.24 13:10, Junwang Zhao wrote:
> > I played this with meson build on macOS, the packages are generated
> > in source root but not build root, I'm sure if this is by design but I think
> > polluting *working directory* is not good.
>
> Yes, it's not good, but I couldn't find a way to make it work.
>
> This is part of the complications with meson I referred to.  The
> @BUILD_ROOT@ placeholder in custom_target() is apparently always a
> relative path, but it doesn't know that git -C changes the current
> directory.
>
> > Another thing I'd like to point out is, should we also introduce *git commit*
> > or maybe *git tag* to package name, something like:
> >
> > git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
> > postgresql-17devel-`git rev-parse --short HEAD`.tar.gz
> > git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
> > postgresql-`git describe --tags`.tar.gz
>
> I'm not sure why we would need it built-in.  It can be done by hand, of
> course.

If this is only used by the release phase, one can do this by hand.

*commit id/tag* in package name can be used to identify the git source,
which might be useful for cooperation between QA and dev team,
but surely there are better ways for this, so I do not have a strong
opinion here.

>


--
Regards
Junwang Zhao



Re: make dist using git archive

От
Peter Eisentraut
Дата:
On 22.01.24 21:04, Tristan Partin wrote:
> I am not really following why we can't use the builtin Meson dist 
> command. The only difference from my testing is it doesn't use a 
> --prefix argument.

Here are some problems I have identified:

1. meson dist internally runs gzip without the -n option.  That makes 
the tar.gz archive include a timestamp, which in turn makes it not 
reproducible.

2. Because gzip includes a platform indicator in the archive, the 
produced tar.gz archive is not reproducible across platforms.  (I don't 
know if gzip has an option to avoid that.  git archive uses an internal 
gzip implementation that handles this.)

3. Meson does not support tar.bz2 archives.

4. Meson uses git archive internally, but then unpacks and repacks the 
archive, which loses the ability to use git get-tar-commit-id.

5. I have found that the tar archives created by meson and git archive 
include the files in different orders.  I suspect that the Python 
tarfile module introduces some either randomness or platform dependency.

6. meson dist is also slower because of the additional work.

7. meson dist produces .sha256sum files but we have called them .sha256. 
  (This is obviously trivial, but it is something that would need to be 
dealt with somehow nonetheless.)

Most or all of these issues are fixable, either upstream in Meson or by 
adjusting our own requirements.  But for now this route would have some 
significant disadvantages.




Re: make dist using git archive

От
"Tristan Partin"
Дата:
On Tue Jan 23, 2024 at 3:30 AM CST, Peter Eisentraut wrote:
> On 22.01.24 21:04, Tristan Partin wrote:
> > I am not really following why we can't use the builtin Meson dist
> > command. The only difference from my testing is it doesn't use a
> > --prefix argument.
>
> Here are some problems I have identified:
>
> 1. meson dist internally runs gzip without the -n option.  That makes
> the tar.gz archive include a timestamp, which in turn makes it not
> reproducible.
>
> 2. Because gzip includes a platform indicator in the archive, the
> produced tar.gz archive is not reproducible across platforms.  (I don't
> know if gzip has an option to avoid that.  git archive uses an internal
> gzip implementation that handles this.)
>
> 3. Meson does not support tar.bz2 archives.
>
> 4. Meson uses git archive internally, but then unpacks and repacks the
> archive, which loses the ability to use git get-tar-commit-id.
>
> 5. I have found that the tar archives created by meson and git archive
> include the files in different orders.  I suspect that the Python
> tarfile module introduces some either randomness or platform dependency.
>
> 6. meson dist is also slower because of the additional work.
>
> 7. meson dist produces .sha256sum files but we have called them .sha256.
>   (This is obviously trivial, but it is something that would need to be
> dealt with somehow nonetheless.)
>
> Most or all of these issues are fixable, either upstream in Meson or by
> adjusting our own requirements.  But for now this route would have some
> significant disadvantages.

Thanks Peter. I will bring these up with upstream!

--
Tristan Partin
Neon (https://neon.tech)



Re: make dist using git archive

От
"Tristan Partin"
Дата:
On Wed Jan 24, 2024 at 10:18 AM CST, Tristan Partin wrote:
> On Tue Jan 23, 2024 at 3:30 AM CST, Peter Eisentraut wrote:
> > On 22.01.24 21:04, Tristan Partin wrote:
> > > I am not really following why we can't use the builtin Meson dist
> > > command. The only difference from my testing is it doesn't use a
> > > --prefix argument.
> >
> > Here are some problems I have identified:
> >
> > 1. meson dist internally runs gzip without the -n option.  That makes
> > the tar.gz archive include a timestamp, which in turn makes it not
> > reproducible.

It doesn't look like Python provides the facilities to affect this.

> > 2. Because gzip includes a platform indicator in the archive, the
> > produced tar.gz archive is not reproducible across platforms.  (I don't
> > know if gzip has an option to avoid that.  git archive uses an internal
> > gzip implementation that handles this.)

Same reason as above.

> > 3. Meson does not support tar.bz2 archives.

Submitted https://github.com/mesonbuild/meson/pull/12770.

> > 4. Meson uses git archive internally, but then unpacks and repacks the
> > archive, which loses the ability to use git get-tar-commit-id.

Because Meson allows projects to distribute arbitrary files via
meson.add_dist_script(), and can include subprojects via `meson dist
--include-subprojects`, this doesn't seem like an easily solvable
problem.

> > 5. I have found that the tar archives created by meson and git archive
> > include the files in different orders.  I suspect that the Python
> > tarfile module introduces some either randomness or platform dependency.

Seems likely.

> > 6. meson dist is also slower because of the additional work.

Not easily solvable due to 4.

> > 7. meson dist produces .sha256sum files but we have called them .sha256.
> >   (This is obviously trivial, but it is something that would need to be
> > dealt with somehow nonetheless.)
> >
> > Most or all of these issues are fixable, either upstream in Meson or by
> > adjusting our own requirements.  But for now this route would have some
> > significant disadvantages.
>
> Thanks Peter. I will bring these up with upstream!

I think the solution to point 4 is to not unpack/repack if there are no
dist scripts and/or subprojects to distribute. I can take a look at
this later. I think this would also solve points 1, 2, 5, and 6 because
at that point meson is just calling git-archive.

--
Tristan Partin
Neon (https://neon.tech)



Re: make dist using git archive

От
Peter Eisentraut
Дата:
On 24.01.24 18:57, Tristan Partin wrote:
>> > 4. Meson uses git archive internally, but then unpacks and repacks 
>> the > archive, which loses the ability to use git get-tar-commit-id.
> 
> Because Meson allows projects to distribute arbitrary files via 
> meson.add_dist_script(), and can include subprojects via `meson dist 
> --include-subprojects`, this doesn't seem like an easily solvable problem.

git archive has the --add-file option, which can probably do the same 
thing.  Subprojects are another thing, but obviously are more rarely used.

> I think the solution to point 4 is to not unpack/repack if there are no 
> dist scripts and/or subprojects to distribute. I can take a look at this 
> later. I think this would also solve points 1, 2, 5, and 6 because at 
> that point meson is just calling git-archive.

I think that would be a useful direction.




Re: make dist using git archive

От
Peter Eisentraut
Дата:
On 22.01.24 21:04, Tristan Partin wrote:
>> +git = find_program('git', required: false, native: true, disabler: true)
>> +bzip2 = find_program('bzip2', required: false, native: true, 
>> disabler: true)
> 
> This doesn't need to be a disabler. git is fine as-is. See later 
> comment. Disablers only work like you are expecting when they are used 
> like how git is used. Once you call a method like .path(), all bets are 
> off.

ok, fixed

>> +distdir = meson.project_name() + '-' + meson.project_version()
>> +
>> +check_dirty_index = run_target('check-dirty-index',
>> +                               command: [git, 'diff-index', 
>> '--quiet', 'HEAD'])
> 
> Seems like you might want to add -C here too?

done

>> +tar_bz2 = custom_target('tar.bz2',
>> +  build_always_stale: true,
>> +  command: [git, '-C', '@SOURCE_ROOT@', '-c', 'tar.tar.bz2.command=' 
>> + bzip2.path() + ' -c', 'archive',
>> +            '--format', 'tar.bz2',
>> +            '--prefix', distdir + '/',
> 
> -            '-o', '@BUILD_ROOT@/@OUTPUT@',
> +            '-o', join_paths(meson.build_root(), '@OUTPUT@'),
> 
> This will generate the tarballs in the build directory. Do the same for 
> the previous target. Tested locally.

Fixed, thanks.  I had struggled with this one.

>> +            'HEAD', '.'],
>> +  install: false,
>> +  output: distdir + '.tar.bz2',
>> +)
> 
> The bz2 target should be wrapped in an `if bzip2.found()`.

Well, I think we want the dist step to fail if bzip2 is not there.  At 
least that is the current expectation.

>> +alias_target('pgdist', [check_dirty_index, tar_gz, tar_bz2])
> 
> Are you intending for the check_dirty_index target to prohibit the other 
> two targets from running? Currently that is not the case.

Yes, that was the hope, and that's how the make dist variant works.  But 
I couldn't figure this out with meson.  Also, the above actually also 
doesn't work with older meson versions, so I had to comment this out to 
get CI to work.

> If it is what 
> you intend, use a stamp file or something to indicate a relationship. 
> Alternatively, inline the git diff-index into the other commands. These 
> might also do better as external scripts. It would reduce duplication 
> between the autotools and Meson builds.

Yeah, maybe that's a direction.

The updated patch also supports vpath builds with make now.

I have also added a CI patch, for amusement.  Maybe we'll want to keep 
it, though.
Вложения

Re: make dist using git archive

От
"Tristan Partin"
Дата:
On Thu Jan 25, 2024 at 10:04 AM CST, Peter Eisentraut wrote:
> On 22.01.24 21:04, Tristan Partin wrote:
> >> +            'HEAD', '.'],
> >> +  install: false,
> >> +  output: distdir + '.tar.bz2',
> >> +)
> >
> > The bz2 target should be wrapped in an `if bzip2.found()`.

The way that this currently works is that you will fail at configure
time if bz2 doesn't exist on the system. Meson will try to resolve
a .path() method on a NotFoundProgram. You might want to define the bz2
target to just call `exit 1` in this case.

if bzip2.found()
  # do your current target
else
  bz2 = run_target('tar.bz2', command: ['exit', 1])
endif

This should cause pgdist to appropriately fail at run time when
generating the bz2 tarball.

> Well, I think we want the dist step to fail if bzip2 is not there.  At
> least that is the current expectation.
>
> >> +alias_target('pgdist', [check_dirty_index, tar_gz, tar_bz2])
> >
> > Are you intending for the check_dirty_index target to prohibit the other
> > two targets from running? Currently that is not the case.
>
> Yes, that was the hope, and that's how the make dist variant works.  But
> I couldn't figure this out with meson.  Also, the above actually also
> doesn't work with older meson versions, so I had to comment this out to
> get CI to work.
>
> > If it is what
> > you intend, use a stamp file or something to indicate a relationship.
> > Alternatively, inline the git diff-index into the other commands. These
> > might also do better as external scripts. It would reduce duplication
> > between the autotools and Meson builds.
>
> Yeah, maybe that's a direction.

For what it's worth, I run Meson 1.3, and the behavior of generating the
tarballs even though it is a dirty tree still occurred. In the new patch
you seem to say it was fixed in 0.60.

--
Tristan Partin
Neon (https://neon.tech)



Re: make dist using git archive

От
Peter Eisentraut
Дата:
On 25.01.24 17:25, Tristan Partin wrote:
> The way that this currently works is that you will fail at configure 
> time if bz2 doesn't exist on the system. Meson will try to resolve a 
> .path() method on a NotFoundProgram. You might want to define the bz2 
> target to just call `exit 1` in this case.
> 
> if bzip2.found()
>   # do your current target
> else
>   bz2 = run_target('tar.bz2', command: ['exit', 1])
> endif
> 
> This should cause pgdist to appropriately fail at run time when 
> generating the bz2 tarball.

Ok, done that way.

> For what it's worth, I run Meson 1.3, and the behavior of generating the 
> tarballs even though it is a dirty tree still occurred. In the new patch 
> you seem to say it was fixed in 0.60.

The problem I'm referring to is that before 0.60, alias_target cannot 
depend on run_target (only "build target").  This is AFAICT not 
documented and might not have been an intentional change, but you can 
trace it in the meson source code, and it shows in the PostgreSQL CI. 
That's also why for the above bzip2 issue I have to use custom_target in 
place of your run_target.

Вложения

Re: make dist using git archive

От
"Tristan Partin"
Дата:
On Fri Jan 26, 2024 at 12:28 AM CST, Peter Eisentraut wrote:
> On 25.01.24 17:25, Tristan Partin wrote:
> > For what it's worth, I run Meson 1.3, and the behavior of generating the
> > tarballs even though it is a dirty tree still occurred. In the new patch
> > you seem to say it was fixed in 0.60.
>
> The problem I'm referring to is that before 0.60, alias_target cannot
> depend on run_target (only "build target").  This is AFAICT not
> documented and might not have been an intentional change, but you can
> trace it in the meson source code, and it shows in the PostgreSQL CI.
> That's also why for the above bzip2 issue I have to use custom_target in
> place of your run_target.

https://github.com/mesonbuild/meson/pull/12783

Thanks for finding these issues.

--
Tristan Partin
Neon (https://neon.tech)



Re: make dist using git archive

От
Eli Schwartz
Дата:
Hello, meson developer here.


On 1/23/24 4:30 AM, Peter Eisentraut wrote:
> On 22.01.24 21:04, Tristan Partin wrote:
>> I am not really following why we can't use the builtin Meson dist
>> command. The only difference from my testing is it doesn't use a
>> --prefix argument.
>
> Here are some problems I have identified:
>
> 1. meson dist internally runs gzip without the -n option.  That makes
> the tar.gz archive include a timestamp, which in turn makes it not
> reproducible.


Well, it uses python tarfile which uses python gzip support under the
hood, but yes, that is true, python tarfile doesn't expose this tunable.


> 2. Because gzip includes a platform indicator in the archive, the
> produced tar.gz archive is not reproducible across platforms.  (I don't
> know if gzip has an option to avoid that.  git archive uses an internal
> gzip implementation that handles this.)


This appears to be https://github.com/python/cpython/issues/112346


> 3. Meson does not support tar.bz2 archives.


Simple enough to add, but I'm a bit surprised as usually people seem to
want either gzip for portability or xz for efficient compression.


> 4. Meson uses git archive internally, but then unpacks and repacks the
> archive, which loses the ability to use git get-tar-commit-id.


What do you use this for? IMO a more robust way to track the commit used
is to use gitattributes export-subst to write a `.git_archival.txt` file
containing the commit sha1 and other info -- this can be read even after
the file is extracted, which means it can also be used to bake the ID
into the built binaries e.g. as part of --version output.


> 5. I have found that the tar archives created by meson and git archive
> include the files in different orders.  I suspect that the Python
> tarfile module introduces some either randomness or platform dependency.


Different orders is meaningless, the question is whether the order is
internally consistent. Python uses sorted() to guarantee a stable order,
which may be a different algorithm than the one git-archive uses to
guarantee a stable order. But the order should be stable and that is
what matters.


> 6. meson dist is also slower because of the additional work.


I'm amenable to skipping the extraction/recombination of subprojects and
running of dist scripts in the event that neither exist, as Tristan
offered to do, but...


> 7. meson dist produces .sha256sum files but we have called them .sha256.
>  (This is obviously trivial, but it is something that would need to be
> dealt with somehow nonetheless.)
>
> Most or all of these issues are fixable, either upstream in Meson or by
> adjusting our own requirements.  But for now this route would have some
> significant disadvantages.


Overall I feel like much of this is about requiring dist tarballs to be
byte-identical to other dist tarballs, although reproducible builds is
mainly about artifacts, not sources, and for sources it doesn't
generally matter unless the sources are ephemeral and generated
on-demand (in which case it is indeed very important to produce the same
tarball each time). A tarball is usually generated once, signed, and
uploaded to release hosting. Meson already guarantees the contents are
strictly based on the built tag.


--
Eli Schwartz

Вложения

Re: make dist using git archive

От
Peter Eisentraut
Дата:
On 26.01.24 22:18, Eli Schwartz wrote:
> Hello, meson developer here.

Hello, and thanks for popping in!

>> 3. Meson does not support tar.bz2 archives.
> 
> Simple enough to add, but I'm a bit surprised as usually people seem to
> want either gzip for portability or xz for efficient compression.

We may very well end up updating our requirements here before too long, 
so I wouldn't bother with this on the meson side.  Last time we 
discussed this, there were still platforms under support that didn't 
have xz easily available.

>> 4. Meson uses git archive internally, but then unpacks and repacks the
>> archive, which loses the ability to use git get-tar-commit-id.
> 
> What do you use this for? IMO a more robust way to track the commit used
> is to use gitattributes export-subst to write a `.git_archival.txt` file
> containing the commit sha1 and other info -- this can be read even after
> the file is extracted, which means it can also be used to bake the ID
> into the built binaries e.g. as part of --version output.

It's a marginal use case, for sure.  But it is something that git 
provides tooling for that is universally available.  Any alternative 
would be an ad-hoc solution that is specific to our project and would be 
different for the next project.

>> 5. I have found that the tar archives created by meson and git archive
>> include the files in different orders.  I suspect that the Python
>> tarfile module introduces some either randomness or platform dependency.
> 
> Different orders is meaningless, the question is whether the order is
> internally consistent. Python uses sorted() to guarantee a stable order,
> which may be a different algorithm than the one git-archive uses to
> guarantee a stable order. But the order should be stable and that is
> what matters.

(FWIW, I couldn't reproduce this anymore, so maybe it's not actually an 
issue.)

> Overall I feel like much of this is about requiring dist tarballs to be
> byte-identical to other dist tarballs, although reproducible builds is
> mainly about artifacts, not sources, and for sources it doesn't
> generally matter unless the sources are ephemeral and generated
> on-demand (in which case it is indeed very important to produce the same
> tarball each time).

The source tarball is, in a way, also an artifact.

I think it's useful that others can easily independently verify that the 
produced tarball matches what they have locally.  It's not an absolute 
requirement, but given that it is possible, it seems useful to take 
advantage of it.

In a way, this also avoids the need for signing the tarball, which we 
don't do.  So maybe that contributes to a different perspective.




Re: make dist using git archive

От
Eli Schwartz
Дата:
On 1/31/24 3:03 AM, Peter Eisentraut wrote:
>> What do you use this for? IMO a more robust way to track the commit used
>> is to use gitattributes export-subst to write a `.git_archival.txt` file
>> containing the commit sha1 and other info -- this can be read even after
>> the file is extracted, which means it can also be used to bake the ID
>> into the built binaries e.g. as part of --version output.
>
> It's a marginal use case, for sure.  But it is something that git
> provides tooling for that is universally available.  Any alternative
> would be an ad-hoc solution that is specific to our project and would be
> different for the next project.


mercurial has the "archivemeta" config setting that exports similar
information, but forces the filename ".hg_archival.txt".

The setuptools-scm project follows this pattern by requiring the git
file to be called ".git_archival.txt" with a set pattern mimicking the
hg one:

https://setuptools-scm.readthedocs.io/en/latest/usage/#git-archives


So, I guess you could use this and then it would not be specific to your
project. :)


>> Overall I feel like much of this is about requiring dist tarballs to be
>> byte-identical to other dist tarballs, although reproducible builds is
>> mainly about artifacts, not sources, and for sources it doesn't
>> generally matter unless the sources are ephemeral and generated
>> on-demand (in which case it is indeed very important to produce the same
>> tarball each time).
>
> The source tarball is, in a way, also an artifact.
>
> I think it's useful that others can easily independently verify that the
> produced tarball matches what they have locally.  It's not an absolute
> requirement, but given that it is possible, it seems useful to take
> advantage of it.
>
> In a way, this also avoids the need for signing the tarball, which we
> don't do.  So maybe that contributes to a different perspective.


Since you mention signing and not as a simple "aside"...

That's a fascinating perspective. I wonder how people independently
verify that what they have locally (I assume from git clones) matches
what the postgres committers have authorized.

I'm a bit skeptical that you can avoid the need to perform code-signing
at some stage, somewhere, somehow, by suggesting that people can simply
git clone, run some commands and compare the tarball. The point of
signing is to verify that no one has acquired an untraceable API token
they should not have and gotten write access to the authoritative server
then uploaded malicious code under various forged identities, possibly
overwriting previous versions, either in git or out of git.

Ideally git commits should be signed, but that requires large numbers of
people to have security-minded git commit habits. From a quick check of
the postgres commit logs, only one person seems to be regularly signing
commits, which does provide a certain measure of protection -- an
attacker cannot attack via `git push --force` across that boundary, and
those commits serve as verifiable states that multiple people have seen.

The tags aren't signed either, which is a big issue for verifiably
identifying the release artifacts published by the release manager. Even
if not every commit is signed, having signed tags provides a known
coordination point of code that has been broadly tested and code-signed
for mass use.

...

In summary, my opinion is that using git-get-tar-commit-id provides zero
security guarantees, and if that's not something you are worried about
then that's one thing, but if you were expecting it to *replace* signing
the tarball, then that's.... very much another thing entirely, and not
one I can agree at all with.



--
Eli Schwartz

Вложения

Re: make dist using git archive

От
Robert Haas
Дата:
On Wed, Jan 31, 2024 at 10:50 AM Eli Schwartz <eschwartz93@gmail.com> wrote:
> Ideally git commits should be signed, but that requires large numbers of
> people to have security-minded git commit habits. From a quick check of
> the postgres commit logs, only one person seems to be regularly signing
> commits, which does provide a certain measure of protection -- an
> attacker cannot attack via `git push --force` across that boundary, and
> those commits serve as verifiable states that multiple people have seen.
>
> The tags aren't signed either, which is a big issue for verifiably
> identifying the release artifacts published by the release manager. Even
> if not every commit is signed, having signed tags provides a known
> coordination point of code that has been broadly tested and code-signed
> for mass use.
>
> In summary, my opinion is that using git-get-tar-commit-id provides zero
> security guarantees, and if that's not something you are worried about
> then that's one thing, but if you were expecting it to *replace* signing
> the tarball, then that's.... very much another thing entirely, and not
> one I can agree at all with.

I read this part with interest. I think there's definitely something
to be said for strengthening some of our practices in this area. At
the same time, I think it's reasonable for Peter to want to pursue the
limited goal he stated in the original post, namely reproducible
tarball generation, without getting tangled up in possible policy
changes that might be controversial and might require a bunch of
planning and coordination. "GPG signatures are good" can be true
without "reproducible tarball generation is good" being false; and if
"git archive" allows for that and "meson dist" doesn't, then we're
unlikely to adopt "meson dist".

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: make dist using git archive

От
Peter Eisentraut
Дата:
Small update: I noticed that on Windows (at least the one that is 
running the CI job), I need to use git -c core.autocrlf=false, otherwise 
git archive does line-ending conversion for the files it puts into the 
archive.  With this fix, all the archives produced by all the CI jobs 
across the different platforms match, except the .tar.gz archive from 
the Linux job, which I suspect suffers from an old git version.  We 
should get the Linux images updated to a newer Debian version soon 
anyway, so I think that issue will go away.

Вложения

Re: make dist using git archive

От
"Tristan Partin"
Дата:
On Sun Feb 11, 2024 at 5:09 PM CST, Peter Eisentraut wrote:
> Small update: I noticed that on Windows (at least the one that is
> running the CI job), I need to use git -c core.autocrlf=false, otherwise
> git archive does line-ending conversion for the files it puts into the
> archive.  With this fix, all the archives produced by all the CI jobs
> across the different platforms match, except the .tar.gz archive from
> the Linux job, which I suspect suffers from an old git version.  We
> should get the Linux images updated to a newer Debian version soon
> anyway, so I think that issue will go away.

I think with this change, it is unlikely I will be able to upstream
anything to Meson that would benefit Postgres here since setting this
option seems project dependent.

--
Tristan Partin
Neon (https://neon.tech)



Re: make dist using git archive

От
Peter Eisentraut
Дата:
On 12.02.24 18:26, Tristan Partin wrote:
> On Sun Feb 11, 2024 at 5:09 PM CST, Peter Eisentraut wrote:
>> Small update: I noticed that on Windows (at least the one that is 
>> running the CI job), I need to use git -c core.autocrlf=false, 
>> otherwise git archive does line-ending conversion for the files it 
>> puts into the archive.  With this fix, all the archives produced by 
>> all the CI jobs across the different platforms match, except the 
>> .tar.gz archive from the Linux job, which I suspect suffers from an 
>> old git version.  We should get the Linux images updated to a newer 
>> Debian version soon anyway, so I think that issue will go away.
> 
> I think with this change, it is unlikely I will be able to upstream 
> anything to Meson that would benefit Postgres here since setting this 
> option seems project dependent.

Meson is vulnerable to the same problem: If the person who makes the 
release had some crlf-related git setting activated in their 
environment, then that would affect the tarball.  And such a tarball 
would be genuinely broken for non-Windows users, because at least some 
parts of Unix systems can't process such CRLF files correctly.

(This is easy to test: Run meson dist with core.autocrlf=true on the 
postgresql tree on a non-Windows system.  It will fail during dist check.)




Re: make dist using git archive

От
Peter Eisentraut
Дата:
Here is an updated version of this patch set.

I have removed the "dirty check" stuff.  It didn't really work well/was 
buggy under meson, and it failed mysteriously on the Linux CI tasks.  So 
let's just drop that functionality for now.

I have also added a more complete commit message and some more code 
comments.

I have extracted the freebsd CI script fix into a separate patch (0002). 
  I think this is useful even if we don't take the full CI patch (0003).

About the 0003 patch: It seems useful in principle to test these things 
continuously.  The dist script runs about 10 seconds in each task, and 
takes a bit of disk space for the artifacts.  I'm not sure to what 
degree this might bother someone.

Вложения

Re: make dist using git archive

От
"Tristan Partin"
Дата:
On Thu Mar 21, 2024 at 3:44 AM CDT, Peter Eisentraut wrote:
> Here is an updated version of this patch set.

You should add 'disabler: true' to the git find_program in Meson. If Git
doesn't exist on the system with the way your patch is currently
written, the targets would be defined, even though they would never
succeed.

You may also want to make sure that we are actually in a Git repository.
I don't think git-archive works outside one.

Re the autoclrf, is this something we could throw in a .gitattributes
files?

> I have removed the "dirty check" stuff.  It didn't really work well/was
> buggy under meson, and it failed mysteriously on the Linux CI tasks.  So
> let's just drop that functionality for now.
>
> I have also added a more complete commit message and some more code
> comments.

> Meson has its own distribution building command (meson dist), but we
> are not using that at this point.  The main problem is that the way
> they have implemented it, it is not deterministic in the above sense.
> Also, we want a "make" version for the time being.  But the target
> name "dist" in meson is reserved for that reason, so we call the
> custom target "pgdist" (so call something like "meson compile -C build
> pgdist").

I would suggest poisoning `meson dist` in the following way:

if not meson.is_subproject()
    # Maybe edit the comment...Maybe tell perl to print this message
    # instead and then exit non-zero?
    #
    # Meson has its own distribution building command (meson dist), but we
    # are not using that at this point.  The main problem is that the way
    # they have implemented it, it is not deterministic in the above sense.
    # Also, we want a "make" version for the time being.  But the target
    # name "dist" in meson is reserved for that reason, so we call the
    # custom target "pgdist" (so call something like "meson compile -C build
    # pgdist").
    #
    # We don't poison the dist if we are a subproject because it is
    # possible that the parent project may want to create a dist using
    # the builtin Meson method.
    meson.add_dist_script(perl, '-e', 'exit 1')
endif

> I have extracted the freebsd CI script fix into a separate patch (0002).
>   I think this is useful even if we don't take the full CI patch (0003).

0002 looks pretty reasonable to me.

> About the 0003 patch: It seems useful in principle to test these things
> continuously.  The dist script runs about 10 seconds in each task, and
> takes a bit of disk space for the artifacts.  I'm not sure to what
> degree this might bother someone.

0003 works for me :).

--
Tristan Partin
Neon (https://neon.tech)



Re: make dist using git archive

От
Peter Eisentraut
Дата:
On 22.03.24 18:29, Tristan Partin wrote:
> On Thu Mar 21, 2024 at 3:44 AM CDT, Peter Eisentraut wrote:
>> Here is an updated version of this patch set.
> 
> You should add 'disabler: true' to the git find_program in Meson. If Git 
> doesn't exist on the system with the way your patch is currently 
> written, the targets would be defined, even though they would never 
> succeed.

Ok, added.  (I had it in there in an earlier version, but I think I 
misread one of your earlier messages and removed it.)

> You may also want to make sure that we are actually in a Git repository. 
> I don't think git-archive works outside one.

Then git archive will print an error.  That seems ok.

> Re the autoclrf, is this something we could throw in a .gitattributes 
> files?

We don't want to apply it to all git commands, just this one in this 
context.

> I would suggest poisoning `meson dist` in the following way:
> 
> if not meson.is_subproject()
[...]
>      meson.add_dist_script(perl, '-e', 'exit 1')
> endif

Good idea, added that.

>> I have extracted the freebsd CI script fix into a separate patch 
>> (0002).   I think this is useful even if we don't take the full CI 
>> patch (0003).
> 
> 0002 looks pretty reasonable to me.

Committed that one in the meantime.

Вложения

Re: make dist using git archive

От
"Tristan Partin"
Дата:
3 comments left that are inconsequential. Feel free to ignore.

> +# Meson has its own distribution building command (meson dist), but we
> +# are not using that at this point.  The main problem is that the way
> +# they have implemented it, it is not deterministic.  Also, we want it
> +# to be equivalent to the "make" version for the time being.  But the
> +# target name "dist" in meson is reserved for that reason, so we call
> +# the custom target "pgdist".

The second sentence is a run-on.

> +if bzip2.found()
> +  tar_bz2 = custom_target('tar.bz2',
> +    build_always_stale: true,
> +    command: [git, '-C', '@SOURCE_ROOT@',
> +              '-c', 'core.autocrlf=false',
> +              '-c', 'tar.tar.bz2.command="' + bzip2.path() + '" -c',
> +              'archive',
> +              '--format', 'tar.bz2',
> +              '--prefix', distdir + '/',
> +              '-o', join_paths(meson.build_root(), '@OUTPUT@'),
> +              'HEAD', '.'],
> +    install: false,
> +    output: distdir + '.tar.bz2',
> +  )

You might find Meson's string formatting syntax creates a more readable
command string:

'tar.tar.bz2.command=@0@ -c'.format(bzip2.path())

And then 'install: false' is the default if you feel like leaving it
out.

Otherwise, let's get this in!

--
Tristan Partin
Neon (https://neon.tech)



Re: make dist using git archive

От
Peter Eisentraut
Дата:
On 24.03.24 16:42, Tristan Partin wrote:
> You might find Meson's string formatting syntax creates a more readable 
> command string:
> 
> 'tar.tar.bz2.command=@0@ -c'.format(bzip2.path())
> 
> And then 'install: false' is the default if you feel like leaving it out.
> 
> Otherwise, let's get this in!

Done and committed.




Re: make dist using git archive

От
Andres Freund
Дата:
Hi,

On 2024-03-25 06:44:33 +0100, Peter Eisentraut wrote:
> Done and committed.

This triggered a new warning for me:

../../../../../home/andres/src/postgresql/meson.build:3422: WARNING: Project targets '>=0.54' but uses feature
introducedin '0.55.0': Passing executable/found program object to script parameter of add_dist_script.
 

Greetings,

Andres



Re: make dist using git archive

От
Peter Eisentraut
Дата:
On 26.03.24 01:23, Andres Freund wrote:
> On 2024-03-25 06:44:33 +0100, Peter Eisentraut wrote:
>> Done and committed.
> 
> This triggered a new warning for me:
> 
> ../../../../../home/andres/src/postgresql/meson.build:3422: WARNING: Project targets '>=0.54' but uses feature
introducedin '0.55.0': Passing executable/found program object to script parameter of add_dist_script.
 

Hmm, I don't see that.  Is there another version dependency that 
controls when you see version dependency warnings? ;-)

We could trivially remove this particular line, or perhaps put a

if meson.version().version_compare('>=0.55')

around it.  (But would that still warn?)




Re: make dist using git archive

От
Andres Freund
Дата:
Hi,

On 2024-03-26 08:36:58 +0100, Peter Eisentraut wrote:
> On 26.03.24 01:23, Andres Freund wrote:
> > On 2024-03-25 06:44:33 +0100, Peter Eisentraut wrote:
> > > Done and committed.
> > 
> > This triggered a new warning for me:
> > 
> > ../../../../../home/andres/src/postgresql/meson.build:3422: WARNING: Project targets '>=0.54' but uses feature
introducedin '0.55.0': Passing executable/found program object to script parameter of add_dist_script.
 
> 
> Hmm, I don't see that.  Is there another version dependency that controls
> when you see version dependency warnings? ;-)

Sometimes an incompatibility is later noticed and a warning is introduced at
that point.

> We could trivially remove this particular line, or perhaps put a
> 
> if meson.version().version_compare('>=0.55')
> 
> around it.  (But would that still warn?)

It shouldn't, no. As long as the code is actually executed within the check,
it avoids the warning. However if you just set a variable inside the version
gated block and then later use the variable outside that, it will
warn. Probably hard to avoid...

Greetings,

Andres Freund



Re: make dist using git archive

От
"Tristan Partin"
Дата:
On Wed Jan 24, 2024 at 11:57 AM CST, Tristan Partin wrote:
> On Wed Jan 24, 2024 at 10:18 AM CST, Tristan Partin wrote:
> > On Tue Jan 23, 2024 at 3:30 AM CST, Peter Eisentraut wrote:
> > > On 22.01.24 21:04, Tristan Partin wrote:
> > > 3. Meson does not support tar.bz2 archives.
>
> Submitted https://github.com/mesonbuild/meson/pull/12770.

This has now been merged. It will be in 1.5, so we will probably see it
in RHEL in a decade :P.

> > > 4. Meson uses git archive internally, but then unpacks and repacks the
> > > archive, which loses the ability to use git get-tar-commit-id.
>
> Because Meson allows projects to distribute arbitrary files via
> meson.add_dist_script(), and can include subprojects via `meson dist
> --include-subprojects`, this doesn't seem like an easily solvable
> problem.
>
> > Thanks Peter. I will bring these up with upstream!
>
> I think the solution to point 4 is to not unpack/repack if there are no
> dist scripts and/or subprojects to distribute. I can take a look at
> this later. I think this would also solve points 1, 2, 5, and 6 because
> at that point meson is just calling git-archive.

I think implementing a solution to point 4 is a little bit more pressing
given that reproducible tarballs are more important after the xz
debaucle. I will try to give it some effort soon.

--
Tristan Partin
Neon (https://neon.tech)