Обсуждение: headerscheck ccache support

Поиск
Список
Период
Сортировка

headerscheck ccache support

От
Peter Eisentraut
Дата:
Currently, headerscheck and cpluspluscheck are very slow, and they 
defeat use of ccache.  I have fixed that, and now they are much faster. :-)

The problem was (I think) that the test files are created in a 
randomly-named directory (`mktemp -d /tmp/$me.XXXXXX`), and this 
directory is named on the compiler command line, which is part of the 
cache key.

My solution is to create the test files in the build directory.  For 
example, for src/include/storage/ipc.h I generate

headerscheck_src_include_storage_ipc_h.c (or .cpp)

Now ccache works.  (And it's also a bit easier to debug everything with 
this naming.)

The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck 
is from about 1min 20s to only 20s.  In local use, the speedups are similar.

(I noticed that on Cirrus CI, the first uncached run with this patch was 
quite a bit slower.  I don't know if this was because of the additional 
cache population that was happening, or if it was a fluke.  Maybe others 
can try it in their environments.  Maybe it's not a problem in the long 
run.)

Вложения

Re: headerscheck ccache support

От
Álvaro Herrera
Дата:
On 2025-Nov-21, Peter Eisentraut wrote:

> Currently, headerscheck and cpluspluscheck are very slow, and they defeat
> use of ccache.  I have fixed that, and now they are much faster. :-)

Yeah, I had noticed this too.  Thanks for fixing it.

> My solution is to create the test files in the build directory.  For
> example, for src/include/storage/ipc.h I generate
> 
> headerscheck_src_include_storage_ipc_h.c (or .cpp)
> 
> Now ccache works.

Sounds reasonable.  I notice that you're cleaning this file in a `rm`
line in the loop,

> @@ -253,10 +249,11 @@ do
>      if ! $COMPILER $COMPILER_FLAGS -I $builddir -I $srcdir \
>          -I $builddir/src/include -I $srcdir/src/include \
>          -I $builddir/src/interfaces/libpq -I $srcdir/src/interfaces/libpq \
> -        $EXTRAINCLUDES $EXTRAFLAGS -c $tmp/test.$ext -o $tmp/test.o
> +        $EXTRAINCLUDES $EXTRAFLAGS -c $test_file_name.$ext -o $test_file_name.o
>      then
>          exit_status=1
>      fi
> +    rm -f "$test_file_name.$ext" "$test_file_name.o"
>  done

but this means that if the script is interrupted halfway through, one
file or two files might remain in place.  Would it be possible to have
the current file name in a variable, so that the `trap` line can delete
them?

I've been also wondering about testing whether `parallel` is installed,
and use that if so.

>    # Verify headerscheck / cpluspluscheck succeed
>    #
> -  # - Don't use ccache, the files are uncacheable, polluting ccache's
> -  #   cache

So how bad is the effect of the cache pollution that's now going to
occur?  I imagine we don't really care.  (I have no idea how to measure
this, and probably it's a waste of time just to try.)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"E pur si muove" (Galileo Galilei)



Re: headerscheck ccache support

От
Andres Freund
Дата:
Hi,

On 2025-11-21 11:48:10 +0100, Peter Eisentraut wrote:
> Currently, headerscheck and cpluspluscheck are very slow, and they defeat
> use of ccache.  I have fixed that, and now they are much faster. :-)
> 
> The problem was (I think) that the test files are created in a
> randomly-named directory (`mktemp -d /tmp/$me.XXXXXX`), and this directory
> is named on the compiler command line, which is part of the cache key.
> 
> My solution is to create the test files in the build directory.  For
> example, for src/include/storage/ipc.h I generate
> 
> headerscheck_src_include_storage_ipc_h.c (or .cpp)
> 
> Now ccache works.  (And it's also a bit easier to debug everything with this
> naming.)

I applaud this effort.  I think long-term I want this stuff properly
integrated into the build system, but this is a good step on its own.


> The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck is
> from about 1min 20s to only 20s.  In local use, the speedups are similar.
> 
> (I noticed that on Cirrus CI, the first uncached run with this patch was
> quite a bit slower.  I don't know if this was because of the additional
> cache population that was happening, or if it was a fluke.  Maybe others can
> try it in their environments.  Maybe it's not a problem in the long run.)

That doesn't surprise me - one of the major throttling factors on CI is the
number of iops. Filling an empty cache drastically increases iops, due to
having to create all the cache files (there are multiple for each cache
entry).

Hm. I wonder whether we would gain performance if we changed the linux CI
image to use data=writeback instead of the default data=ordered, that can
really help with metadata heavy workloads.

Bilal, any chance you would try modifying the CI image generation to change
the default option for the root filesystem using tune2fs -o
journal_data_writeback (that's easier IME than changing the mount option, as
the root filesystem is mounted very early, in the initramfs, before fstab is
known). Or perhaps just do it interactively in a VM (needs a reboot though)?

Greetings,

Andres Freund



Re: headerscheck ccache support

От
Andres Freund
Дата:
Hi,

On 2025-11-21 13:14:18 +0100, Álvaro Herrera wrote:
> On 2025-Nov-21, Peter Eisentraut wrote:
> >    # Verify headerscheck / cpluspluscheck succeed
> >    #
> > -  # - Don't use ccache, the files are uncacheable, polluting ccache's
> > -  #   cache
> 
> So how bad is the effect of the cache pollution that's now going to
> occur?  I imagine we don't really care.  (I have no idea how to measure
> this, and probably it's a waste of time just to try.)

I don't think there's any cache pollution after this change - the pollution
the comment was referencing was that ccache's cache would be filled with
entries for files that would *never* be used and would therefore be
useless. With this change the added use of ccache will just accelerate the CI
runs, which I wouldn't consider polluting...

Greetings,

Andres Freund



Re: headerscheck ccache support

От
Álvaro Herrera
Дата:
On 2025-Nov-21, Andres Freund wrote:

> On 2025-11-21 13:14:18 +0100, Álvaro Herrera wrote:

> > So how bad is the effect of the cache pollution that's now going to
> > occur?
> 
> I don't think there's any cache pollution after this change - the
> pollution the comment was referencing was that ccache's cache would be
> filled with entries for files that would *never* be used and would
> therefore be useless.

Ah, that makes sense.

> With this change the added use of ccache will just accelerate the CI
> runs, which I wouldn't consider polluting...

Sure.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it."                   (New York Times, about Microsoft PowerPoint)



Re: headerscheck ccache support

От
Thomas Munro
Дата:
On Fri, Nov 21, 2025 at 11:48 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> Currently, headerscheck and cpluspluscheck are very slow, and they
> defeat use of ccache.  I have fixed that, and now they are much faster. :-)
>
> The problem was (I think) that the test files are created in a
> randomly-named directory (`mktemp -d /tmp/$me.XXXXXX`), and this
> directory is named on the compiler command line, which is part of the
> cache key.
>
> My solution is to create the test files in the build directory.  For
> example, for src/include/storage/ipc.h I generate
>
> headerscheck_src_include_storage_ipc_h.c (or .cpp)
>
> Now ccache works.  (And it's also a bit easier to debug everything with
> this naming.)
>
> The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck
> is from about 1min 20s to only 20s.  In local use, the speedups are similar.

+1

I wrote an almost identical patch[1] and then lost it down the back of
the sofa.  I was wondering about parallelising it next...

[1] https://www.postgresql.org/message-id/CA%2BhUKGJjQyZUvcu6udk5OKz5rnaF4a_hm5nb_VtZHYMH%2BvsN0g%40mail.gmail.com



Re: headerscheck ccache support

От
Peter Eisentraut
Дата:
On 22.11.25 09:54, Thomas Munro wrote:
> On Fri, Nov 21, 2025 at 11:48 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>> Currently, headerscheck and cpluspluscheck are very slow, and they
>> defeat use of ccache.  I have fixed that, and now they are much faster. :-)
>>
>> The problem was (I think) that the test files are created in a
>> randomly-named directory (`mktemp -d /tmp/$me.XXXXXX`), and this
>> directory is named on the compiler command line, which is part of the
>> cache key.
>>
>> My solution is to create the test files in the build directory.  For
>> example, for src/include/storage/ipc.h I generate
>>
>> headerscheck_src_include_storage_ipc_h.c (or .cpp)
>>
>> Now ccache works.  (And it's also a bit easier to debug everything with
>> this naming.)
>>
>> The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck
>> is from about 1min 20s to only 20s.  In local use, the speedups are similar.
> 
> +1
> 
> I wrote an almost identical patch[1] and then lost it down the back of
> the sofa.  I was wondering about parallelising it next...
> 
> [1] https://www.postgresql.org/message-id/CA%2BhUKGJjQyZUvcu6udk5OKz5rnaF4a_hm5nb_VtZHYMH%2BvsN0g%40mail.gmail.com

Ah yes, that's about the same idea.  The difference is that yours 
requires specifying TMPDIR on the make invocation.  So it wouldn't 
happen by default for local (non-CI) use.  I think I would like an 
implementation that also worked out of the box locally.




Re: headerscheck ccache support

От
Peter Eisentraut
Дата:
On 21.11.25 13:14, Álvaro Herrera wrote:
>> Now ccache works.
> 
> Sounds reasonable.  I notice that you're cleaning this file in a `rm`
> line in the loop,
> 
>> @@ -253,10 +249,11 @@ do
>>       if ! $COMPILER $COMPILER_FLAGS -I $builddir -I $srcdir \
>>           -I $builddir/src/include -I $srcdir/src/include \
>>           -I $builddir/src/interfaces/libpq -I $srcdir/src/interfaces/libpq \
>> -        $EXTRAINCLUDES $EXTRAFLAGS -c $tmp/test.$ext -o $tmp/test.o
>> +        $EXTRAINCLUDES $EXTRAFLAGS -c $test_file_name.$ext -o $test_file_name.o
>>       then
>>           exit_status=1
>>       fi
>> +    rm -f "$test_file_name.$ext" "$test_file_name.o"
>>   done
> 
> but this means that if the script is interrupted halfway through, one
> file or two files might remain in place.  Would it be possible to have
> the current file name in a variable, so that the `trap` line can delete
> them?

Here is another patch set.  I have made some tweaks to address the issue 
you raise, and I took some code and inspiration from Thomas Munro's 
patch.  The solution I chose is to create a temporary subdirectory in 
the build directory, and create the test files in there.  That way the 
trap can just blow away the directory, as before.

> I've been also wondering about testing whether `parallel` is installed,
> and use that if so.

Another approach I had in mind for some time is to just write out a 
makefile with the test compile commands, and run that with make -j. 
Demo patch attached.  (I'm not seriously proposing this.  For one thing, 
we probably wouldn't want to introduce a dependency on make.  But you 
could probably write an equivalent ninja.build file.)

But this doesn't seem to buy very much.  The overhead of the shell 
script to write out the test files appears to become significant 
compared the the actual compile commands.

Another simple idea is to run headerscheck and cpluspluscheck in 
parallel.  You can already do that manually, and we could do that on CI 
to save about 50% wall-clock time.  Patch attached.

Вложения

Re: headerscheck ccache support

От
Nazir Bilal Yavuz
Дата:
Hi

On Fri, 28 Nov 2025 at 14:39, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 21.11.25 13:14, Álvaro Herrera wrote:
> >> Now ccache works.
> >
> > Sounds reasonable.  I notice that you're cleaning this file in a `rm`
> > line in the loop,
> >
> >> @@ -253,10 +249,11 @@ do
> >>      if ! $COMPILER $COMPILER_FLAGS -I $builddir -I $srcdir \
> >>              -I $builddir/src/include -I $srcdir/src/include \
> >>              -I $builddir/src/interfaces/libpq -I $srcdir/src/interfaces/libpq \
> >> -            $EXTRAINCLUDES $EXTRAFLAGS -c $tmp/test.$ext -o $tmp/test.o
> >> +            $EXTRAINCLUDES $EXTRAFLAGS -c $test_file_name.$ext -o $test_file_name.o
> >>      then
> >>              exit_status=1
> >>      fi
> >> +    rm -f "$test_file_name.$ext" "$test_file_name.o"
> >>   done
> >
> > but this means that if the script is interrupted halfway through, one
> > file or two files might remain in place.  Would it be possible to have
> > the current file name in a variable, so that the `trap` line can delete
> > them?
>
> Here is another patch set.

I could not apply patches cleanly. Am I missing something?

$ git am ~/Downloads/v2-0001-headerscheck-ccache-support.patch
Applying: headerscheck ccache support
error: patch failed: src/tools/pginclude/headerscheck:73
error: src/tools/pginclude/headerscheck: patch does not apply
Patch failed at 0001 headerscheck ccache support

--
Regards,
Nazir Bilal Yavuz
Microsoft



Re: headerscheck ccache support

От
Álvaro Herrera
Дата:
On 2025-Nov-28, Nazir Bilal Yavuz wrote:

> I could not apply patches cleanly. Am I missing something?

Yeah, I couldn't get `git am` or `git apply` to accept the patches
either, not even with -3.  However, `patch -p1` does accept it.  Weird.

I have git 2.47.3 and the patch says 2.52.0.  Maybe somebody submitted
vibe-coded features into recent git, causing it to generate subtly
broken patch files?

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Para tener más hay que desear menos"



Re: headerscheck ccache support

От
Peter Eisentraut
Дата:
On 28.11.25 13:59, Álvaro Herrera wrote:
> On 2025-Nov-28, Nazir Bilal Yavuz wrote:
> 
>> I could not apply patches cleanly. Am I missing something?
> 
> Yeah, I couldn't get `git am` or `git apply` to accept the patches
> either, not even with -3.  However, `patch -p1` does accept it.  Weird.

I had another commit in my local branch that I needed to get the actual 
compilations working, but which was unrelated to this discussion. 
Apparently, this created enough fuzz to break the subsequent patches. 
Here is the patch series again completely, but ignore the 0000 patch for 
now.

Вложения

Re: headerscheck ccache support

От
Álvaro Herrera
Дата:
On 2025-Nov-28, Peter Eisentraut wrote:

> Here is another patch set.  I have made some tweaks to address the issue you
> raise, and I took some code and inspiration from Thomas Munro's patch.  The
> solution I chose is to create a temporary subdirectory in the build
> directory, and create the test files in there.  That way the trap can just
> blow away the directory, as before.

I tried with all patches applied, and it seems to work okay -- the
header compiles are all cached after the first pass, according to
ccache --show-stats.

> Another approach I had in mind for some time is to just write out a makefile
> with the test compile commands, and run that with make -j. Demo patch
> attached.  (I'm not seriously proposing this.  For one thing, we probably
> wouldn't want to introduce a dependency on make.  But you could probably
> write an equivalent ninja.build file.)
> 
> But this doesn't seem to buy very much.  The overhead of the shell script to
> write out the test files appears to become significant compared the the
> actual compile commands.

Really?  I tried editing the make line to have -j8 (your patch doesn't
have a -j switch at all there) and it runs in 4s instead of 12s on my
laptop, which I would call a significant win.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
#error "Operator lives in the wrong universe"
  ("Use of cookies in real-time system development", M. Gleixner, M. Mc Guire)



Re: headerscheck ccache support

От
Álvaro Herrera
Дата:
On 2025-Nov-28, Peter Eisentraut wrote:

> But this doesn't seem to buy very much.  The overhead of the shell script to
> write out the test files appears to become significant compared the the
> actual compile commands.

If you wanted to save some shell execution time, you could move the `tr`
calls to the bottom of the loop to avoid doing it for files that the
`if` block is going to discard.  But is that significant?  I doubt it.

(I didn't quite understand why you use printf instead of echo, given
that both are shell builtins in any case.)

I think parallelism is also going to win in the case where the compiles
are not cached.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: headerscheck ccache support

От
Peter Eisentraut
Дата:
On 28.11.25 14:10, Peter Eisentraut wrote:
> On 28.11.25 13:59, Álvaro Herrera wrote:
>> On 2025-Nov-28, Nazir Bilal Yavuz wrote:
>>
>>> I could not apply patches cleanly. Am I missing something?
>>
>> Yeah, I couldn't get `git am` or `git apply` to accept the patches
>> either, not even with -3.  However, `patch -p1` does accept it.  Weird.
> 
> I had another commit in my local branch that I needed to get the actual 
> compilations working, but which was unrelated to this discussion. 
> Apparently, this created enough fuzz to break the subsequent patches. 
> Here is the patch series again completely, but ignore the 0000 patch for 
> now.

I have committed patches 0000 and 0001, which was the main subject of 
this thread.

What do people think about patch 0002, which runs headerscheck and 
cpluspluscheck in parallel on ci?  It should save several seconds of 
wall-clock time for that task, and I don't see any drawbacks, unless you 
want to retain the specific previous output format for some reason.

I don't plan to pursue the other parallelization work (patch 0003) at 
this time.



Re: headerscheck ccache support

От
Peter Eisentraut
Дата:
On 28.11.25 14:29, Álvaro Herrera wrote:
>> Another approach I had in mind for some time is to just write out a makefile
>> with the test compile commands, and run that with make -j. Demo patch
>> attached.  (I'm not seriously proposing this.  For one thing, we probably
>> wouldn't want to introduce a dependency on make.  But you could probably
>> write an equivalent ninja.build file.)
>>
>> But this doesn't seem to buy very much.  The overhead of the shell script to
>> write out the test files appears to become significant compared the the
>> actual compile commands.
> 
> Really?  I tried editing the make line to have -j8 (your patch doesn't
> have a -j switch at all there)

Note that the "+" I added to the targets in the top-level GNUmakefile.in 
causes the make flags to passed down, so you can run make -jN 
headerscheck etc. without having to edit or hardcode the make command 
inside the script.




Re: headerscheck ccache support

От
Peter Eisentraut
Дата:
On 28.11.25 15:16, Álvaro Herrera wrote:
> On 2025-Nov-28, Peter Eisentraut wrote:
> 
>> But this doesn't seem to buy very much.  The overhead of the shell script to
>> write out the test files appears to become significant compared the the
>> actual compile commands.
> 
> If you wanted to save some shell execution time, you could move the `tr`
> calls to the bottom of the loop to avoid doing it for files that the
> `if` block is going to discard.  But is that significant?  I doubt it.

This actually made a measurable difference, so I included that change in 
the committed patch.

> (I didn't quite understand why you use printf instead of echo, given
> that both are shell builtins in any case.)

printf is nowadays preferable over echo in portable shell scripts.  See 
for example

https://cgit.git.savannah.gnu.org/cgit/autoconf.git/tree/NEWS#n435

and

https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.72/html_node/Limitations-of-Builtins.html



Re: headerscheck ccache support

От
Andres Freund
Дата:
Hi,

On 2025-12-04 11:52:07 +0100, Peter Eisentraut wrote:
> What do people think about patch 0002, which runs headerscheck and
> cpluspluscheck in parallel on ci?  It should save several seconds of
> wall-clock time for that task, and I don't see any drawbacks, unless you
> want to retain the specific previous output format for some reason.

I think the output today is easier to parse, it's more obvious whether the
error is from a cpluspluscheck violation or a headercheck violation. However,
the runtime win seems to more than outweigh that.


> From 0a580cb2e58dcc257978d5cc20528f2e4a315880 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter@eisentraut.org>
> Date: Fri, 28 Nov 2025 12:21:31 +0100
> Subject: [PATCH v2.1 2/3] ci: Run headerscheck and cplusplucheck in parallel
> 
> ---
>  .cirrus.tasks.yml | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
> index 038d043d00e..69224fcfec7 100644
> --- a/.cirrus.tasks.yml
> +++ b/.cirrus.tasks.yml
> @@ -1015,9 +1015,7 @@ task:
>          --quiet \
>          CC="ccache gcc" CXX="ccache g++" CLANG="ccache clang"
>        make -s -j${BUILD_JOBS} clean
> -      time make -s headerscheck EXTRAFLAGS='-fmax-errors=10'
> -    headers_cpluspluscheck_script: |
> -      time make -s cpluspluscheck EXTRAFLAGS='-fmax-errors=10'
> +      time make -s -j${BUILD_JOBS} -k -Otarget headerscheck cpluspluscheck EXTRAFLAGS='-fmax-errors=10'

Doesn't really matter, but I'd probably use ${CHECKFLAGS} instead of -Otarget
directly.

I'd add a comment saying that we run both in the same script to increase
parallelism and that we use -k to get the result of both. But again, this is
just a very minor nitpick, and if you prefer not to, I'm fine.


Greetings,

Andres Freund