Обсуждение: Question for coverage report

Поиск
Список
Период
Сортировка

Question for coverage report

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear hackers,

While investigating the code coverage I found the questionable report. Do you have
any theories around here?

Attached file is the code coverage report in pgoutput.c, generated by gcov. In
pgoutput_change(), I found that the branch seems not to be chosen but counted.

```
   183433: 1492:        TupleTableSlot *new_slot = NULL;
        -: 1493:
   183433: 1494:        if (!is_publishable_relation(relation))
call    0 returned 100%
branch  1 taken 100% (fallthrough)
branch  2 taken 0%
     1171: 1495:                return;
        -: 1496:
        -: 1497:        /*
        -: 1498:         * Remember the xid for the change in streaming mode. We need to send xid
        -: 1499:         * with each change in the streaming mode so that subscriber can make
        -: 1500:         * their association and on aborts, it can discard the corresponding
        -: 1501:         * changes.
        -: 1502:         */
   183433: 1503:        if (data->in_streaming)
```

IIUC, the line 1494 and 1503 were executed 183433 times. However, the 1495 was also
executed 1171 times. It was strange because the caller won't reach after the return,
count at 1503 should be 183433 - 1171.
Also, it reported that one of the branches was always chosen at 1494, so not sure
why both paths were counted.

I checked [1] and found the same issue; line 1494 and 1503 have the same count
but 1495 also has. Do you know the actual reason?

Locally used configure options:
./configure --enable-cassert --enable-debug --enable-tap-tests CFLAGS="-ggdb -Og -g0 -fno-omit-frame-pointer"
--enable-coverage--enable-injection-points 

[1]: https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Вложения

RE: Question for coverage report

От
"Hayato Kuroda (Fujitsu)"
Дата:
> Attached file is the code coverage report in pgoutput.c, generated by gcov. In
> pgoutput_change(), I found that the branch seems not to be chosen but counted.

Sorry, I attached wrong file. Here is a report generated by gcov.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



Вложения

Re: Question for coverage report

От
Steven Niu
Дата:
Hi,  Hayato, 

I noticed another weird thing in your attached log.  The line 1506 and 1509 should be executed the same number of times.  But line 1506 was executed one more time than line 1509. Maybe a bug of gcov? 

   183433: 1506:  relentry = get_rel_sync_entry(data, relation);
call    0 returned 100%
        -: 1507:
        -: 1508:  /* First check the table filter */
   183432: 1509:  switch (action)

Thanks,
Steven

From: Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>
Sent: Friday, October 17, 2025 11:40
To: 'pgsql-hackers@lists.postgresql.org' <pgsql-hackers@lists.postgresql.org>
Subject: Question for coverage report

Dear hackers,

While investigating the code coverage I found the questionable report. Do you have
any theories around here?

Attached file is the code coverage report in pgoutput.c, generated by gcov. In
pgoutput_change(), I found that the branch seems not to be chosen but counted.

```
   183433: 1492:        TupleTableSlot *new_slot = NULL;
        -: 1493:
   183433: 1494:        if (!is_publishable_relation(relation))
call    0 returned 100%
branch  1 taken 100% (fallthrough)
branch  2 taken 0%
     1171: 1495:                return;
        -: 1496:
        -: 1497:        /*
        -: 1498:         * Remember the xid for the change in streaming mode. We need to send xid
        -: 1499:         * with each change in the streaming mode so that subscriber can make
        -: 1500:         * their association and on aborts, it can discard the corresponding
        -: 1501:         * changes.
        -: 1502:         */
   183433: 1503:        if (data->in_streaming)
```

IIUC, the line 1494 and 1503 were executed 183433 times. However, the 1495 was also
executed 1171 times. It was strange because the caller won't reach after the return,
count at 1503 should be 183433 - 1171.
Also, it reported that one of the branches was always chosen at 1494, so not sure
why both paths were counted.

I checked [1] and found the same issue; line 1494 and 1503 have the same count
but 1495 also has. Do you know the actual reason?

Locally used configure options:
./configure --enable-cassert --enable-debug --enable-tap-tests CFLAGS="-ggdb -Og -g0 -fno-omit-frame-pointer" --enable-coverage --enable-injection-points

[1]: https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html

Best regards,
Hayato Kuroda
FUJITSU LIMITED

RE: Question for coverage report

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Steven,

> I noticed another weird thing in your attached log.  The line 1506 and 1509 should be executed
> the same number of times.  But line 1506 was executed one more time than line 1509.
> Maybe a bug of gcov?

Oh, I didn't notice. Yeah it was also strange. Is there a possibility that complier optimization
did prefetch and it was also counted?

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Question for coverage report

От
Jacob Champion
Дата:
On Fri, Oct 17, 2025 at 3:55 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
> Oh, I didn't notice. Yeah it was also strange. Is there a possibility that complier optimization
> did prefetch and it was also counted?

FWIW, I'm used to having to set -O0 for the best coverage behavior.
-Og _usually_ works fine, but it occasionally results in nonsensical
reports for me.

--Jacob



Re: Question for coverage report

От
Álvaro Herrera
Дата:
On 2025-Oct-17, Jacob Champion wrote:

> On Fri, Oct 17, 2025 at 3:55 AM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> > Oh, I didn't notice. Yeah it was also strange. Is there a
> > possibility that complier optimization did prefetch and it was also
> > counted?
> 
> FWIW, I'm used to having to set -O0 for the best coverage behavior.
> -Og _usually_ works fine, but it occasionally results in nonsensical
> reports for me.

Hmm, these reports are supposed to have been generated with -O0.  This
is the configure line:

./configure --cache-file=/home/coverage/pgsrc/configure.cache --enable-depend --enable-coverage --enable-tap-tests
--enable-nls--with-python --with-perl --with-tcl --with-openssl --with-libxml --with-ldap --with-pam --with-llvm
--with-lz4--enable-injection-points CFLAGS=-O0 'CPPFLAGS=-DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES
-DRAW_EXPRESSION_COVERAGE_TEST'LLVM_CONFIG=/usr/bin/llvm-config-19 CLANG=clang-19 >> $LOG 2>&1
 

Is this somehow not effective?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La gente vulgar sólo piensa en pasar el tiempo;
el que tiene talento, en aprovecharlo"



Re: Question for coverage report

От
Jacob Champion
Дата:
On Fri, Oct 17, 2025 at 2:15 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> Hmm, these reports are supposed to have been generated with -O0.  This
> is the configure line:
>
> ./configure --cache-file=/home/coverage/pgsrc/configure.cache --enable-depend --enable-coverage --enable-tap-tests
--enable-nls--with-python --with-perl --with-tcl --with-openssl --with-libxml --with-ldap --with-pam --with-llvm
--with-lz4--enable-injection-points CFLAGS=-O0 'CPPFLAGS=-DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES
-DRAW_EXPRESSION_COVERAGE_TEST'LLVM_CONFIG=/usr/bin/llvm-config-19 CLANG=clang-19 >> $LOG 2>&1 
>
> Is this somehow not effective?

Are you able to double-check the compiler invocation for pgoutput.c?

--Jacob



Re: Question for coverage report

От
Álvaro Herrera
Дата:
On 2025-Oct-21, Jacob Champion wrote:

> Are you able to double-check the compiler invocation for pgoutput.c?

Yep -- it does get -O0.  Is there some other flag this is missing?

make -C backend/replication/pgoutput all
make[2]: Entering directory '/home/coverage/pgsrc/pgsql/src/backend/replication/pgoutput'
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security
-fno-strict-aliasing-fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -fprofile-arcs
-ftest-coverage-O0 -fPIC -fvisibility=hidden -I../../../../src/include -DCOPY_PARSE_PLAN_TREES
-DWRITE_READ_PARSE_PLAN_TREES-DRAW_EXPRESSION_COVERAGE_TEST -D_GNU_SOURCE  -I/usr/include/libxml2     -c -o pgoutput.o
pgoutput.c-MMD -MP -MF .deps/pgoutput.Po
 
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security
-fno-strict-aliasing-fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -fprofile-arcs
-ftest-coverage-O0 -fPIC -fvisibility=hidden -shared -o pgoutput.so  pgoutput.o -L../../../../src/port
-L../../../../src/common   -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags -fvisibility=hidden 
 
make[2]: Leaving directory '/home/coverage/pgsrc/pgsql/src/backend/replication/pgoutput'

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I must say, I am absolutely impressed with what pgsql's implementation of
VALUES allows me to do. It's kind of ridiculous how much "work" goes away in
my code.  Too bad I can't do this at work (Oracle 8/9)."       (Tom Allison)
           http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php



Re: Question for coverage report

От
Tom Lane
Дата:
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes:
> On 2025-Oct-21, Jacob Champion wrote:
>> Are you able to double-check the compiler invocation for pgoutput.c?

> Yep -- it does get -O0.  Is there some other flag this is missing?

I think this is probably a gcc artifact.  I looked at the assembly
code generated in coverage mode, which on my setup is like

$ gcc -std=gnu11 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security
-fno-strict-aliasing-fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O0
-fprofile-arcs-ftest-coverage -fPIC -fvisibility=hidden -I../../../../src/include -D_GNU_SOURCE -S pgoutput.c 

In pgoutput.s I find

    .loc 9 1494 7
    movq    -120(%rbp), %rax
    movq    %rax, %rdi
    call    is_publishable_relation@PLT
    movl    %eax, %edx
# this is evidently incrementing the count for line 1494:
    movq    8+__gcov0.pgoutput_change(%rip), %rax
    addq    $1, %rax
    movq    %rax, 8+__gcov0.pgoutput_change(%rip)
    .loc 9 1494 6
    movl    %edx, %eax
    xorl    $1, %eax
    .loc 9 1494 5
    testb    %al, %al
    je    .L275
# this is evidently incrementing the count for line 1495:
    movq    16+__gcov0.pgoutput_change(%rip), %rax
    addq    $1, %rax
    movq    %rax, 16+__gcov0.pgoutput_change(%rip)
    .loc 9 1495 3
    jmp    .L274
.L275:
    .loc 9 1503 10
    movq    -40(%rbp), %rax
    movzbl    24(%rax), %eax
    .loc 9 1503 5
    testb    %al, %al
    je    .L277
# this is evidently incrementing the count for line 1504:
    movq    24+__gcov0.pgoutput_change(%rip), %rax
    addq    $1, %rax
    movq    %rax, 24+__gcov0.pgoutput_change(%rip)
    .loc 9 1504 15
    movq    -128(%rbp), %rax
    movq    16(%rax), %rax
    .loc 9 1504 7
    movl    4(%rax), %eax
    movl    %eax, -4(%rbp)
.L277:
    .loc 9 1506 13
    movq    -120(%rbp), %rdx
    movq    -40(%rbp), %rax
    movq    %rdx, %rsi
    movq    %rax, %rdi
    call    get_rel_sync_entry
    movq    %rax, -56(%rbp)
# this is evidently incrementing the count for line 1506:
    movq    32+__gcov0.pgoutput_change(%rip), %rax
    addq    $1, %rax
    movq    %rax, 32+__gcov0.pgoutput_change(%rip)

So what I am seeing is that there is not actually a separate counter
for line 1503.  It must be that gcov is told that's part of the same
basic block as line 1494 and should be displayed with the same count.

Another thing that I had not known before is that the count associated
with a function-call line is incremented after return, so that no
count will accrue if the function is called but throws an error.

Now, I'm sure Alvaro's machine is using a different gcc version, and
maybe it doesn't act exactly like this.  But I think essentially what
is happening is that there is only one counter for each chunk of code
that the compiler regards as a basic block, and the block boundaries
are not as intuitive as you might think.

            regards, tom lane



RE: Question for coverage report

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Álvaro, Tom,

Thanks for giving some low-layer information. I understood like:
gcov does not actually count each line, counts a chunk of codes. Boundaries are
not same as code paths, before-and-after the if {} can be in the same chunk.

Per above, I could consider in pguotput.c., line 1495 was actually executed but
1503 was counted when it reached line 1494. Another question is why one of the
branch was reported as 100% and another one was 0%. Is it just because counts
was less than 1/100?
I googled some articles but could not find.

```
   183433: 1494:        if (!is_publishable_relation(relation))
call    0 returned 100%
branch  1 taken 100% (fallthrough)
branch  2 taken 0%
     1171: 1495:                return;
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Question for coverage report

От
Jacob Champion
Дата:
On Tue, Oct 21, 2025 at 11:11 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
> Per above, I could consider in pguotput.c., line 1495 was actually executed but
> 1503 was counted when it reached line 1494. Another question is why one of the
> branch was reported as 100% and another one was 0%. Is it just because counts
> was less than 1/100?

(I don't know the answer to this question, but I will note that clang
(15.0.7) does not seem to make this mistake on my machine, and reports
a call count of zero for the `return` on line 1495. Looking at the
disassembly, it seems to add more instrumentation points than what Tom
showed for gcc.)

--Jacob



Re: Question for coverage report

От
Tom Lane
Дата:
Jacob Champion <jacob.champion@enterprisedb.com> writes:
> (I don't know the answer to this question, but I will note that clang
> (15.0.7) does not seem to make this mistake on my machine, and reports
> a call count of zero for the `return` on line 1495. Looking at the
> disassembly, it seems to add more instrumentation points than what Tom
> showed for gcc.)

Interesting.  I also realized, after re-reading the snippet I showed,
that gcc is treating the code leading up to a CALL instruction as a
separate basic block from the code following the CALL.  So that begs
the question of which count is shown for the function call's line
at the source-code level.  It'd only differ when the function throws
an error, presumably.

            regards, tom lane



Re: Question for coverage report

От
Álvaro Herrera
Дата:
On 2025-Oct-22, Tom Lane wrote:

> Interesting.  I also realized, after re-reading the snippet I showed,
> that gcc is treating the code leading up to a CALL instruction as a
> separate basic block from the code following the CALL.  So that begs
> the question of which count is shown for the function call's line
> at the source-code level.  It'd only differ when the function throws
> an error, presumably.

Maybe we should give up on GCC and use clang for the coverage report?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La gente vulgar sólo piensa en pasar el tiempo;
el que tiene talento, en aprovecharlo"