Обсуждение: Question for coverage report
Dear hackers,
While investigating the code coverage I found the questionable report. Do you have
any theories around here?
Attached file is the code coverage report in pgoutput.c, generated by gcov. In
pgoutput_change(), I found that the branch seems not to be chosen but counted.
```
183433: 1492: TupleTableSlot *new_slot = NULL;
-: 1493:
183433: 1494: if (!is_publishable_relation(relation))
call 0 returned 100%
branch 1 taken 100% (fallthrough)
branch 2 taken 0%
1171: 1495: return;
-: 1496:
-: 1497: /*
-: 1498: * Remember the xid for the change in streaming mode. We need to send xid
-: 1499: * with each change in the streaming mode so that subscriber can make
-: 1500: * their association and on aborts, it can discard the corresponding
-: 1501: * changes.
-: 1502: */
183433: 1503: if (data->in_streaming)
```
IIUC, the line 1494 and 1503 were executed 183433 times. However, the 1495 was also
executed 1171 times. It was strange because the caller won't reach after the return,
count at 1503 should be 183433 - 1171.
Also, it reported that one of the branches was always chosen at 1494, so not sure
why both paths were counted.
I checked [1] and found the same issue; line 1494 and 1503 have the same count
but 1495 also has. Do you know the actual reason?
Locally used configure options:
./configure --enable-cassert --enable-debug --enable-tap-tests CFLAGS="-ggdb -Og -g0 -fno-omit-frame-pointer"
--enable-coverage--enable-injection-points
[1]: https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Вложения
> Attached file is the code coverage report in pgoutput.c, generated by gcov. In > pgoutput_change(), I found that the branch seems not to be chosen but counted. Sorry, I attached wrong file. Here is a report generated by gcov. Best regards, Hayato Kuroda FUJITSU LIMITED
Вложения
Hi, Hayato,
I noticed another weird thing in your attached log. The line 1506 and 1509 should be executed the same number of times. But line 1506 was executed one more time than line 1509. Maybe a bug of gcov?
183433: 1506: relentry = get_rel_sync_entry(data, relation);
call 0 returned 100%
-: 1507:
-: 1508: /* First check the table filter */
183432: 1509: switch (action)
Thanks,
Steven
From: Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>
Sent: Friday, October 17, 2025 11:40
To: 'pgsql-hackers@lists.postgresql.org' <pgsql-hackers@lists.postgresql.org>
Subject: Question for coverage report
Sent: Friday, October 17, 2025 11:40
To: 'pgsql-hackers@lists.postgresql.org' <pgsql-hackers@lists.postgresql.org>
Subject: Question for coverage report
Dear hackers,
While investigating the code coverage I found the questionable report. Do you have
any theories around here?
Attached file is the code coverage report in pgoutput.c, generated by gcov. In
pgoutput_change(), I found that the branch seems not to be chosen but counted.
```
183433: 1492: TupleTableSlot *new_slot = NULL;
-: 1493:
183433: 1494: if (!is_publishable_relation(relation))
call 0 returned 100%
branch 1 taken 100% (fallthrough)
branch 2 taken 0%
1171: 1495: return;
-: 1496:
-: 1497: /*
-: 1498: * Remember the xid for the change in streaming mode. We need to send xid
-: 1499: * with each change in the streaming mode so that subscriber can make
-: 1500: * their association and on aborts, it can discard the corresponding
-: 1501: * changes.
-: 1502: */
183433: 1503: if (data->in_streaming)
```
IIUC, the line 1494 and 1503 were executed 183433 times. However, the 1495 was also
executed 1171 times. It was strange because the caller won't reach after the return,
count at 1503 should be 183433 - 1171.
Also, it reported that one of the branches was always chosen at 1494, so not sure
why both paths were counted.
I checked [1] and found the same issue; line 1494 and 1503 have the same count
but 1495 also has. Do you know the actual reason?
Locally used configure options:
./configure --enable-cassert --enable-debug --enable-tap-tests CFLAGS="-ggdb -Og -g0 -fno-omit-frame-pointer" --enable-coverage --enable-injection-points
[1]: https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html
Best regards,
Hayato Kuroda
FUJITSU LIMITED
While investigating the code coverage I found the questionable report. Do you have
any theories around here?
Attached file is the code coverage report in pgoutput.c, generated by gcov. In
pgoutput_change(), I found that the branch seems not to be chosen but counted.
```
183433: 1492: TupleTableSlot *new_slot = NULL;
-: 1493:
183433: 1494: if (!is_publishable_relation(relation))
call 0 returned 100%
branch 1 taken 100% (fallthrough)
branch 2 taken 0%
1171: 1495: return;
-: 1496:
-: 1497: /*
-: 1498: * Remember the xid for the change in streaming mode. We need to send xid
-: 1499: * with each change in the streaming mode so that subscriber can make
-: 1500: * their association and on aborts, it can discard the corresponding
-: 1501: * changes.
-: 1502: */
183433: 1503: if (data->in_streaming)
```
IIUC, the line 1494 and 1503 were executed 183433 times. However, the 1495 was also
executed 1171 times. It was strange because the caller won't reach after the return,
count at 1503 should be 183433 - 1171.
Also, it reported that one of the branches was always chosen at 1494, so not sure
why both paths were counted.
I checked [1] and found the same issue; line 1494 and 1503 have the same count
but 1495 also has. Do you know the actual reason?
Locally used configure options:
./configure --enable-cassert --enable-debug --enable-tap-tests CFLAGS="-ggdb -Og -g0 -fno-omit-frame-pointer" --enable-coverage --enable-injection-points
[1]: https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Dear Steven, > I noticed another weird thing in your attached log. The line 1506 and 1509 should be executed > the same number of times. But line 1506 was executed one more time than line 1509. > Maybe a bug of gcov? Oh, I didn't notice. Yeah it was also strange. Is there a possibility that complier optimization did prefetch and it was also counted? Best regards, Hayato Kuroda FUJITSU LIMITED
On Fri, Oct 17, 2025 at 3:55 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > Oh, I didn't notice. Yeah it was also strange. Is there a possibility that complier optimization > did prefetch and it was also counted? FWIW, I'm used to having to set -O0 for the best coverage behavior. -Og _usually_ works fine, but it occasionally results in nonsensical reports for me. --Jacob
On 2025-Oct-17, Jacob Champion wrote: > On Fri, Oct 17, 2025 at 3:55 AM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > Oh, I didn't notice. Yeah it was also strange. Is there a > > possibility that complier optimization did prefetch and it was also > > counted? > > FWIW, I'm used to having to set -O0 for the best coverage behavior. > -Og _usually_ works fine, but it occasionally results in nonsensical > reports for me. Hmm, these reports are supposed to have been generated with -O0. This is the configure line: ./configure --cache-file=/home/coverage/pgsrc/configure.cache --enable-depend --enable-coverage --enable-tap-tests --enable-nls--with-python --with-perl --with-tcl --with-openssl --with-libxml --with-ldap --with-pam --with-llvm --with-lz4--enable-injection-points CFLAGS=-O0 'CPPFLAGS=-DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST'LLVM_CONFIG=/usr/bin/llvm-config-19 CLANG=clang-19 >> $LOG 2>&1 Is this somehow not effective? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La gente vulgar sólo piensa en pasar el tiempo; el que tiene talento, en aprovecharlo"
On Fri, Oct 17, 2025 at 2:15 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > Hmm, these reports are supposed to have been generated with -O0. This > is the configure line: > > ./configure --cache-file=/home/coverage/pgsrc/configure.cache --enable-depend --enable-coverage --enable-tap-tests --enable-nls--with-python --with-perl --with-tcl --with-openssl --with-libxml --with-ldap --with-pam --with-llvm --with-lz4--enable-injection-points CFLAGS=-O0 'CPPFLAGS=-DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST'LLVM_CONFIG=/usr/bin/llvm-config-19 CLANG=clang-19 >> $LOG 2>&1 > > Is this somehow not effective? Are you able to double-check the compiler invocation for pgoutput.c? --Jacob
On 2025-Oct-21, Jacob Champion wrote: > Are you able to double-check the compiler invocation for pgoutput.c? Yep -- it does get -O0. Is there some other flag this is missing? make -C backend/replication/pgoutput all make[2]: Entering directory '/home/coverage/pgsrc/pgsql/src/backend/replication/pgoutput' gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing-fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -fprofile-arcs -ftest-coverage-O0 -fPIC -fvisibility=hidden -I../../../../src/include -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES-DRAW_EXPRESSION_COVERAGE_TEST -D_GNU_SOURCE -I/usr/include/libxml2 -c -o pgoutput.o pgoutput.c-MMD -MP -MF .deps/pgoutput.Po gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing-fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -fprofile-arcs -ftest-coverage-O0 -fPIC -fvisibility=hidden -shared -o pgoutput.so pgoutput.o -L../../../../src/port -L../../../../src/common -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags -fvisibility=hidden make[2]: Leaving directory '/home/coverage/pgsrc/pgsql/src/backend/replication/pgoutput' -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "I must say, I am absolutely impressed with what pgsql's implementation of VALUES allows me to do. It's kind of ridiculous how much "work" goes away in my code. Too bad I can't do this at work (Oracle 8/9)." (Tom Allison) http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes:
> On 2025-Oct-21, Jacob Champion wrote:
>> Are you able to double-check the compiler invocation for pgoutput.c?
> Yep -- it does get -O0. Is there some other flag this is missing?
I think this is probably a gcc artifact. I looked at the assembly
code generated in coverage mode, which on my setup is like
$ gcc -std=gnu11 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security
-fno-strict-aliasing-fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O0
-fprofile-arcs-ftest-coverage -fPIC -fvisibility=hidden -I../../../../src/include -D_GNU_SOURCE -S pgoutput.c
In pgoutput.s I find
.loc 9 1494 7
movq -120(%rbp), %rax
movq %rax, %rdi
call is_publishable_relation@PLT
movl %eax, %edx
# this is evidently incrementing the count for line 1494:
movq 8+__gcov0.pgoutput_change(%rip), %rax
addq $1, %rax
movq %rax, 8+__gcov0.pgoutput_change(%rip)
.loc 9 1494 6
movl %edx, %eax
xorl $1, %eax
.loc 9 1494 5
testb %al, %al
je .L275
# this is evidently incrementing the count for line 1495:
movq 16+__gcov0.pgoutput_change(%rip), %rax
addq $1, %rax
movq %rax, 16+__gcov0.pgoutput_change(%rip)
.loc 9 1495 3
jmp .L274
.L275:
.loc 9 1503 10
movq -40(%rbp), %rax
movzbl 24(%rax), %eax
.loc 9 1503 5
testb %al, %al
je .L277
# this is evidently incrementing the count for line 1504:
movq 24+__gcov0.pgoutput_change(%rip), %rax
addq $1, %rax
movq %rax, 24+__gcov0.pgoutput_change(%rip)
.loc 9 1504 15
movq -128(%rbp), %rax
movq 16(%rax), %rax
.loc 9 1504 7
movl 4(%rax), %eax
movl %eax, -4(%rbp)
.L277:
.loc 9 1506 13
movq -120(%rbp), %rdx
movq -40(%rbp), %rax
movq %rdx, %rsi
movq %rax, %rdi
call get_rel_sync_entry
movq %rax, -56(%rbp)
# this is evidently incrementing the count for line 1506:
movq 32+__gcov0.pgoutput_change(%rip), %rax
addq $1, %rax
movq %rax, 32+__gcov0.pgoutput_change(%rip)
So what I am seeing is that there is not actually a separate counter
for line 1503. It must be that gcov is told that's part of the same
basic block as line 1494 and should be displayed with the same count.
Another thing that I had not known before is that the count associated
with a function-call line is incremented after return, so that no
count will accrue if the function is called but throws an error.
Now, I'm sure Alvaro's machine is using a different gcc version, and
maybe it doesn't act exactly like this. But I think essentially what
is happening is that there is only one counter for each chunk of code
that the compiler regards as a basic block, and the block boundaries
are not as intuitive as you might think.
regards, tom lane
Dear Álvaro, Tom,
Thanks for giving some low-layer information. I understood like:
gcov does not actually count each line, counts a chunk of codes. Boundaries are
not same as code paths, before-and-after the if {} can be in the same chunk.
Per above, I could consider in pguotput.c., line 1495 was actually executed but
1503 was counted when it reached line 1494. Another question is why one of the
branch was reported as 100% and another one was 0%. Is it just because counts
was less than 1/100?
I googled some articles but could not find.
```
183433: 1494: if (!is_publishable_relation(relation))
call 0 returned 100%
branch 1 taken 100% (fallthrough)
branch 2 taken 0%
1171: 1495: return;
```
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Tue, Oct 21, 2025 at 11:11 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > Per above, I could consider in pguotput.c., line 1495 was actually executed but > 1503 was counted when it reached line 1494. Another question is why one of the > branch was reported as 100% and another one was 0%. Is it just because counts > was less than 1/100? (I don't know the answer to this question, but I will note that clang (15.0.7) does not seem to make this mistake on my machine, and reports a call count of zero for the `return` on line 1495. Looking at the disassembly, it seems to add more instrumentation points than what Tom showed for gcc.) --Jacob
Jacob Champion <jacob.champion@enterprisedb.com> writes:
> (I don't know the answer to this question, but I will note that clang
> (15.0.7) does not seem to make this mistake on my machine, and reports
> a call count of zero for the `return` on line 1495. Looking at the
> disassembly, it seems to add more instrumentation points than what Tom
> showed for gcc.)
Interesting. I also realized, after re-reading the snippet I showed,
that gcc is treating the code leading up to a CALL instruction as a
separate basic block from the code following the CALL. So that begs
the question of which count is shown for the function call's line
at the source-code level. It'd only differ when the function throws
an error, presumably.
regards, tom lane
On 2025-Oct-22, Tom Lane wrote: > Interesting. I also realized, after re-reading the snippet I showed, > that gcc is treating the code leading up to a CALL instruction as a > separate basic block from the code following the CALL. So that begs > the question of which count is shown for the function call's line > at the source-code level. It'd only differ when the function throws > an error, presumably. Maybe we should give up on GCC and use clang for the coverage report? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La gente vulgar sólo piensa en pasar el tiempo; el que tiene talento, en aprovecharlo"