Обсуждение: Fixing MSVC's inability to detect elog(ERROR) does not return
(Moving this discussion to hackers. Previously in [0].) Background: The ereport_domain() macro has a pg_unreachable() which is meant to be hit when the elevel >= ERROR so that the compiler can determine that elog/ereport ERROR never returns. This works fine in gcc and clang but MSVC seems to be unable to figure this out. This causes us to have to maintain hundreds of lines like "return NULL; /* keep compiler quiet */" to avoid warnings. What's causing this?: You might imagine that this happens because pg_unreachable() isn't working for MSVC, but you'd be wrong. It's because of the double-evaluation mitigation of elevel in the ereport_domain() macro. We have: const int elevel_ = (elevel); and because later we do: if (elevel_ >= ERROR) \ pg_unreachable(); \ the MSVC compiler can't figure out that the pg_unreachable() is always hit because it sees elevel_ as variable rather than constant, even when the macro's elevel parameter is a compile-time constant. Fixing it: Ideally, MSVC would have something like __builtin_const_p(). Looking around, I found [1], which hacks together a macro which does the same job. The problem is, the macro uses _Generic(), which is C11. Tom suggested we adjust the meson build scripts to make MSVC always build with C11, which seems doable due to 8fd9bb1d9 making our minimum Visual Studio version 2019, and going by [2], vs2019 supports C11. So, I think that means we can adjust the meson build scripts to pass /std:c11 when building in MSVC. The attached patch does this and defines a pg_builtin_constant() macro and adjusts ereport_domain() to use it. One thing I've not done yet is adjust all other usages of __builtin_const_p() to use pg_builtin_const(). I did a quick check of the postgres.exe size with and without this change. It does save about 13KB, but that seems to be entirely from the /std:c11 flag. None is due to the compiler emitting less code after elog(ERROR) / ereport(ERROR) :-( postgres.exe size: master: 10_143_232 bytes patched: 10_129_920 bytes Does anyone have any objections or comments to any of the above? David [0] https://postgr.es/m/CAApHDvrFdXjbrV6KCx_GHKYSufUbNDYSsjppcJQiGOURfJE6qg@mail.gmail.com [1] https://stackoverflow.com/questions/49480442/detecting-integer-constant-expressions-in-macros [2] https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-support-arriving-in-msvc/
Вложения
David Rowley <dgrowleyml@gmail.com> writes: > I did a quick check of the postgres.exe size with and without this > change. It does save about 13KB, but that seems to be entirely from > the /std:c11 flag. None is due to the compiler emitting less code > after elog(ERROR) / ereport(ERROR) :-( Hmmm ... but you did check that in fact we can remove such known-dead code and not get a warning now? regards, tom lane
On Thu, 24 Jul 2025 at 12:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hmmm ... but you did check that in fact we can remove such known-dead > code and not get a warning now? Yes. The patch has a small temporary adjustment to BaseBackupGetTargetHandle() to comment out the return. It compiles for me using Visual Studio 2022 without any warnings. If I remove the macro change, I get: [598/2255] Compiling C object src/backend/postgres_lib.a.p/backup_basebackup_target.c.obj src\backend\backup\basebackup_target.c(150) : warning C4715: 'BaseBackupGetTargetHandle': not all control paths return a value David
David Rowley <dgrowleyml@gmail.com> writes: > On Thu, 24 Jul 2025 at 12:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Hmmm ... but you did check that in fact we can remove such known-dead >> code and not get a warning now? > Yes. The patch has a small temporary adjustment to > BaseBackupGetTargetHandle() to comment out the return. It compiles for > me using Visual Studio 2022 without any warnings. If I remove the > macro change, I get: > [598/2255] Compiling C object > src/backend/postgres_lib.a.p/backup_basebackup_target.c.obj > src\backend\backup\basebackup_target.c(150) : warning C4715: > 'BaseBackupGetTargetHandle': not all control paths return a value OK. I'd vote for going ahead and seeing what the buildfarm says. regards, tom lane
On 24.07.25 03:14, David Rowley wrote: > So, I think that means we can adjust the meson build scripts to pass > /std:c11 when building in MSVC. The attached patch does this and > defines a pg_builtin_constant() macro and adjusts ereport_domain() to > use it. Please review my patch at https://www.postgresql.org/message-id/ccb273c9-7544-4748-8638-30feba212e6e@eisentraut.org https://commitfest.postgresql.org/patch/5934/ for C11 support across all platforms. Then we avoid the hybrid C99/C11 situation that your patch introduces.
On Thu, 24 Jul 2025 at 23:03, Peter Eisentraut <peter@eisentraut.org> wrote: > Please review my patch at > > https://www.postgresql.org/message-id/ccb273c9-7544-4748-8638-30feba212e6e@eisentraut.org > https://commitfest.postgresql.org/patch/5934/ Now that we're building with C11, here's a rebased patch with the new pg_builtin_constant() macro. I did some study into the code generation with the patched version. I see that errstart_cold() is being used correctly when the elevel is >= ERROR. Unfortunately, MSVC does not seem to have anything like __attribute__((cold)), so that does nothing different than what errstart() does. I also looked at the size of postgres.exe to see if the code generation had improved due to ereport_domain's __assume(0) being correctly picked up now. I was surprised to see that the binary had increased in size by about 13K: master: 10147840 bytes. patched: 10161152 bytes. Most of this extra seems to come from the errstart_cold function, as if I adjust ereport_domain with: - errstart_cold(elevel, domain) : \ + errstart(elevel, domain) : \ The size is 10149888 bytes, or 2k more than master. The reason I looked into this is because I wanted to check MSVC was correctly eliminating the errstart_code / errstart ternary condition. It does. I'm not sure where the extra 2k comes from, however. I'm unable to detect any performance differences running pgbench -M prepared -T 60 -S or with pgbench -M simple -T 60 -S. David
Вложения
On 02.09.25 04:57, David Rowley wrote: > On Thu, 24 Jul 2025 at 23:03, Peter Eisentraut <peter@eisentraut.org> wrote: >> Please review my patch at >> >> https://www.postgresql.org/message-id/ccb273c9-7544-4748-8638-30feba212e6e@eisentraut.org >> https://commitfest.postgresql.org/patch/5934/ > > Now that we're building with C11, here's a rebased patch with the new > pg_builtin_constant() macro. +#if defined(HAVE__BUILTIN_CONSTANT_P) +#define pg_builtin_constant(x) __builtin_constant_p(x) +#define HAVE_PG_BUILTIN_CONSTANT +#elif defined(_MSC_VER) && defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L +#define pg_builtin_constant(x) \ + _Generic((1 ? ((void *) ((x) * (uintptr_t) 0)) : &(int) {1}), int *: 1, void *: 0) +#define HAVE_PG_BUILTIN_CONSTANT +#endif The variant using _Generic is not a full replacement for __builtin_constant_p(), because it only detects integer constant expressions. So for example, it wouldn't work in the use in src/include/utils/memutils.h, which checks for constant strings. So I think we need to be careful here to maintain the difference. I think what we could do is make a separate macro for detecting integer constant expressions (like ICE_P() in the reddit thread) and define that to __builtin_constant_p if available, else using _Generic. (We can't use _Generic on all platforms yet, that's a separate undertaking, but I think all platforms support either __builtin_constant_p or _Generic.) And then use that one for ereport. It would also be nice to provide a comment with some explanation and/or a link and credit for the _Generic expression. Btw., I think we should stick to the *_p() naming (for "predicate", I think) for compiler-intrinsic-affiliated functions/macros that report boolean results. The __STDC_VERSION__ comparison can be dropped, since that is the minimum now required. (But you need to keep defined(__STDC_VERSION__), since this won't work in C++.)
On Wed, 3 Sept 2025 at 23:32, Peter Eisentraut <peter@eisentraut.org> wrote: > The variant using _Generic is not a full replacement for > __builtin_constant_p(), because it only detects integer constant > expressions. So for example, it wouldn't work in the use in > src/include/utils/memutils.h, which checks for constant strings. So I > think we need to be careful here to maintain the difference. Makes sense. > I think what we could do is make a separate macro for detecting integer > constant expressions (like ICE_P() in the reddit thread) and define that > to __builtin_constant_p if available, else using _Generic. (We can't > use _Generic on all platforms yet, that's a separate undertaking, but I > think all platforms support either __builtin_constant_p or _Generic.) > And then use that one for ereport. Done > It would also be nice to provide a comment with some explanation and/or > a link and credit for the _Generic expression. I've added a link to the stackoverflow thread in the comment above the macro's definition. > Btw., I think we should stick to the *_p() naming (for "predicate", I > think) for compiler-intrinsic-affiliated functions/macros that report > boolean results. I didn't know what the _p suffix was meant to indicate. Do you have a link which states that it's for "predicate"? The other things I had in mind were "pointer" and "proprocessor", neither of which makes much sense to me. Looking through [1], the only other intrinsic function I see ending in _p is __builtin_types_compatible_p(). I don't see what these both have in common with each other. I've modified the patch to include the _p, however, I'd personally rather leave this out as if someone asks me what it's for, I've got no answer for them, other than Peter said. > The __STDC_VERSION__ comparison can be dropped, since that is the > minimum now required. (But you need to keep defined(__STDC_VERSION__), > since this won't work in C++.) Done. Updated patch attached. Thanks for the review. David [1] https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
Вложения
David Rowley <dgrowleyml@gmail.com> writes: > On Wed, 3 Sept 2025 at 23:32, Peter Eisentraut <peter@eisentraut.org> wrote: >> Btw., I think we should stick to the *_p() naming (for "predicate", I >> think) for compiler-intrinsic-affiliated functions/macros that report >> boolean results. > I didn't know what the _p suffix was meant to indicate. Do you have a > link which states that it's for "predicate"? It absolutely stands for "predicate". That's an ancient Lisp-ism. Here's the first link I found with some quick googling: https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node69.html A predicate is a function that tests for some condition involving its arguments and returns nil if the condition is false, or some non-nil value if the condition is true. One may think of a predicate as producing a Boolean value, where nil stands for false and anything else stands for true. Conditional control structures such as cond, if, when, and unless test such Boolean values. We say that a predicate is true when it returns a non-nil value, and is false when it returns nil; that is, it is true or false according to whether the condition being tested is true or false. By convention, the names of predicates usually end in the letter p (which stands for ``predicate''). Common Lisp uses a uniform convention in hyphenating names of predicates. If the name of the predicate is formed by adding a p to an existing name, such as the name of a data type, a hyphen is placed before the final p if and only if there is a hyphen in the existing name. For example, number begets numberp but standard-char begets standard-char-p. On the other hand, if the name of a predicate is formed by adding a prefixing qualifier to the front of an existing predicate name, the two names are joined with a hyphen and the presence or absence of a hyphen before the final p is not changed. For example, the predicate string-lessp has no hyphen before the p because it is the string version of lessp (a MacLisp function that has been renamed < in Common Lisp). The name string-less-p would incorrectly imply that it is a predicate that tests for a kind of object called a string-less, and the name stringlessp would connote a predicate that tests whether something has no strings (is ``stringless'')! Okay, that last part is pretty far down in the weeds. But a "p" suffix meaning "predicate" has decades of history behind it. regards, tom lane
On Wed, 17 Sept 2025 at 16:03, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > On Wed, 3 Sept 2025 at 23:32, Peter Eisentraut <peter@eisentraut.org> wrote: > >> Btw., I think we should stick to the *_p() naming (for "predicate", I > >> think) for compiler-intrinsic-affiliated functions/macros that report > >> boolean results. > > > I didn't know what the _p suffix was meant to indicate. Do you have a > > link which states that it's for "predicate"? > > It absolutely stands for "predicate". That's an ancient Lisp-ism. > Here's the first link I found with some quick googling: > > https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node69.html Thanks for the confirmation. I'm happy enough to leave the _p in there, but at the same time, I don't see the particular reason to follow some ancient Lisp rule. Maybe I'm in the minority, having never programmed in Lisp. Anyway, at least the justification for it is in the archives now. Thanks. David
David Rowley <dgrowleyml@gmail.com> writes: > On Wed, 17 Sept 2025 at 16:03, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It absolutely stands for "predicate". That's an ancient Lisp-ism. > Thanks for the confirmation. I'm happy enough to leave the _p in > there, but at the same time, I don't see the particular reason to > follow some ancient Lisp rule. I'm just saying that that's surely where the gcc crew got it from. I do agree with Peter that we should follow that convention within the narrow realm of compiler intrinsics; but I'm not arguing for running around and renaming random PG functions to something_p. As long as we're delving into weeds: we have another project convention for "_P", which is terminal symbols in our grammar such as NULL_P. Clearly, that "P" is not for "predicate"; I suppose it should be read as "parse" or "parser". Given that large parts of original POSTGRES were written in Lisp, it's a tad hard to believe that whoever chose those names had not heard of the "predicate" convention. I guess he/she figured that it didn't matter because there'd be very little overlap or scope for confusion, and that seems to have been borne out over time. regards, tom lane
On Wed, 17 Sept 2025 at 15:52, David Rowley <dgrowleyml@gmail.com> wrote: > Updated patch attached. Thanks for the review. Now pushed and awaiting buildfarm feedback. Thanks for reviewing. David
On 27.09.25 12:43, David Rowley wrote: > On Wed, 17 Sept 2025 at 15:52, David Rowley <dgrowleyml@gmail.com> wrote: >> Updated patch attached. Thanks for the review. > > Now pushed and awaiting buildfarm feedback. Cool, seems to work. I also tried it on CI by removing a few "silence compiler warning" lines. Quick follow-up: How about we rename pg_builtin_integer_constant_p to pg_integer_constant_p, since it's not actually built-in? See attached patch.
Вложения
On Tue, 30 Sept 2025 at 18:24, Peter Eisentraut <peter@eisentraut.org> wrote: > Quick follow-up: How about we rename pg_builtin_integer_constant_p to > pg_integer_constant_p, since it's not actually built-in? See attached > patch. Seems like a good idea. The patch looks good. Thanks for fixing the __builtin_const_p typo that I managed to introduce too :| David
On 30.09.25 11:03, David Rowley wrote: > On Tue, 30 Sept 2025 at 18:24, Peter Eisentraut <peter@eisentraut.org> wrote: >> Quick follow-up: How about we rename pg_builtin_integer_constant_p to >> pg_integer_constant_p, since it's not actually built-in? See attached >> patch. > > Seems like a good idea. > > The patch looks good. Thanks for fixing the __builtin_const_p typo > that I managed to introduce too :| pushed