Обсуждение: Remove useless casting to the same type
Hi hackers, Attached is a patch to $SUBJECT. This is the same kind of idea as 7f798aca1d5 and ef8fe693606, as their presence could cause risks of hiding actual type mismatches in the future or silently discarding qualifiers. I think that it also improves readability. Those have been found with a coccinelle script as: " @@ type T; T E; @@ - (T) E " which removes casts when it's casting to the same type. Note that it generated more that what is in the attached. I chose not to remove some of them (mainly arithmetic ones) to keep the patch focused on what matters here. Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Вложения
On Mon, Nov 24, 2025 at 10:26:32AM +0000, Bertrand Drouvot wrote: > This is the same kind of idea as 7f798aca1d5 and ef8fe693606, as their presence > could cause risks of hiding actual type mismatches in the future or silently > discarding qualifiers. I think that it also improves readability. Seems reasonable to me. > Note that it generated more that what is in the attached. I chose not to remove > some of them (mainly arithmetic ones) to keep the patch focused on what matters > here. Can you give an example of what you are talking about here? -- nathan
On Mon, Nov 24, 2025 at 2:26 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > Attached is a patch to $SUBJECT. > --- a/src/backend/access/gin/gindatapage.c > +++ b/src/backend/access/gin/gindatapage.c > @@ -607,11 +607,11 @@ dataBeginPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, > > if (append) > elog(DEBUG2, "appended %d new items to block %u; %d bytes (%d to go)", > - maxitems, BufferGetBlockNumber(buf), (int) leaf->lsize, > + maxitems, BufferGetBlockNumber(buf), leaf->lsize, > items->nitem - items->curitem - maxitems); Hm. How do we feel, as a group, about superstitious casts in variadic calls? I don't feel like they're in the same class as the other fixes. Argument for: it's nice to know at a glance that a printf() invocation won't corrupt something horribly, especially if I'm quickly scanning code during a CVE analysis, and especially if the variable is named as if it could maybe be a size_t. Do our compilers warn us well enough now, in practice? Argument against: it takes up precious columns and focuses attention away from other things. Like the fact that (items->nitem - items->curitem - maxitems) is unsigned and printed as signed here. :D Maybe we should make the code compile cleanly under -Wformat-signedness at some point... (not in this patch, not signing you up for that) > @@ -389,7 +389,7 @@ hash_xlog_split_allocate_page(XLogReaderState *record) > > /* extract low and high masks. */ > memcpy(&lowmask, data, sizeof(uint32)); > - highmask = (uint32 *) ((char *) data + sizeof(uint32)); > + highmask = (uint32 *) (data + sizeof(uint32)); I wonder about these, too. I like knowing what the code does without having to check the type of `data`. But then later on we do a `data += sizeof(uint32) * 2`, so you have to check the type anyway, so... I don't know. > +++ b/contrib/fuzzystrmatch/dmetaphone.c > @@ -327,7 +327,7 @@ GetAt(metastring *s, int pos) > if ((pos < 0) || (pos >= s->length)) > return '\0'; > > - return ((char) *(s->str + pos)); > + return *(s->str + pos); Isn't this just s->str[pos]? Ditto for SetAt(), right afterwards. --Jacob
Hi,
On Mon, Nov 24, 2025 at 12:43:28PM -0600, Nathan Bossart wrote:
> On Mon, Nov 24, 2025 at 10:26:32AM +0000, Bertrand Drouvot wrote:
> > This is the same kind of idea as 7f798aca1d5 and ef8fe693606, as their presence
> > could cause risks of hiding actual type mismatches in the future or silently
> > discarding qualifiers. I think that it also improves readability.
>
> Seems reasonable to me.
Thanks for looking at it!
> > Note that it generated more that what is in the attached. I chose not to remove
> > some of them (mainly arithmetic ones) to keep the patch focused on what matters
> > here.
>
> Can you give an example of what you are talking about here?
Things like:
A)
- int k = (int) (targrows * sampler_random_fract(&rstate.randstate));
+ int k = (targrows * sampler_random_fract(&rstate.randstate));
That's a valid cast removal but I'm not sure about the removal added value here.
B)
- sign = (BITVECP) (((char *) sign) + 1);
+ sign = ((sign) + 1);
BITVECP is "typedef char *BITVECP; So that's a valid cast removal but I decided
to keep it for semantic reason.
Same as:
@@ -2277,7 +2277,7 @@ printTrgmColor(StringInfo buf, TrgmColor co)
else if (co == COLOR_BLANK)
appendStringInfoChar(buf, 'b');
else
- appendStringInfo(buf, "%d", (int) co);
+ appendStringInfo(buf, "%d", co);
where TrgmColor is "typedef int TrgmColor;"
C)
- if (((unsigned char *) base)[i] != 0xff)
+ if ((base)[i] != 0xff)
because not safe.
See attached the full list that I decided not to include.
Do you think we should add some of them in the patch? (maybe the ones in
nbtcompare.c for example)
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Вложения
Hi, On Mon, Nov 24, 2025 at 03:18:00PM -0800, Jacob Champion wrote: > On Mon, Nov 24, 2025 at 2:26 AM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > Attached is a patch to $SUBJECT. > > > --- a/src/backend/access/gin/gindatapage.c > > +++ b/src/backend/access/gin/gindatapage.c > > @@ -607,11 +607,11 @@ dataBeginPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, > > > > if (append) > > elog(DEBUG2, "appended %d new items to block %u; %d bytes (%d to go)", > > - maxitems, BufferGetBlockNumber(buf), (int) leaf->lsize, > > + maxitems, BufferGetBlockNumber(buf), leaf->lsize, > > items->nitem - items->curitem - maxitems); > > Hm. How do we feel, as a group, about superstitious casts in variadic > calls? I don't feel like they're in the same class as the other fixes. > > Argument for: it's nice to know at a glance that a printf() invocation > won't corrupt something horribly, especially if I'm quickly scanning > code during a CVE analysis, and especially if the variable is named as > if it could maybe be a size_t. Do our compilers warn us well enough > now, in practice? > > Argument against: it takes up precious columns and focuses attention > away from other things. Thanks for looking at it! I think that the variadic calls in the patch are related to functions that can benefits from -Wformat. Let's focus on those: with the cast one would need to verify 3 things: variable type, cast and format specifier. Without the cast then only 2 things and the compiler can verify these match via -Wformat warnings. With the cast, the compiler only checks that the cast result matches the format, not whether the cast itself is correct, so I'm in favor of removing the cast, thoughts? > Like the fact that (items->nitem - > items->curitem - maxitems) is unsigned and printed as signed here. :D Nice catch! ;-) > Maybe we should make the code compile cleanly under > -Wformat-signedness at some point... good idea, will give it a try later on outside the context of this patch. > > @@ -389,7 +389,7 @@ hash_xlog_split_allocate_page(XLogReaderState *record) > > > > /* extract low and high masks. */ > > memcpy(&lowmask, data, sizeof(uint32)); > > - highmask = (uint32 *) ((char *) data + sizeof(uint32)); > > + highmask = (uint32 *) (data + sizeof(uint32)); > > I wonder about these, too. I like knowing what the code does without > having to check the type of `data`. But then later on we do a `data += > sizeof(uint32) * 2`, so you have to check the type anyway, so... I > don't know. I think that even with the cast in place, it's good to check the type of data. Not for the line that follows (i.e: "data += sizeof(uint32) * 2") but to check that the cast makes sense and does not hide "wrong" pointer manipulation. So I think that with or without the cast one would need to check. But that feels more natural to check when there is no cast (as we don't assume that someone said "I know what I'm doing"). So I'm in favor of removing the cast, thoughts? > > +++ b/contrib/fuzzystrmatch/dmetaphone.c > > @@ -327,7 +327,7 @@ GetAt(metastring *s, int pos) > > if ((pos < 0) || (pos >= s->length)) > > return '\0'; > > > > - return ((char) *(s->str + pos)); > > + return *(s->str + pos); > > Isn't this just s->str[pos]? Ditto for SetAt(), right afterwards. Yeah, "*(s->str + pos)" is already used in SetAt() and also in IsVowel(). Instead of changing those 3, I'd prefer to keep the current change and keep the patch focus on its intend. We could change those in a dedicated patch afterward if we feel the need. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On 25.11.2025 06:46, Bertrand Drouvot wrote: > I think that the variadic calls in the patch are related to functions that > can benefits from -Wformat. Let's focus on those: with the cast one would need > to verify 3 things: variable type, cast and format specifier. > Without the cast then only 2 things and the compiler can verify these match via > -Wformat warnings. > > With the cast, the compiler only checks that the cast result matches the format, > not whether the cast itself is correct, so I'm in favor of removing the cast, > thoughts? > I agree. It's better if we only have the casts in places where we actually want to change the type before printing. Another benefit is that one can directly deduct the type from the log / print statement by looking at the format specifier. -- David Geier
On 25.11.25 06:46, Bertrand Drouvot wrote: >>> @@ -389,7 +389,7 @@ hash_xlog_split_allocate_page(XLogReaderState *record) >>> >>> /* extract low and high masks. */ >>> memcpy(&lowmask, data, sizeof(uint32)); >>> - highmask = (uint32 *) ((char *) data + sizeof(uint32)); >>> + highmask = (uint32 *) (data + sizeof(uint32)); >> I wonder about these, too. I like knowing what the code does without >> having to check the type of `data`. But then later on we do a `data += >> sizeof(uint32) * 2`, so you have to check the type anyway, so... I >> don't know. > I think that even with the cast in place, it's good to check the type of data. > Not for the line that follows (i.e: "data += sizeof(uint32) * 2") but to check > that the cast makes sense and does not hide "wrong" pointer manipulation. > > So I think that with or without the cast one would need to check. But that feels > more natural to check when there is no cast (as we don't assume that someone > said "I know what I'm doing"). So I'm in favor of removing the cast, thoughts? I think this whole thing could be simplified by overlaying a uint32 over "data" and just accessing the array fields normally. See attached patch.
Вложения
On 25.11.25 06:46, Bertrand Drouvot wrote: >> Hm. How do we feel, as a group, about superstitious casts in variadic >> calls? I don't feel like they're in the same class as the other fixes. >> >> Argument for: it's nice to know at a glance that a printf() invocation >> won't corrupt something horribly, especially if I'm quickly scanning >> code during a CVE analysis, and especially if the variable is named as >> if it could maybe be a size_t. Do our compilers warn us well enough >> now, in practice? >> >> Argument against: it takes up precious columns and focuses attention >> away from other things. > Thanks for looking at it! > > I think that the variadic calls in the patch are related to functions that > can benefits from -Wformat. Let's focus on those: with the cast one would need > to verify 3 things: variable type, cast and format specifier. > Without the cast then only 2 things and the compiler can verify these match via > -Wformat warnings. > > With the cast, the compiler only checks that the cast result matches the format, > not whether the cast itself is correct, so I'm in favor of removing the cast, > thoughts? Yes, I think with -Wformat you get approximately the same level of protection as with a prototype. >> Like the fact that (items->nitem - >> items->curitem - maxitems) is unsigned and printed as signed here. :D > Nice catch! ;-) > >> Maybe we should make the code compile cleanly under >> -Wformat-signedness at some point... That would be similar to using -Wsign-conversion with function prototypes. Maybe a good idea, but we don't use it, and so we shouldn't expect it for non-prototype invocations either.
On 25.11.25 06:46, Bertrand Drouvot wrote: >> Maybe we should make the code compile cleanly under >> -Wformat-signedness at some point... > good idea, will give it a try later on outside the context of this patch. I test this once in a while and fix the issues that I find. But it's very picky and you will find difficult to fix problems like the fact that the signedness of enums is implementation-defined, and so the only portable fix there would be to add more casts. I think it could be useful to tighten the source code with respect to implicit integer conversions, using warnings such as -Wsign-conversion and -Wconversion as well as -Wformat-signedness. There are surely hidden overflow or truncation issues similar to CVE-2025-12818 hidden somewhere. But explicit casts defeat those warnings, so removing unnecessary casts is a good step on the way there.
Hi, On Fri, Nov 28, 2025 at 09:11:16AM +0100, Peter Eisentraut wrote: > On 25.11.25 06:46, Bertrand Drouvot wrote: > > > > @@ -389,7 +389,7 @@ hash_xlog_split_allocate_page(XLogReaderState *record) > > > > > > > > /* extract low and high masks. */ > > > > memcpy(&lowmask, data, sizeof(uint32)); > > > > - highmask = (uint32 *) ((char *) data + sizeof(uint32)); > > > > + highmask = (uint32 *) (data + sizeof(uint32)); > > > I wonder about these, too. I like knowing what the code does without > > > having to check the type of `data`. But then later on we do a `data += > > > sizeof(uint32) * 2`, so you have to check the type anyway, so... I > > > don't know. > > I think that even with the cast in place, it's good to check the type of data. > > Not for the line that follows (i.e: "data += sizeof(uint32) * 2") but to check > > that the cast makes sense and does not hide "wrong" pointer manipulation. > > > > So I think that with or without the cast one would need to check. But that feels > > more natural to check when there is no cast (as we don't assume that someone > > said "I know what I'm doing"). So I'm in favor of removing the cast, thoughts? > > I think this whole thing could be simplified by overlaying a uint32 over > "data" and just accessing the array fields normally. See attached patch. Indeed, that's a nice simplification. - data += sizeof(uint32) * 2; Is it safe? I mean could XLH_SPLIT_META_UPDATE_MASKS and XLH_SPLIT_META_UPDATE_SPLITPOINT be set simultaneously? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On 28.11.25 10:06, Bertrand Drouvot wrote: > Hi, > > On Fri, Nov 28, 2025 at 09:11:16AM +0100, Peter Eisentraut wrote: >> On 25.11.25 06:46, Bertrand Drouvot wrote: >>>>> @@ -389,7 +389,7 @@ hash_xlog_split_allocate_page(XLogReaderState *record) >>>>> >>>>> /* extract low and high masks. */ >>>>> memcpy(&lowmask, data, sizeof(uint32)); >>>>> - highmask = (uint32 *) ((char *) data + sizeof(uint32)); >>>>> + highmask = (uint32 *) (data + sizeof(uint32)); >>>> I wonder about these, too. I like knowing what the code does without >>>> having to check the type of `data`. But then later on we do a `data += >>>> sizeof(uint32) * 2`, so you have to check the type anyway, so... I >>>> don't know. >>> I think that even with the cast in place, it's good to check the type of data. >>> Not for the line that follows (i.e: "data += sizeof(uint32) * 2") but to check >>> that the cast makes sense and does not hide "wrong" pointer manipulation. >>> >>> So I think that with or without the cast one would need to check. But that feels >>> more natural to check when there is no cast (as we don't assume that someone >>> said "I know what I'm doing"). So I'm in favor of removing the cast, thoughts? >> >> I think this whole thing could be simplified by overlaying a uint32 over >> "data" and just accessing the array fields normally. See attached patch. > > Indeed, that's a nice simplification. > > - data += sizeof(uint32) * 2; > > Is it safe? I mean could XLH_SPLIT_META_UPDATE_MASKS and XLH_SPLIT_META_UPDATE_SPLITPOINT > be set simultaneously? Yes, that's what was probably intended. But apparently not exercised in the tests. So maybe more like this patch.
Вложения
Hi, On Fri, Nov 28, 2025 at 02:20:25PM +0100, Peter Eisentraut wrote: > On 28.11.25 10:06, Bertrand Drouvot wrote: > > On Fri, Nov 28, 2025 at 09:11:16AM +0100, Peter Eisentraut wrote: > > > I think this whole thing could be simplified by overlaying a uint32 over > > > "data" and just accessing the array fields normally. See attached patch. > > > > Indeed, that's a nice simplification. > > > > - data += sizeof(uint32) * 2; > > > > Is it safe? I mean could XLH_SPLIT_META_UPDATE_MASKS and XLH_SPLIT_META_UPDATE_SPLITPOINT > > be set simultaneously? > > Yes, that's what was probably intended. But apparently not exercised in the > tests. > > So maybe more like this patch. + uint32 lowmask = uidata[uidatacount++]; + uint32 highmask = uidata[uidatacount++]; good idea! That way that's easier to add more branches/flags later if we need to. Also, I think that's safe thanks to XLogRecGetBlockData() returning a MAXALIGNed buffer. Not sure if it's worth to add a comment. I think that a comment was not needed with the original code as it was using memcpy() instead. Except for the nit comment remark above, LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Nov 24, 2025 at 9:46 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> I think that even with the cast in place, it's good to check the type of data.
> Not for the line that follows (i.e: "data += sizeof(uint32) * 2") but to check
> that the cast makes sense and does not hide "wrong" pointer manipulation.
From a specifier standpoint ("accidentally got rid of const"), yes.
From a pointer math standpoint, `(char *) ptr + offset` has a specific
meaning to me, and I was arguing that hiding that meaning isn't
necessarily useful in all cases.
But! Peter's followup makes this moot. Which is good, and it might
serve as its own blueprint for when we next see a situation like this.
> Yeah, "*(s->str + pos)" is already used in SetAt() and also in IsVowel(). Instead
> of changing those 3, I'd prefer to keep the current change and keep the patch
> focus on its intend. We could change those in a dedicated patch afterward if we
> feel the need.
Fine by me. Thanks for this cleanup work!
--Jacob
On Fri, Nov 28, 2025 at 12:41 AM Peter Eisentraut <peter@eisentraut.org> wrote: > I test this once in a while and fix the issues that I find. But it's > very picky and you will find difficult to fix problems like the fact > that the signedness of enums is implementation-defined, and so the only > portable fix there would be to add more casts. Once you've gotten into that situation, don't you have potential sign-extension issues during int promotion, which require casts anyway? It might be nice to fix those up regardless (he said, from his armchair). But that doesn't seem as pressing as the other potential problems. > I think it could be useful to tighten the source code with respect to > implicit integer conversions, using warnings such as -Wsign-conversion > and -Wconversion as well as -Wformat-signedness. There are surely > hidden overflow or truncation issues similar to CVE-2025-12818 hidden > somewhere. But explicit casts defeat those warnings, so removing > unnecessary casts is a good step on the way there. +1. (And your v2 looks good to me.) --Jacob
On 24.11.25 11:26, Bertrand Drouvot wrote: > Hi hackers, > > Attached is a patch to $SUBJECT. > > This is the same kind of idea as 7f798aca1d5 and ef8fe693606, as their presence > could cause risks of hiding actual type mismatches in the future or silently > discarding qualifiers. I think that it also improves readability. > > Those have been found with a coccinelle script as: > > " > @@ > type T; > T E; > @@ > > - (T) > E > " > > which removes casts when it's casting to the same type. > > Note that it generated more that what is in the attached. I chose not to remove > some of them (mainly arithmetic ones) to keep the patch focused on what matters > here. Committed. I made some cosmetic adjustments, to remove line breaks that are no longer necessary with the shortened lines, and in gbt_num_union() I kept &out[0] (instead of just out) for consistency with the following line.
Hi, On Mon, Dec 01, 2025 at 09:50:43AM -0800, Jacob Champion wrote: > Fine by me. Thanks for the review! > Thanks for this cleanup work! That's fun and useful to do! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com