Обсуждение: Some efforts to get rid of "long" in our codebase

Поиск
Список
Период
Сортировка

Some efforts to get rid of "long" in our codebase

От
David Rowley
Дата:
I was playing around with the following imperfect regex to try and get
an idea of how many "long" usages we have.

git grep -E "^\s*(static)?\s*(unsigned)?\s*long\b" -- '*.c' '*.h' | wc -l

REL_13_STABLE -> 486
REL_14_STABLE -> 482
REL_15_STABLE -> 476
REL_16_STABLE -> 485
REL_17_STABLE -> 449
REL_18_STABLE -> 439
master -> 386

(Generally trending downwards with quite a good reduction since v18)

Many of the "long" usages are genuine, e.g for compatibility with
library functions, or in snprintf.c for the %ld and %lu format
specifiers; those we need. However, many of these we don't need. Many
are a possible source of bugs due to not all platforms we support
having the same idea about how wide longs are. We've had and fixed
bugs around this before.

I've attached a couple of patches to get the ball rolling.

0001: CATCACHE_STATS is using signed longs as counters for cache
hits/misses etc. I've changed this to uint64 rather than int64 as I
didn't see the need to have a signed type to count the occurrences of
an event. Maybe there's an anti-universe where negative occurrences
are a thing, but they're not in this one. If something goes awry with
the counters and that causes the subtraction that's being done to
wrap, then we're more likely to notice the bug via the excessively
large number the wrap would end up displaying.

0002: MemSet / MemSetAligned macros. It's probably about time someone
made these disappear, but that's likely for another thread with more
research than I'd like to do here. I replaced "long" with "Size". I
also considered "uintptr_t", but after some reading of the C standard,
I settled on Size as it seems it's possible for platforms to exist
where the pointer width is smaller than the processor's width. I
suspect it might not matter for the platforms we support? Size could
also be smaller than the processor's width, but I feel that's less
likely.

Which yields a drop in the Ocean fewer longs...

0001+0002 -> 367

David

Вложения

Re: Some efforts to get rid of "long" in our codebase

От
Heikki Linnakangas
Дата:
On 06/11/2025 13:46, David Rowley wrote:
> I've attached a couple of patches to get the ball rolling.

Thanks!

> 0001: CATCACHE_STATS is using signed longs as counters for cache
> hits/misses etc. I've changed this to uint64 rather than int64 as I
> didn't see the need to have a signed type to count the occurrences of
> an event. Maybe there's an anti-universe where negative occurrences
> are a thing, but they're not in this one. If something goes awry with
> the counters and that causes the subtraction that's being done to
> wrap, then we're more likely to notice the bug via the excessively
> large number the wrap would end up displaying.

Unsigned makes sense for these.

> @@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg)
>  
>          if (cache->cc_ntup == 0 && cache->cc_searches == 0)
>              continue;            /* don't print unused caches */
> -        elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %d lists,
%ldlsrch, %ld lhits",
 
> +        elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %" PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64
"+%"PRIu64 "=%" PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %" PRIu64 " lhits",
 
>               cache->cc_relname,
>               cache->cc_indexoid,
>               cache->cc_ntup,

Unfortunately PRIu64 makes these much longer and less readable. I don't 
think there's much we can do about that though. Perhaps split the format 
string to multiple lines?

> 0002: MemSet / MemSetAligned macros. It's probably about time someone
> made these disappear, but that's likely for another thread with more
> research than I'd like to do here. I replaced "long" with "Size". I
> also considered "uintptr_t", but after some reading of the C standard,
> I settled on Size as it seems it's possible for platforms to exist
> where the pointer width is smaller than the processor's width. I
> suspect it might not matter for the platforms we support? Size could
> also be smaller than the processor's width, but I feel that's less
> likely.

Let's use size_t instead of Size in new code, per previous discussions 

(https://www.postgresql.org/message-id/flat/CA%2BRLCQyVqb9Xxni6x2iJYTawMbJv5gZY2fzNaw39%3D_yOtu_QKw%40mail.gmail.com#74391f75a649df13920f87360e361183)

- Heikki




Re: Some efforts to get rid of "long" in our codebase

От
Peter Eisentraut
Дата:
On 06.11.25 13:17, Heikki Linnakangas wrote:
>> @@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg)
>>
>>          if (cache->cc_ntup == 0 && cache->cc_searches == 0)
>>              continue;            /* don't print unused caches */
>> -        elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld 
>> hits, %ld+%ld=%ld loads, %ld invals, %d lists, %ld lsrch, %ld lhits",
>> +        elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %" 
>> PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64 "+%" PRIu64 "=%" 
>> PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %" 
>> PRIu64 " lhits",
>>               cache->cc_relname,
>>               cache->cc_indexoid,
>>               cache->cc_ntup,
> 
> Unfortunately PRIu64 makes these much longer and less readable. I don't 
> think there's much we can do about that though. Perhaps split the format 
> string to multiple lines?

You could also use unsigned long long int for these, to make the format 
strings more readable.




Re: Some efforts to get rid of "long" in our codebase

От
Peter Eisentraut
Дата:
On 06.11.25 12:46, David Rowley wrote:
> 0002: MemSet / MemSetAligned macros. It's probably about time someone
> made these disappear, but that's likely for another thread with more
> research than I'd like to do here. I replaced "long" with "Size". I
> also considered "uintptr_t", but after some reading of the C standard,
> I settled on Size as it seems it's possible for platforms to exist
> where the pointer width is smaller than the processor's width. I
> suspect it might not matter for the platforms we support? Size could
> also be smaller than the processor's width, but I feel that's less
> likely.

I think size_t/Size could be misleading here.  You're not measuring any 
size, you're just chunking up the bytes to zero into something that we 
thing the compiler or CPU can handle very efficiently.

So in a sense, using long isn't wrong here.  It might well be the best 
for this.  If there is an aversion to using any long at all, why not 
long long.



Re: Some efforts to get rid of "long" in our codebase

От
David Rowley
Дата:
On Fri, 7 Nov 2025 at 07:33, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 06.11.25 12:46, David Rowley wrote:
> > 0002: MemSet / MemSetAligned macros. It's probably about time someone
> > made these disappear, but that's likely for another thread with more
> > research than I'd like to do here. I replaced "long" with "Size". I
> > also considered "uintptr_t", but after some reading of the C standard,
> > I settled on Size as it seems it's possible for platforms to exist
> > where the pointer width is smaller than the processor's width. I
> > suspect it might not matter for the platforms we support? Size could
> > also be smaller than the processor's width, but I feel that's less
> > likely.
>
> I think size_t/Size could be misleading here.  You're not measuring any
> size, you're just chunking up the bytes to zero into something that we
> thing the compiler or CPU can handle very efficiently.
>
> So in a sense, using long isn't wrong here.  It might well be the best
> for this.  If there is an aversion to using any long at all, why not
> long long.

The point in changing this was that long isn't a good fit as it's not
64-bit on 64-bit Windows. My aim is to find a type with a width the
same as the processor's word size. long long tends to be 64-bit on
32-bit platforms, so doesn't seem a very good fit. Someone might argue
that doesn't matter now since we no longer have 4-byte Datums, but
IMO, this isn't any reason to make things even worse for 32-bit
builds.

David



Re: Some efforts to get rid of "long" in our codebase

От
Peter Eisentraut
Дата:
On 06.11.25 21:06, David Rowley wrote:
> On Fri, 7 Nov 2025 at 07:33, Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> On 06.11.25 12:46, David Rowley wrote:
>>> 0002: MemSet / MemSetAligned macros. It's probably about time someone
>>> made these disappear, but that's likely for another thread with more
>>> research than I'd like to do here. I replaced "long" with "Size". I
>>> also considered "uintptr_t", but after some reading of the C standard,
>>> I settled on Size as it seems it's possible for platforms to exist
>>> where the pointer width is smaller than the processor's width. I
>>> suspect it might not matter for the platforms we support? Size could
>>> also be smaller than the processor's width, but I feel that's less
>>> likely.
>>
>> I think size_t/Size could be misleading here.  You're not measuring any
>> size, you're just chunking up the bytes to zero into something that we
>> thing the compiler or CPU can handle very efficiently.
>>
>> So in a sense, using long isn't wrong here.  It might well be the best
>> for this.  If there is an aversion to using any long at all, why not
>> long long.
> 
> The point in changing this was that long isn't a good fit as it's not
> 64-bit on 64-bit Windows. My aim is to find a type with a width the
> same as the processor's word size. long long tends to be 64-bit on
> 32-bit platforms, so doesn't seem a very good fit. Someone might argue
> that doesn't matter now since we no longer have 4-byte Datums, but
> IMO, this isn't any reason to make things even worse for 32-bit
> builds.

Yeah you're right.  If we cared a lot about MemSet's longevity, I would 
suggest making a new type for this, but that would leak into all users 
of c.h, so maybe not.

I do suggest some kind of comment, that we're using size_t as a 
convenient proxy for the most suitable chunk/step size for the platform, 
not to actually measure a size.




Re: Some efforts to get rid of "long" in our codebase

От
Tom Lane
Дата:
Peter Eisentraut <peter@eisentraut.org> writes:
> I do suggest some kind of comment, that we're using size_t as a 
> convenient proxy for the most suitable chunk/step size for the platform, 
> not to actually measure a size.

Yeah.  There's not actually anything wrong with using "long" here,
except that we believe that on Win64 it's probably not the processor's
native word width.  Size/size_t is more likely to match that.

([u]intptr_t is even more likely to match, but I don't think that's a
good choice because it invites confusion with the usage of uintptr_t
for address arithmetic elsewhere in the macro.  That's quite unrelated
to the choice of copy step size.)

            regards, tom lane



Re: Some efforts to get rid of "long" in our codebase

От
David Rowley
Дата:
On Fri, 7 Nov 2025 at 07:26, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 06.11.25 13:17, Heikki Linnakangas wrote:
> >> @@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg)
> >>
> >>          if (cache->cc_ntup == 0 && cache->cc_searches == 0)
> >>              continue;            /* don't print unused caches */
> >> -        elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld
> >> hits, %ld+%ld=%ld loads, %ld invals, %d lists, %ld lsrch, %ld lhits",
> >> +        elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %"
> >> PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64 "+%" PRIu64 "=%"
> >> PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %"
> >> PRIu64 " lhits",
> >>               cache->cc_relname,
> >>               cache->cc_indexoid,
> >>               cache->cc_ntup,
> >
> > Unfortunately PRIu64 makes these much longer and less readable. I don't
> > think there's much we can do about that though. Perhaps split the format
> > string to multiple lines?
>
> You could also use unsigned long long int for these, to make the format
> strings more readable.

I couldn't really decide on what's best here. I'd personally rather be
more explicit about the width of the type by using uint64, but on the
other hand, I don't know of any realistic modern hardware/compiler
combination where unsigned long long isn't 64-bit. Certainly, the
format string is easier to read with %llu.

v2-0001 wraps the format string as suggested by Heikki, v3-0001 uses
unsigned long long as suggested by Peter.

v2-0002 is updated to use size_t instead of Size, per Heikki

Any further opinions or votes on v2-0001 vs v3-0001?

David

Вложения

Re: Some efforts to get rid of "long" in our codebase

От
Andreas Karlsson
Дата:
On 11/6/25 12:46 PM, David Rowley wrote:
> I've attached a couple of patches to get the ball rolling.

This inspired me to write my own patch to chip away at the use of long.

The target for the attached patch is TimestampDifference() and 
feTimestampDifference() which return the timestamp difference in seconds 
and microseconds as a long and an int and replace it with in64 and 
int32. The return values of these functions are used a lot for printing 
durations to the log and sometimes to populate a struct timeval (in that 
case returning a timeval would make some sense but time_t is sadly 32 
bits on some platforms).

The patch also cleans up code a bit in check_log_duration() which was 
unnecessarily complicated.

But writing the mail made me wonder if a cleaner solution wouldn't be to 
just make TimestampDifference() return a TimeOffset and then write 
helper functions to e.g. convert a TimeOffset to a timeval or to 
milliseconds.

Andreas

Вложения

Re: Some efforts to get rid of "long" in our codebase

От
David Rowley
Дата:
On Mon, 10 Nov 2025 at 13:53, David Rowley <dgrowleyml@gmail.com> wrote:
> v2-0001 wraps the format string as suggested by Heikki, v3-0001 uses
> unsigned long long as suggested by Peter.
>
> v2-0002 is updated to use size_t instead of Size, per Heikki
>
> Any further opinions or votes on v2-0001 vs v3-0001?

Nobody seems to feel strongly either way, so I looked again and
thought that using uint64 is nicer as the size of the type is
explicit. We do want a 64-bit type here, not something bigger, which
in theory, long long could be. The less readable format, IMO seemed
like an ok trade-off to be explicit about the type's size.

With that, I pushed the CATCACHE_STATS patch. I also pushed the
MemSet/MemSetAligned one too.

Thanks for looking.

David