Обсуждение: "long" type is not appropriate for counting tuples
Commit ab0dfc961b6 used a "long" variable within _bt_load() to count the number of tuples entered into a B-Tree index as it is built. This will not work as expected on Windows, even on 64-bit Windows, because "long" is only 32-bits wide. It's far from impossible that you'd have ~2 billion index tuples when building a new index. Programmers use "long" because it is assumed to be wider than "int" (even though that isn't required by the standard, and isn't true across all of the platforms we support). The use of "long" seems inherently suspect given our constraints, except perhaps in the context of sizing work_mem-based allocations, where it is used as part of a semi-standard idiom...albeit one that only works because of the restrictions on work_mem size on Windows. ISTM that we should try to come up with a way of making code like this work, rather than placing the burden on new code to get it right. This exact issue has bitten users on a number of occasions that I can recall. There is also a hidden landmine that we know about but haven't fixed: logtape.c, which will break on Windows with very very large index builds. Also, "off_t" is only 32-bits on Windows, which broke parallel CREATE INDEX (issued fixed by commit aa551830). I suppose that "off_t" is really a variant of the same problem. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > Commit ab0dfc961b6 used a "long" variable within _bt_load() to count > the number of tuples entered into a B-Tree index as it is built. This > will not work as expected on Windows, even on 64-bit Windows, because > "long" is only 32-bits wide. Right. "long" used to be our convention years ago, but these days tuple counts should be int64 or perhaps uint64. See e.g. 23a27b039. > ISTM that we should try to come up with a way of making code like this > work, rather than placing the burden on new code to get it right. Other than "use the right datatype", I'm not sure what we can do? In the meantime, somebody should fix ab0dfc961b6 ... > Also, "off_t" is only 32-bits on Windows, which broke parallel CREATE > INDEX (issued fixed by commit aa551830). I suppose that "off_t" is > really a variant of the same problem. Hmm, why is this a problem? We should only use off_t for actual file access operations, and we don't use files greater than 1GB. (There's a reason for that.) regards, tom lane
On Sun, Apr 28, 2019 at 4:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > ISTM that we should try to come up with a way of making code like this > > work, rather than placing the burden on new code to get it right. > > Other than "use the right datatype", I'm not sure what we can do? Ambiguity seems like the real problem here. If we could impose a general rule that you cannot use "long" (perhaps with some limited wiggle-room), then a lint-like tool could catch bugs like this. This may not be that difficult. Nobody is particularly concerned about performance on 32-bit platforms these days. > In the meantime, somebody should fix ab0dfc961b6 ... I'll leave this to Alvaro. > Hmm, why is this a problem? We should only use off_t for actual file > access operations, and we don't use files greater than 1GB. (There's a > reason for that.) The issue that was fixed by commit aa551830 showed this assumption to be kind of brittle. Admittedly this is not as clear-cut as the "long" issue, and might not be worth worrying about. I don't want to go as far as requiring explicit width integer types in all situations, since that seems totally impractical, and without any real upside. But it would be nice to identify integer types where there is a real risk of making an incorrect assumption, and then eliminate that risk once and for all. -- Peter Geoghegan
Hi, On 2019-04-28 19:24:59 -0400, Tom Lane wrote: > Peter Geoghegan <pg@bowt.ie> writes: > > ISTM that we should try to come up with a way of making code like this > > work, rather than placing the burden on new code to get it right. > > Other than "use the right datatype", I'm not sure what we can do? > In the meantime, somebody should fix ab0dfc961b6 ... I think we should start by just removing all uses of long. There's really no excuse for them today, and a lot of them are bugs waiting to happen. And then either just add a comment to the coding style, or even better a small script, to prevent them from being re-used. > > Also, "off_t" is only 32-bits on Windows, which broke parallel CREATE > > INDEX (issued fixed by commit aa551830). I suppose that "off_t" is > > really a variant of the same problem. > > Hmm, why is this a problem? We should only use off_t for actual file > access operations, and we don't use files greater than 1GB. (There's a > reason for that.) We read from larger files in a few places though. E.g. pg_dump. Most places really just should use pgoff_t... Greetings, Andres Freund
On Mon, Apr 29, 2019 at 8:11 AM Andres Freund <andres@anarazel.de> wrote: > I think we should start by just removing all uses of long. There's > really no excuse for them today, and a lot of them are bugs waiting to > happen. I like the idea of banning "long" altogether. It will probably be hard to keep it out of third party code that we vendor-in, or even code that interfaces with libraries in some way, but it should be removed from everything else. It actually doesn't seem particularly hard to do so, based on a quick grep of src/backend/. Most uses of "long" is code that sizes something in local memory, where "long" works for the same reason as it works when calculating the size of a work_mem allocation -- ugly, but correct. A few uses of "long" seem to be real live bugs, albeit bugs that are very unlikely to ever hit. _h_indexbuild() has the same bug as _bt_load(), also due to commit ab0dfc961b6 -- I spotted that in passing when I used grep. > We read from larger files in a few places though. E.g. pg_dump. Most > places really just should use pgoff_t... I wasn't even aware of pgoff_t. It is only used in frontend utilities that I don't know that much about, whereas off_t is used all over the backend code. -- Peter Geoghegan
Hi, On 2019-04-29 10:16:39 -0700, Peter Geoghegan wrote: > On Mon, Apr 29, 2019 at 8:11 AM Andres Freund <andres@anarazel.de> wrote: > > I think we should start by just removing all uses of long. There's > > really no excuse for them today, and a lot of them are bugs waiting to > > happen. > > I like the idea of banning "long" altogether. It will probably be hard > to keep it out of third party code that we vendor-in, or even code > that interfaces with libraries in some way, but it should be removed > from everything else. I don't think any of the code we've vendored in where we also track upstream, actually uses long in a meaningful amount. And putside of backward compatibility, I don't think there's many libraries that still use it. > > We read from larger files in a few places though. E.g. pg_dump. Most > > places really just should use pgoff_t... > > I wasn't even aware of pgoff_t. It is only used in frontend utilities > that I don't know that much about, whereas off_t is used all over the > backend code. Yea, we've some delightful hackery to make that work: * WIN32 does not provide 64-bit off_t, but does provide the functions operating * with 64-bit offsets. */ #define pgoff_t __int64 #ifdef _MSC_VER #define fseeko(stream, offset, origin) _fseeki64(stream, offset, origin) #define ftello(stream) _ftelli64(stream) #else #ifndef fseeko #define fseeko(stream, offset, origin) fseeko64(stream, offset, origin) #endif #ifndef ftello #define ftello(stream) ftello64(stream) #endif #endif which also shows that the compatibility hackery is fairly limited. Thomas, ISTM, that pg_pread() etc should rather take the offset as a uint64 or such. And then actually initialize OFFSET.offsetHigh. Greetings, Andres Freund
Peter Geoghegan <pg@bowt.ie> writes: > On Mon, Apr 29, 2019 at 8:11 AM Andres Freund <andres@anarazel.de> wrote: >> I think we should start by just removing all uses of long. There's >> really no excuse for them today, and a lot of them are bugs waiting to >> happen. > I like the idea of banning "long" altogether. It will probably be hard > to keep it out of third party code that we vendor-in, or even code > that interfaces with libraries in some way, but it should be removed > from everything else. It actually doesn't seem particularly hard to do > so, based on a quick grep of src/backend/. Most uses of "long" is code > that sizes something in local memory, where "long" works for the same > reason as it works when calculating the size of a work_mem allocation > -- ugly, but correct. There's more to that than you might realize. For example, guc.c enforces a limit on work_mem that's designed to ensure that expressions like "work_mem * 1024L" won't overflow, and there are similar choices elsewhere. I'm not sure if we want to go to the effort of rethinking that; it's not really a bug, though it does result in 64-bit Windows being more restricted than it has to be. Trying to get rid of type-L constants along with more explicit uses of "long" would be a PITA I'm afraid. Another problem is that while "%lu" format specifiers are portable, INT64_FORMAT is a *big* pain, not least because you can't put it into translatable strings without causing problems. To the extent that we could go over to "%zu" instead, maybe this could be finessed, but blind "s/long/int64/g" isn't going to be any fun. regards, tom lane
Hi, On 2019-04-29 13:32:13 -0400, Tom Lane wrote: > There's more to that than you might realize. For example, guc.c > enforces a limit on work_mem that's designed to ensure that > expressions like "work_mem * 1024L" won't overflow, and there are > similar choices elsewhere. I'm not sure if we want to go to the > effort of rethinking that; it's not really a bug, though it does > result in 64-bit Windows being more restricted than it has to be. Hm, but why does that require the use of long? We could fairly trivially define a type that's guaranteed to be 32 bit on 32 bit platforms, and 64 bit on 64 bit platforms. Even a dirty hack like using intptr_t instead of long would be better than using long. > Another problem is that while "%lu" format specifiers are portable, > INT64_FORMAT is a *big* pain, not least because you can't put it into > translatable strings without causing problems. To the extent that > we could go over to "%zu" instead, maybe this could be finessed, > but blind "s/long/int64/g" isn't going to be any fun. Hm. It appears that gettext supports expanding PRId64 PRIu64 etc in translated strings. Perhaps we should implement them in our printf, and then replace all use of INT64_FORMAT with that? I've not tested the gettext code, but it's there: /* Expand a system dependent string segment. Return NULL if unsupported. */ static const char * get_sysdep_segment_value (const char *name) { /* Test for an ISO C 99 section 7.8.1 format string directive. Syntax: P R I { d | i | o | u | x | X } { { | LEAST | FAST } { 8 | 16 | 32 | 64 } | MAX | PTR } */ /* We don't use a table of 14 times 6 'const char *' strings here, because data relocations cost startup time. */ if (name[0] == 'P' && name[1] == 'R' && name[2] == 'I') ... Greetings, Andres Freund
On Mon, Apr 29, 2019 at 10:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > There's more to that than you might realize. For example, guc.c > enforces a limit on work_mem that's designed to ensure that > expressions like "work_mem * 1024L" won't overflow, and there are > similar choices elsewhere. I was aware of that, but I wasn't aware of how many places that bleeds into until I checked just now. It would be nice if we could figure out how to make it obvious that the idioms around the use of long for work_mem stuff are idioms that have a specific rationale. It's pretty confusing as things stand. -- Peter Geoghegan
Andres Freund <andres@anarazel.de> writes: > On 2019-04-29 13:32:13 -0400, Tom Lane wrote: >> There's more to that than you might realize. For example, guc.c >> enforces a limit on work_mem that's designed to ensure that >> expressions like "work_mem * 1024L" won't overflow, and there are >> similar choices elsewhere. I'm not sure if we want to go to the >> effort of rethinking that; it's not really a bug, though it does >> result in 64-bit Windows being more restricted than it has to be. > Hm, but why does that require the use of long? We could fairly trivially > define a type that's guaranteed to be 32 bit on 32 bit platforms, and 64 > bit on 64 bit platforms. Even a dirty hack like using intptr_t instead > of long would be better than using long. The point is that (a) work_mem is an int; adding support to GUC for some other integer width would be an unreasonable amount of overhead. (b) "1024L" is a nice simple non-carpal-tunnel-inducing way to get the right width of the product, for some value of "right". If we don't want to rely on "L" constants then we'll have to write these cases like "work_mem * (size_t) 1024" which is ugly, lots more keystrokes, and prone to weird precedence problems unless you throw even more keystrokes (parentheses) at it. I'm not excited about doing that just to allow larger work_mem settings on Win64. (But if we do go in this direction, maybe some notation like #define KILOBYTE ((size_t) 1024) would help.) I'm not suggesting that we don't need to fix uses of "long" for tuple counts, and perhaps other things. But I think getting rid of it in memory size calculations might be a lot of work for not a lot of reward. regards, tom lane
On Mon, Apr 29, 2019 at 11:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > If we don't want to rely on "L" constants then we'll have to write these > cases like "work_mem * (size_t) 1024" which is ugly, lots more keystrokes, > and prone to weird precedence problems unless you throw even more > keystrokes (parentheses) at it. I'm not excited about doing that just > to allow larger work_mem settings on Win64. I don't think that anybody cares about Win64 very much. Simplifying the code might lead to larger work_mem settings on that platform, but that's not the end goal I have in mind. For me, the end goal is simpler code. > I'm not suggesting that we don't need to fix uses of "long" for tuple > counts, and perhaps other things. But I think getting rid of it in memory > size calculations might be a lot of work for not a lot of reward. Whether or not *fully* banning the use of "long" is something that will simplify the code is debatable. However, we could substantially reduce the use of "long" across the backend without any real downside. The work_mem question can be considered later. Does that seem reasonable? -- Peter Geoghegan
On 2019-Apr-28, Peter Geoghegan wrote: > Commit ab0dfc961b6 used a "long" variable within _bt_load() to count > the number of tuples entered into a B-Tree index as it is built. This > will not work as expected on Windows, even on 64-bit Windows, because > "long" is only 32-bits wide. It's far from impossible that you'd have > ~2 billion index tuples when building a new index. Agreed. Here's a patch. I see downthread that you also discovered the same mistake in _h_indexbuild by grepping for "long"; I got to it by examining callers of pgstat_progress_update_param and pgstat_progress_update_multi_param. I didn't find any other mistakes of the same ilk. Some codesites use "double" instead of "int64", but those are not broken. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Hi, On 2019-04-29 11:18:49 -0700, Peter Geoghegan wrote: > On Mon, Apr 29, 2019 at 11:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > If we don't want to rely on "L" constants then we'll have to write these > > cases like "work_mem * (size_t) 1024" which is ugly, lots more keystrokes, > > and prone to weird precedence problems unless you throw even more > > keystrokes (parentheses) at it. I'm not excited about doing that just > > to allow larger work_mem settings on Win64. > > I don't think that anybody cares about Win64 very much. I seriously doubt this assertion. Note that the postgres packages on https://www.postgresql.org/download/windows/ do not support 32bit windows anymore (edb from 11 onwards, bigsql apparently always). And I think there's a pretty substantial number of windows users out there. Greetings, Andres Freund
On Mon, Apr 29, 2019 at 11:20 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Agreed. Here's a patch. I see downthread that you also discovered the > same mistake in _h_indexbuild by grepping for "long"; I got to it by > examining callers of pgstat_progress_update_param and > pgstat_progress_update_multi_param. I didn't find any other mistakes of > the same ilk. Some codesites use "double" instead of "int64", but those > are not broken. This seems fine, though FWIW I probably would have gone with int64 instead of uint64. There is generally no downside to using int64, and being to support negative integers can be useful in some contexts (though not this context). -- Peter Geoghegan
On Mon, Apr 29, 2019 at 11:24 AM Andres Freund <andres@anarazel.de> wrote: > > I don't think that anybody cares about Win64 very much. > > I seriously doubt this assertion. Note that the postgres packages on > https://www.postgresql.org/download/windows/ do not support 32bit > windows anymore (edb from 11 onwards, bigsql apparently always). And I > think there's a pretty substantial number of windows users out there. I was talking about the motivation behind this thread, and I suppose that I included you in that based on things you've said about Windows in the past (apparently I shouldn't have done so). I am interested in making the code less complicated. If we can remove the work_mem kludge for Windows as a consequence of that, then so much the better. -- Peter Geoghegan
On Tue, 30 Apr 2019 at 06:28, Peter Geoghegan <pg@bowt.ie> wrote: > > On Mon, Apr 29, 2019 at 11:20 AM Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > Agreed. Here's a patch. I see downthread that you also discovered the > > same mistake in _h_indexbuild by grepping for "long"; I got to it by > > examining callers of pgstat_progress_update_param and > > pgstat_progress_update_multi_param. I didn't find any other mistakes of > > the same ilk. Some codesites use "double" instead of "int64", but those > > are not broken. > > This seems fine, though FWIW I probably would have gone with int64 > instead of uint64. There is generally no downside to using int64, and > being to support negative integers can be useful in some contexts > (though not this context). CopyFrom() returns uint64. I think it's better to be consistent in the types we use to count tuples in commands. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2019-Apr-30, David Rowley wrote: > On Tue, 30 Apr 2019 at 06:28, Peter Geoghegan <pg@bowt.ie> wrote: > > > > On Mon, Apr 29, 2019 at 11:20 AM Alvaro Herrera > > <alvherre@2ndquadrant.com> wrote: > > > Agreed. Here's a patch. I see downthread that you also discovered the > > > same mistake in _h_indexbuild by grepping for "long"; I got to it by > > > examining callers of pgstat_progress_update_param and > > > pgstat_progress_update_multi_param. I didn't find any other mistakes of > > > the same ilk. Some codesites use "double" instead of "int64", but those > > > are not broken. > > > > This seems fine, though FWIW I probably would have gone with int64 > > instead of uint64. There is generally no downside to using int64, and > > being to support negative integers can be useful in some contexts > > (though not this context). > > CopyFrom() returns uint64. I think it's better to be consistent in the > types we use to count tuples in commands. That's not a bad argument ... but I still committed it as int64, mostly because that's what pgstat_progress_update_param takes. Anyway, these are just local variables, not return values, so it's easily changeable if we determine (??) that unsigned is better. I don't know if anybody plans to do progress report for COPY, but I hope we don't find ourselves in a problem when some user claims that they are inserting more than 2^63 but less than 2^64 tuples. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > I don't know if anybody plans to do progress report for COPY, but I hope > we don't find ourselves in a problem when some user claims that they are > inserting more than 2^63 but less than 2^64 tuples. At one tuple per nanosecond, it'd take a shade under 300 years to reach 2^63. Seems like a problem for our descendants to worry about. regards, tom lane
On 2019-04-29 19:32, Tom Lane wrote: > Another problem is that while "%lu" format specifiers are portable, > INT64_FORMAT is a *big* pain, not least because you can't put it into > translatable strings without causing problems. To the extent that > we could go over to "%zu" instead, maybe this could be finessed, > but blind "s/long/int64/g" isn't going to be any fun. Since we control our own snprintf now, this could probably be addressed somehow, right? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-04-29 19:52, Andres Freund wrote: > Hm. It appears that gettext supports expanding PRId64 PRIu64 etc in > translated strings. That won't work in non-GNU gettext. > Perhaps we should implement them in our printf, and > then replace all use of INT64_FORMAT with that? But this might. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On May 22, 2019 7:39:41 AM PDT, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: >On 2019-04-29 19:32, Tom Lane wrote: >> Another problem is that while "%lu" format specifiers are portable, >> INT64_FORMAT is a *big* pain, not least because you can't put it into >> translatable strings without causing problems. To the extent that >> we could go over to "%zu" instead, maybe this could be finessed, >> but blind "s/long/int64/g" isn't going to be any fun. > >Since we control our own snprintf now, this could probably be addressed >somehow, right? z is for size_t though? Not immediately first how It'd help us? Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes: > On May 22, 2019 7:39:41 AM PDT, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: >> On 2019-04-29 19:32, Tom Lane wrote: >>> Another problem is that while "%lu" format specifiers are portable, >>> INT64_FORMAT is a *big* pain, not least because you can't put it into >>> translatable strings without causing problems. To the extent that >>> we could go over to "%zu" instead, maybe this could be finessed, >>> but blind "s/long/int64/g" isn't going to be any fun. >> Since we control our own snprintf now, this could probably be addressed >> somehow, right? > z is for size_t though? Not immediately first how It'd help us? Yeah, z doesn't reliably translate to int64 either, so it's only useful when the number you're trying to print is a memory object size. I don't really see how controlling snprintf is enough to get somewhere on this. Sure we could invent some new always-64-bit length modifier and teach snprintf.c about it, but none of the other tools we use would know about it. I don't want to give up compiler cross-checking of printf formats, do you? regards, tom lane
Hi, On May 22, 2019 7:40:28 AM PDT, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: >On 2019-04-29 19:52, Andres Freund wrote: >> Hm. It appears that gettext supports expanding PRId64 PRIu64 etc in >> translated strings. > >That won't work in non-GNU gettext. Which of those do currently work with postgres? Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 2019-05-22 18:59, Andres Freund wrote: > On May 22, 2019 7:40:28 AM PDT, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: >> On 2019-04-29 19:52, Andres Freund wrote: >>> Hm. It appears that gettext supports expanding PRId64 PRIu64 etc in >>> translated strings. >> >> That won't work in non-GNU gettext. > > Which of those do currently work with postgres? I don't know what the current situation is, but in the past we've been getting complaints when using GNU-specific features, mostly from Solaris I think. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-05-22 17:52, Tom Lane wrote: > I don't really see how controlling snprintf is enough to get somewhere > on this. Sure we could invent some new always-64-bit length modifier > and teach snprintf.c about it, but none of the other tools we use > would know about it. I don't want to give up compiler cross-checking > of printf formats, do you? Could we define int64 to be long long int on all platforms and just always use %lld? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > Could we define int64 to be long long int on all platforms and just > always use %lld? Hmmm ... maybe. Once upon a time we had to cope with compilers that didn't have "long long", but perhaps that time is past. Another conceivable hazard is somebody deciding that the world needs a platform where "long long" is 128 bits. I don't know how likely that is to happen. As a first step, we could try asking configure to compute sizeof(long long) and seeing what the buildfarm makes of that. regards, tom lane
On 2019-05-22 21:21, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >> Could we define int64 to be long long int on all platforms and just >> always use %lld? > > Hmmm ... maybe. Once upon a time we had to cope with compilers > that didn't have "long long", but perhaps that time is past. It's required by C99, and the configure test for C99 checks it. > Another conceivable hazard is somebody deciding that the world > needs a platform where "long long" is 128 bits. I don't know > how likely that is to happen. Another option is that in cases where it doesn't affect storage layouts, like the counting tuples case that started this thread, code could just use long long int directly instead of int64. Then if someone wants to make it 128 bits or 96 bits or whatever it would not be a problem. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, May 23, 2019 at 5:31 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > Another option is that in cases where it doesn't affect storage layouts, > like the counting tuples case that started this thread, code could just > use long long int directly instead of int64. Then if someone wants to > make it 128 bits or 96 bits or whatever it would not be a problem. I think that sort of thing tends not to work out well, because at some point it's likely to be sent out via the wire protocol; at that point we'll need a value of a certain width. Better to use that width right from the beginning. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2019-05-23 15:52, Robert Haas wrote: > On Thu, May 23, 2019 at 5:31 AM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> Another option is that in cases where it doesn't affect storage layouts, >> like the counting tuples case that started this thread, code could just >> use long long int directly instead of int64. Then if someone wants to >> make it 128 bits or 96 bits or whatever it would not be a problem. > > I think that sort of thing tends not to work out well, because at some > point it's likely to be sent out via the wire protocol; at that point > we'll need a value of a certain width. Better to use that width right > from the beginning. Hmm, by that argument, we shouldn't ever use any integer type other than int16, int32, and int64. I'm thinking for example that pgbench makes a lot of use of int64 and printing that out makes quite messy code. Replacing that by long long int would make this much nicer and should be pretty harmless relative to your concern. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 2019-05-23 15:52, Robert Haas wrote: >> On Thu, May 23, 2019 at 5:31 AM Peter Eisentraut >> <peter.eisentraut@2ndquadrant.com> wrote: >>> Another option is that in cases where it doesn't affect storage layouts, >>> like the counting tuples case that started this thread, code could just >>> use long long int directly instead of int64. Then if someone wants to >>> make it 128 bits or 96 bits or whatever it would not be a problem. >> I think that sort of thing tends not to work out well, because at some >> point it's likely to be sent out via the wire protocol; at that point >> we'll need a value of a certain width. Better to use that width right >> from the beginning. > Hmm, by that argument, we shouldn't ever use any integer type other than > int16, int32, and int64. > I'm thinking for example that pgbench makes a lot of use of int64 and > printing that out makes quite messy code. Replacing that by long long > int would make this much nicer and should be pretty harmless relative to > your concern. It does seem attractive to use long long in cases where we're not too fussed about the exact width. OTOH, that reasoning was exactly why we used "long" in a lot of places back in the day, and sure enough it came back to bite us. On the whole I think I could live with a policy that says "tuple counts shall be 'long long' when being passed around in code, but for persistent storage or wire-protocol transmission, use 'int64'". An alternative and much narrower policy is to say it's okay to do this with an int64 value: printf("processed %lld tuples", (long long) count); In such code, all we're assuming is long long >= 64 bits, which is completely safe per C99, and we dodge the need for a platform-varying format string. regards, tom lane
On 2019-05-23 16:34, Tom Lane wrote: > On the whole I think I could live with a policy that says "tuple counts > shall be 'long long' when being passed around in code, but for persistent > storage or wire-protocol transmission, use 'int64'". > > An alternative and much narrower policy is to say it's okay to do this > with an int64 value: > > printf("processed %lld tuples", (long long) count); > > In such code, all we're assuming is long long >= 64 bits, which > is completely safe per C99, and we dodge the need for a > platform-varying format string. Some combination of this seems quite reasonable. Attached is a patch to implement this in a handful of cases that are particularly verbose right now. I think those are easy wins. (Also a second patch that makes use of %zu for size_t where this was not yet done.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
On Thu, May 23, 2019 at 10:21 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > Hmm, by that argument, we shouldn't ever use any integer type other than > int16, int32, and int64. I think we basically shouldn't. I mean it's fine to use 'int' as a flags argument as part of an internal API, or as a loop counter private to a function or something. But if you are passing around values that involve on-disk compatibility or wire protocol compatibility, it's just a recipe for bugs. If the code has to sometimes cast a value to some other type, somebody may do it wrong. If there's a uniform rule that tuple counts are always int64, that's pretty easy to understand. In short, when a certain kind of value is widely-used, it should have a clearly-declared width. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > Attached is a patch to implement this in a handful of cases that are > particularly verbose right now. I think those are easy wins. > (Also a second patch that makes use of %zu for size_t where this was not > yet done.) I took a look through these and see nothing objectionable. There are probably more places that can be improved, but we need not insist on getting every such place in one go. Per Robert's position that variables ought to have well-defined widths, there might be something to be said for not touching the variable declarations that you changed from int64 to long long, and instead casting them to long long in the sprintf calls. But I'm not really convinced that that's better than what you've done. Marked CF entry as ready-for-committer. regards, tom lane
On 2019-07-02 22:56, Tom Lane wrote: > I took a look through these and see nothing objectionable. There are > probably more places that can be improved, but we need not insist on > getting every such place in one go. > > Per Robert's position that variables ought to have well-defined widths, > there might be something to be said for not touching the variable > declarations that you changed from int64 to long long, and instead > casting them to long long in the sprintf calls. But I'm not really > convinced that that's better than what you've done. > > Marked CF entry as ready-for-committer. committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2019-05-22 16:40:28 +0200, Peter Eisentraut wrote: > On 2019-04-29 19:52, Andres Freund wrote: > > Hm. It appears that gettext supports expanding PRId64 PRIu64 etc in > > translated strings. > > That won't work in non-GNU gettext. Which of those do we actually support? We already depend on bind_textdomain_codeset, which afaict wasn't present in older gettext implementations. - Andres
On 2019-08-14 20:17, Andres Freund wrote: > On 2019-05-22 16:40:28 +0200, Peter Eisentraut wrote: >> On 2019-04-29 19:52, Andres Freund wrote: >>> Hm. It appears that gettext supports expanding PRId64 PRIu64 etc in >>> translated strings. >> >> That won't work in non-GNU gettext. > > Which of those do we actually support? We already depend on > bind_textdomain_codeset, which afaict wasn't present in older gettext > implementations. At least in theory we support Solaris gettext. In the past we occasionally got complaints when we broke something there. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services