Thread: Should we represent temp files as unsigned long int instead of signed long int type?

Should we represent temp files as unsigned long int instead of signed long int type?

From
Ashutosh Sharma
Date:
Hi All,

At present, we represent temp files as a signed long int number. And
depending on the system architecture (32 bit or 64 bit), the range of
signed long int varies, for example on a 32-bit system it will range
from -2,147,483,648 to 2,147,483,647. AFAIU, this will not allow a
session to create more than 2 billion temporary files and that is not
a small number at all, but still what if we make it an unsigned long
int which will allow a session to create 4 billion temporary files if
needed. I might be sounding a little stupid here because 2 billion
temporary files is like 2000 petabytes (2 billion * 1GB), assuming
each temp file is 1GB in size, which is a huge amount of data
storage. However, since the variable we
use to name temporary files is a static long int (static long
tempFileCounter = 0;), there is a possibility that this number will
get exhausted soon if the same session is trying to create too many
temp files via multiple queries.

Just adding a few lines of code related to this from fd.c:

/*
 * Number of temporary files opened during the current session;
 * this is used in generation of tempfile names.
 */
static long tempFileCounter = 0;

    /*
     * Generate a tempfile name that should be unique within the current
     * database instance.
     */
    snprintf(tempfilepath, sizeof(tempfilepath), "%s/%s%d.%ld",
             tempdirpath, PG_TEMP_FILE_PREFIX, MyProcPid, tempFileCounter++);

--
With Regards,
Ashutosh Sharma.



Re: Should we represent temp files as unsigned long int instead of signed long int type?

From
Tom Lane
Date:
Ashutosh Sharma <ashu.coek88@gmail.com> writes:
> At present, we represent temp files as a signed long int number. And
> depending on the system architecture (32 bit or 64 bit), the range of
> signed long int varies, for example on a 32-bit system it will range
> from -2,147,483,648 to 2,147,483,647. AFAIU, this will not allow a
> session to create more than 2 billion temporary files and that is not
> a small number at all, but still what if we make it an unsigned long
> int which will allow a session to create 4 billion temporary files if
> needed.

AFAIK, nothing particularly awful will happen if that counter wraps
around.  Perhaps if you gamed the system really hard, you could cause
a collision with a still-extant temp file from the previous cycle,
but I seriously doubt that could happen by accident.  So I don't
think there's anything to worry about here.  Maybe we could make
that filename pattern %lu not %ld, but minus sign is a perfectly
acceptable filename character, so such a change would be cosmetic.

            regards, tom lane



Re: Should we represent temp files as unsigned long int instead of signed long int type?

From
Robert Haas
Date:
On Wed, Oct 25, 2023 at 1:28 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> At present, we represent temp files as a signed long int number. And
> depending on the system architecture (32 bit or 64 bit), the range of
> signed long int varies, for example on a 32-bit system it will range
> from -2,147,483,648 to 2,147,483,647. AFAIU, this will not allow a
> session to create more than 2 billion temporary files and that is not
> a small number at all, but still what if we make it an unsigned long
> int which will allow a session to create 4 billion temporary files if
> needed. I might be sounding a little stupid here because 2 billion
> temporary files is like 2000 petabytes (2 billion * 1GB), assuming
> each temp file is 1GB in size, which is a huge amount of data
> storage. However, since the variable we
> use to name temporary files is a static long int (static long
> tempFileCounter = 0;), there is a possibility that this number will
> get exhausted soon if the same session is trying to create too many
> temp files via multiple queries.

I think we use signed integer types in a bunch of places where an
unsigned integer type would be straight-up better, and this is one of
them.

I don't know whether it really matters, though.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Should we represent temp files as unsigned long int instead of signed long int type?

From
Michael Paquier
Date:
On Wed, Oct 25, 2023 at 03:07:39PM -0400, Tom Lane wrote:
> AFAIK, nothing particularly awful will happen if that counter wraps
> around.  Perhaps if you gamed the system really hard, you could cause
> a collision with a still-extant temp file from the previous cycle,
> but I seriously doubt that could happen by accident.  So I don't
> think there's anything to worry about here.  Maybe we could make
> that filename pattern %lu not %ld, but minus sign is a perfectly
> acceptable filename character, so such a change would be cosmetic.

In the spirit of removing long because it may be 4 bytes or 8 bytes
depending on the environment, I'd suggest changing it to either int64
or uint64.  Not that it matters much for this specific case, but that
makes the code more portable.
--
Michael


Re: Should we represent temp files as unsigned long int instead of signed long int type?

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> In the spirit of removing long because it may be 4 bytes or 8 bytes
> depending on the environment, I'd suggest changing it to either int64
> or uint64.  Not that it matters much for this specific case, but that
> makes the code more portable.

Then you're going to need a not-so-portable conversion spec in the
snprintf call.  Not sure it's any improvement.

            regards, tom lane