Re: pread, pwrite, etc return ssize_t not int

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: pread, pwrite, etc return ssize_t not int
Дата
Msg-id CA+hUKG+Y1xfDjaXmH5g07VitPN+-Wfp3P9RumNgD6jJXzm5Hrg@mail.gmail.com
обсуждение исходный текст
Ответ на pread, pwrite, etc return ssize_t not int  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: pread, pwrite, etc return ssize_t not int  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-hackers
On Mon, Dec 25, 2023 at 7:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Coverity whinged this morning about the following bit in
> the new pg_combinebackup code:
>
> 644                 unsigned    rb;
> 645
> 646                 /* Read the block from the correct source, except if dry-run. */
> 647                 rb = pg_pread(s->fd, buffer, BLCKSZ, offsetmap[i]);
> 648                 if (rb != BLCKSZ)
> 649                 {
> >>>     CID 1559912:  Control flow issues  (NO_EFFECT)
> >>>     This less-than-zero comparison of an unsigned value is never true. "rb < 0U".
> 650                     if (rb < 0)
> 651                         pg_fatal("could not read file \"%s\": %m", s->filename);
>
> It's dead right to complain of course.  (I kind of think that the
> majority of places where reconstruct.c is using "unsigned" variables
> are poorly-thought-through; many of them look like they should be
> size_t, and I suspect some other ones beside this one are flat wrong
> or at least unnecessarily fragile.  But I digress.)

Yeah.

> While looking around for other places that might've made comparable
> mistakes, I noted that we have places that are storing the result of
> pg_pread[v]/pg_pwrite[v] into an "int" variable even though they are
> passing a size_t count argument that there is no obvious reason to
> believe must fit in int.  This seems like trouble waiting to happen,
> so I fixed some of these in the attached.  The major remaining place
> that I think we ought to change is the newly-minted
> FileRead[V]/FileWrite[V] functions, which are declared to return int
> but really should be returning ssize_t IMO.  I didn't do that here
> though.

Agreed in theory.  Note that we've only been using size_t in fd.c
functions since:

commit 2d4f1ba6cfc2f0a977f1c30bda9848041343e248
Author: Peter Eisentraut <peter@eisentraut.org>
Date:   Thu Dec 8 08:51:38 2022 +0100

    Update types in File API

    Make the argument types of the File API match stdio better:

    - Change the data buffer to void *, from char *.
    - Change FileWrite() data buffer to const on top of that.
    - Change amounts to size_t, from int.

I guess it was an oversight not to change the return type to match at
the same time.  That said, I think that would only be for tidiness.

Some assorted observations:

1.  We don't yet require "large file" support, meaning that we use
off_t our fd.c and lseek()/p*() replacement functions, but we know it
is only 32 bits on Windows, and we avoid creating large files.  I
think that means that a hypothetical very large write would break that
assumption, creating data whose position cannot be named in those
calls.  We could fix that on Windows by adjusting our wrappers, either
to work with pgoff_t instead off off_t (yuck) or redefining off_t
(yuck), but given that both options are gross, so far we have imagined
that we should move towards using large files only conditionally, on
sizeof(off_t) >= 8.

2.  Windows' native read() and write() functions still have prototypes
like 1988 POSIX, eg int read(int filedes, void *buf, unsigned int
nbyte).  It was 2001 POSIX that changed them to ssize_t read(int
filedesc, void *buf, size_t nbytes).  I doubt there is much that can
be done about that, except perhaps adding a wrapper that caps it.
Seems like overkill for a hypothetical sort of a problem...

3.  Windows' native ReadFile() and WriteFile() functions, which our
'positionified' pg_p*() wrapper functions use, work in terms of DWORD
= unsigned long which is 32 bit on that cursed ABI.  Our wrappers
should probably cap.

I think a number of Unixoid systems implemented the POSIX interface
change by capping internally, which doesn't matter much in practice
because no one really tries to transfer gigabytes at once, and any
non-trivial transfer size probably requires handling short transfers.
For example, man read on Linux:

       On Linux, read() (and similar system calls) will transfer at most
       0x7ffff000 (2,147,479,552) bytes, returning the number of bytes
       actually transferred.  (This is true on both 32-bit and 64-bit
       systems.)

> We could go further by insisting that *all* uses of pg_pread/pg_pwrite
> use ssize_t result variables.  I think that's probably overkill --- in
> the example above, which is only asking to write BLCKSZ worth of data,
> surely an int is sufficient.  But you could argue that allowing this
> pattern at all creates risk of copy/paste errors.

Yeah.

> Of course the real elephant in the room is that plain old read(2)
> and write(2) also return ssize_t.  I've not attempted to vet every
> call of those, and I think it'd likely be a waste of effort, as
> we're unlikely to ever try to shove more than INT_MAX worth of
> data through them.  But it's a bit harder to make that argument
> for the iovec-based file APIs.  I think we ought to try to keep
> our uses of those functions clean on this point.

Yeah I think it's OK for a caller that knows it's passing in an int
value to (implicitly) cast the return to int.  But it'd be nice to
make our I/O functions look and feel like standard functions and
return ssize_t.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Show WAL write and fsync stats in pg_stat_io
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Statistics Import and Export