Обсуждение: Re: No error checking when reading from file using zstd in pg_dump

Поиск
Список
Период
Сортировка

Re: No error checking when reading from file using zstd in pg_dump

От
Evgeniy Gorbanev
Дата:
16.06.2025 14:25, Daniel Gustafsson пишет:
>> On 16 Jun 2025, at 10:14, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:
>> In src/bin/pg_dump/compress_zstd.c, the Zstd_read function always
>> returns true. But if you look at the Zstd_gets and Zstd_getc functions,
>> where Zstd_read is called via CFH->read_func, it is expected that
>> the Zstd_read function can also return false. In case of
>> a read-from-file error, the process is expected to terminate, but
>> pg_dump will continue the process.
>> I assume that after checking
>> if (cnt == 0)
>> should be
>> return false;
>       if (cnt == 0)
> -        break;
> +        return false;
>
> As cnt is coming from fread() returning false here would be wrong as you cannot
> from 0 alone know if it's EOF or an error.  Instead it needs to inspect the
> stream with feof() and ferror() to know how to proceed.
>
> --
> Daniel Gustafsson

The feof() check is in the calling functions, e.g. in the Zstd_getc
function.


Regards, Evgeniy




Re: No error checking when reading from file using zstd in pg_dump

От
Daniel Gustafsson
Дата:
> On 16 Jun 2025, at 10:52, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:
>
> 16.06.2025 14:25, Daniel Gustafsson пишет:
>>> On 16 Jun 2025, at 10:14, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:
>>> In src/bin/pg_dump/compress_zstd.c, the Zstd_read function always
>>> returns true. But if you look at the Zstd_gets and Zstd_getc functions,
>>> where Zstd_read is called via CFH->read_func, it is expected that
>>> the Zstd_read function can also return false. In case of
>>> a read-from-file error, the process is expected to terminate, but
>>> pg_dump will continue the process.
>>> I assume that after checking
>>> if (cnt == 0)
>>> should be
>>> return false;
>>   if (cnt == 0)
>> - break;
>> + return false;
>>
>> As cnt is coming from fread() returning false here would be wrong as you cannot
>> from 0 alone know if it's EOF or an error.  Instead it needs to inspect the
>> stream with feof() and ferror() to know how to proceed.
>
> The feof() check is in the calling functions, e.g. in the Zstd_getc
> function.

That function is using feof/ferror to log an appropriate error message, what
you are proposing is to consider all cases of EOF as an error.  If you test
that you will see lots of test starting to fail.

--
Daniel Gustafsson




Re: No error checking when reading from file using zstd in pg_dump

От
Daniel Gustafsson
Дата:
> On 16 Jun 2025, at 11:26, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:

> I ran tests for pg_dump and they all passed. Logs attached.

Files=7, Tests=11918, 35 wallclock secs ( 0.59 usr  0.07 sys +  7.92 cusr  4.32 csys = 12.90 CPU)

That seems like a low number of tests executed, with ZStd enabled I see over
13200 tests in the log.  Are you sure you are building with ZStd compression
enabled?

--
Daniel Gustafsson




Re: No error checking when reading from file using zstd in pg_dump

От
Evgeniy Gorbanev
Дата:
16.06.2025 15:43, Daniel Gustafsson пишет:
>> On 16 Jun 2025, at 11:26, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:
>> I ran tests for pg_dump and they all passed. Logs attached.
> Files=7, Tests=11918, 35 wallclock secs ( 0.59 usr  0.07 sys +  7.92 cusr  4.32 csys = 12.90 CPU)
>
> That seems like a low number of tests executed, with ZStd enabled I see over
> 13200 tests in the log.  Are you sure you are building with ZStd compression
> enabled?
>
> --
> Daniel Gustafsson
>

Yes, you are right. Now I see the errors.



Re: No error checking when reading from file using zstd in pg_dump

От
Tom Lane
Дата:
I think the real problem here is that what the code is doing is
precisely not what is documented in compress_io.h:

    /*
     * Read 'size' bytes of data from the file and store them into 'ptr'.
     * Optionally it will store the number of bytes read in 'rsize'.
     *
     * Returns true on success and throws an internal error otherwise.
     */
    bool        (*read_func) (void *ptr, size_t size, size_t *rsize,
                              CompressFileHandle *CFH);

I've not checked to see what the other users of this API do, but
if they're all like this then we need to fix that comment.
Otherwise we probably need to make them all alike; even if there's
not a bug today, one will soon appear from someone believing the
comment.

            regards, tom lane



Re: No error checking when reading from file using zstd in pg_dump

От
Daniel Gustafsson
Дата:
> On 16 Jun 2025, at 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I've not checked to see what the other users of this API do, but
> if they're all like this then we need to fix that comment.

AFAICT all other callers of this API are throwing an error with pg_fatal, and
so does the function in question for ZStd decompression errors.  If we handle
the case of fread() returning 0 to indicate an error like the below *untested
sketch* (with a better error message) this function is fully API compliant as
well.

                /* If we have no more input to consume, we're done */
                if (cnt == 0)
+               {
+                       if (ferror(unconstify(void *, input->src)))
+                               pg_fatal("could not read data to decompress: %m");
+
                        break;
+               }

If this seems like a good approach then Zstd_getc can be simplified as well as
it no longer needs to call ferror, it still needs to check feof though.

--
Daniel Gustafsson




Re: No error checking when reading from file using zstd in pg_dump

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
>> On 16 Jun 2025, at 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I've not checked to see what the other users of this API do, but
>> if they're all like this then we need to fix that comment.

> AFAICT all other callers of this API are throwing an error with pg_fatal, and
> so does the function in question for ZStd decompression errors.

I think I agree that we need to handle the ferror() case inside the
read_func for consistency.  But there is another problem, which is
that neither of its callers are paying attention to the API: as
defined, the read_func can never return anything but "true",
which is how Zstd_read behaves.  But both the callers in
compress_zstd.c seem to think they should test its result to
detect EOF.  Apparently, our tests do not cover the case of an
unexpected EOF.

This API is actually quite bizarrely defined, if you ask me.
Returning the byte count is optional, but if you don't pay
attention to the byte count you cannot know if you got any
data.  At best, that's bug-encouraging.

            regards, tom lane



Re: No error checking when reading from file using zstd in pg_dump

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 16 Jun 2025, at 16:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This API is actually quite bizarrely defined, if you ask me.
>> Returning the byte count is optional, but if you don't pay
>> attention to the byte count you cannot know if you got any
>> data.  At best, that's bug-encouraging.

> Agreed.  Given that many of implementations in the gzip support code directly
> return the gzXXX function I suspect it was modeled around GZip but thats an
> unresearched guess.  The fact that this returns Z_NULL where the API defines
> NULL is unlikely to ever be an issue, but it also doesn't seem entirely neat..

After looking around a bit more I realized that this API is a complete
disaster: it's not only bug-prone, but there are actual bugs in
multiple callers, eg _ReadBuf() totally fails to detect early EOF as
it intends to.  We need to fix it, not slap some band-aids on.
Draft patch attached.

The write_func side is a bit of a mess too.  I fixed some obvious API
violations in the attached, but I think we need to do more, because
it seems that zero thought was given to reporting errors sanely.
The callers all assume that strerror() is what to use to report an
incomplete write, but this is not appropriate in every case, for
instance we need to pay attention to gzerror() in gzip cases.
I'm inclined to think that the best answer is to move error reporting
into the write_funcs, and just make the API spec be "write or die".
I've not tackled that here though.

I was depressed to find that Zstd_getc reads one byte into
an otherwise-uninitialized int and then returns the whole int.
Even if we're lucky enough for the variable to start off zero,
this would fail utterly on big-endian machines.  The only
reason we've not noticed is that the regression tests do not
reach Zstd_getc, nor most of the other getc_func implementations.

Now I'm afraid to look at the rest of the compress_io.h API.
If it's as badly written as these parts, there are more bugs
to be found.

            regards, tom lane

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 5a30ebf9bf5..8db121b94a3 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -251,14 +251,14 @@ InitCompressorGzip(CompressorState *cs,
  *----------------------
  */

-static bool
-Gzip_read(void *ptr, size_t size, size_t *rsize, CompressFileHandle *CFH)
+static size_t
+Gzip_read(void *ptr, size_t size, CompressFileHandle *CFH)
 {
     gzFile        gzfp = (gzFile) CFH->private_data;
     int            gzret;

     gzret = gzread(gzfp, ptr, size);
-    if (gzret <= 0 && !gzeof(gzfp))
+    if (gzret < 0)
     {
         int            errnum;
         const char *errmsg = gzerror(gzfp, &errnum);
@@ -267,10 +267,7 @@ Gzip_read(void *ptr, size_t size, size_t *rsize, CompressFileHandle *CFH)
                  errnum == Z_ERRNO ? strerror(errno) : errmsg);
     }

-    if (rsize)
-        *rsize = (size_t) gzret;
-
-    return true;
+    return (size_t) gzret;
 }

 static bool
@@ -278,7 +275,7 @@ Gzip_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 {
     gzFile        gzfp = (gzFile) CFH->private_data;

-    return gzwrite(gzfp, ptr, size) > 0;
+    return gzwrite(gzfp, ptr, size) == size;
 }

 static int
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index db9b38744c8..dad0c91fec5 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -123,21 +123,23 @@ struct CompressFileHandle
                                     CompressFileHandle *CFH);

     /*
-     * Read 'size' bytes of data from the file and store them into 'ptr'.
-     * Optionally it will store the number of bytes read in 'rsize'.
+     * Read up to 'size' bytes of data from the file and store them into
+     * 'ptr'.
      *
-     * Returns true on success and throws an internal error otherwise.
+     * Returns number of bytes read (this might be less than 'size' if EOF was
+     * reached).  Throws error for error conditions other than EOF.
      */
-    bool        (*read_func) (void *ptr, size_t size, size_t *rsize,
+    size_t        (*read_func) (void *ptr, size_t size,
                               CompressFileHandle *CFH);

     /*
      * Write 'size' bytes of data into the file from 'ptr'.
      *
-     * Returns true on success and false on error.
+     * Returns true on success and false on error (it is caller's
+     * responsibility to deal with error conditions).
      */
     bool        (*write_func) (const void *ptr, size_t size,
-                               struct CompressFileHandle *CFH);
+                               CompressFileHandle *CFH);

     /*
      * Read at most size - 1 characters from the compress file handle into
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index e99f0cad71f..c94f9dcd4cf 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -583,6 +583,7 @@ LZ4Stream_write(const void *ptr, size_t size, CompressFileHandle *CFH)
             return false;
         }

+        errno = 0;
         if (fwrite(state->buffer, 1, status, state->fp) != status)
         {
             errno = (errno) ? errno : ENOSPC;
@@ -598,8 +599,8 @@ LZ4Stream_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 /*
  * fread() equivalent implementation for LZ4 compressed files.
  */
-static bool
-LZ4Stream_read(void *ptr, size_t size, size_t *rsize, CompressFileHandle *CFH)
+static size_t
+LZ4Stream_read(void *ptr, size_t size, CompressFileHandle *CFH)
 {
     LZ4State   *state = (LZ4State *) CFH->private_data;
     int            ret;
@@ -607,10 +608,7 @@ LZ4Stream_read(void *ptr, size_t size, size_t *rsize, CompressFileHandle *CFH)
     if ((ret = LZ4Stream_read_internal(state, ptr, size, false)) < 0)
         pg_fatal("could not read from input file: %s", LZ4Stream_get_error(CFH));

-    if (rsize)
-        *rsize = (size_t) ret;
-
-    return true;
+    return (size_t) ret;
 }

 /*
diff --git a/src/bin/pg_dump/compress_none.c b/src/bin/pg_dump/compress_none.c
index 3fc89c99854..4924935a5bd 100644
--- a/src/bin/pg_dump/compress_none.c
+++ b/src/bin/pg_dump/compress_none.c
@@ -83,23 +83,17 @@ InitCompressorNone(CompressorState *cs,
  * Private routines
  */

-static bool
-read_none(void *ptr, size_t size, size_t *rsize, CompressFileHandle *CFH)
+static size_t
+read_none(void *ptr, size_t size, CompressFileHandle *CFH)
 {
     FILE       *fp = (FILE *) CFH->private_data;
     size_t        ret;

-    if (size == 0)
-        return true;
-
     ret = fread(ptr, 1, size, fp);
-    if (ret != size && !feof(fp))
+    if (ferror(fp))
         pg_fatal("could not read from input file: %m");

-    if (rsize)
-        *rsize = ret;
-
-    return true;
+    return ret;
 }

 static bool
diff --git a/src/bin/pg_dump/compress_zstd.c b/src/bin/pg_dump/compress_zstd.c
index cb595b10c2d..8f718212d9f 100644
--- a/src/bin/pg_dump/compress_zstd.c
+++ b/src/bin/pg_dump/compress_zstd.c
@@ -258,8 +258,8 @@ InitCompressorZstd(CompressorState *cs,
  * Compressed stream API
  */

-static bool
-Zstd_read(void *ptr, size_t size, size_t *rdsize, CompressFileHandle *CFH)
+static size_t
+Zstd_read(void *ptr, size_t size, CompressFileHandle *CFH)
 {
     ZstdCompressorState *zstdcs = (ZstdCompressorState *) CFH->private_data;
     ZSTD_inBuffer *input = &zstdcs->input;
@@ -292,6 +292,9 @@ Zstd_read(void *ptr, size_t size, size_t *rdsize, CompressFileHandle *CFH)
         if (input->pos == input->size)
         {
             cnt = fread(unconstify(void *, input->src), 1, input_allocated_size, zstdcs->fp);
+            if (ferror(zstdcs->fp))
+                pg_fatal("could not read from input file: %m");
+
             input->size = cnt;

             Assert(cnt <= input_allocated_size);
@@ -320,10 +323,7 @@ Zstd_read(void *ptr, size_t size, size_t *rdsize, CompressFileHandle *CFH)
             break;                /* We read all the data that fits */
     }

-    if (rdsize != NULL)
-        *rdsize = output->pos;
-
-    return true;
+    return output->pos;
 }

 static bool
@@ -358,22 +358,16 @@ Zstd_write(const void *ptr, size_t size, CompressFileHandle *CFH)
         }
     }

-    return size;
+    return true;
 }

 static int
 Zstd_getc(CompressFileHandle *CFH)
 {
-    ZstdCompressorState *zstdcs = (ZstdCompressorState *) CFH->private_data;
-    int            ret;
+    unsigned char ret;

-    if (CFH->read_func(&ret, 1, NULL, CFH) != 1)
-    {
-        if (feof(zstdcs->fp))
-            pg_fatal("could not read from input file: end of file");
-        else
-            pg_fatal("could not read from input file: %m");
-    }
+    if (CFH->read_func(&ret, 1, CFH) != 1)
+        pg_fatal("could not read from input file: end of file");
     return ret;
 }

@@ -390,11 +384,7 @@ Zstd_gets(char *buf, int len, CompressFileHandle *CFH)
      */
     for (i = 0; i < len - 1; ++i)
     {
-        size_t        readsz;
-
-        if (!CFH->read_func(&buf[i], 1, &readsz, CFH))
-            break;
-        if (readsz != 1)
+        if (CFH->read_func(&buf[i], 1, CFH) != 1)
             break;
         if (buf[i] == '\n')
         {
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index bc2a2fb4797..eb2ce8d5a6c 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -351,7 +351,7 @@ _EndData(ArchiveHandle *AH, TocEntry *te)
 static void
 _PrintFileData(ArchiveHandle *AH, char *filename)
 {
-    size_t        cnt = 0;
+    size_t        cnt;
     char       *buf;
     size_t        buflen;
     CompressFileHandle *CFH;
@@ -366,7 +366,7 @@ _PrintFileData(ArchiveHandle *AH, char *filename)
     buflen = DEFAULT_IO_BUFFER_SIZE;
     buf = pg_malloc(buflen);

-    while (CFH->read_func(buf, buflen, &cnt, CFH) && cnt > 0)
+    while ((cnt = CFH->read_func(buf, buflen, CFH)) > 0)
     {
         ahwrite(buf, 1, cnt, AH);
     }
@@ -531,10 +531,10 @@ _ReadBuf(ArchiveHandle *AH, void *buf, size_t len)
     CompressFileHandle *CFH = ctx->dataFH;

     /*
-     * If there was an I/O error, we already exited in readF(), so here we
-     * exit on short reads.
+     * We do not expect a short read, so fail if we get one.  The read_func
+     * already dealt with any outright I/O error.
      */
-    if (!CFH->read_func(buf, len, NULL, CFH))
+    if (CFH->read_func(buf, len, CFH) != len)
         pg_fatal("could not read from input file: end of file");
 }


Re: No error checking when reading from file using zstd in pg_dump

От
Tomas Vondra
Дата:
On 6/16/25 17:41, Daniel Gustafsson wrote:
>> On 16 Jun 2025, at 16:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Daniel Gustafsson <daniel@yesql.se> writes:
>>>> On 16 Jun 2025, at 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> I've not checked to see what the other users of this API do, but
>>>> if they're all like this then we need to fix that comment.
>>
>>> AFAICT all other callers of this API are throwing an error with pg_fatal, and
>>> so does the function in question for ZStd decompression errors.
>>
>> I think I agree that we need to handle the ferror() case inside the
>> read_func for consistency.  But there is another problem, which is
>> that neither of its callers are paying attention to the API: as
>> defined, the read_func can never return anything but "true",
>> which is how Zstd_read behaves.  But both the callers in
>> compress_zstd.c seem to think they should test its result to
>> detect EOF.  
> 
> Attached is a stab at fixing both the error handling in read_func as well as
> the callers misuse of the API.
> 
>> Apparently, our tests do not cover the case of an
>> unexpected EOF.
> 
> I admittedly ran out of steam when looking at adding something like this to our
> pg_dump tests.
> 
>> This API is actually quite bizarrely defined, if you ask me.
>> Returning the byte count is optional, but if you don't pay
>> attention to the byte count you cannot know if you got any
>> data.  At best, that's bug-encouraging.
> 
> Agreed.  Given that many of implementations in the gzip support code directly
> return the gzXXX function I suspect it was modeled around GZip but thats an
> unresearched guess.  The fact that this returns Z_NULL where the API defines
> NULL is unlikely to ever be an issue, but it also doesn't seem entirely neat..
> 

Yes. It's been a while since this commit, but I recall we started with a
gzip-only implementation. 16 introduced this API, but it's definitely
the case it was based on the initial gzip code.

Regarding the Z_NULL, I believe it has always been ignored like this, at
least since 9.1. The code simply returns what gzgets() returns, and then
compares that to NULL, etc. Is there there's a better way to deal with
Z_NULL? I suppose we could explicitly check/translate Z_NULL to NULL,
although Z_NULL is simply defined as 0. I don't recall if NULL has some
additional magic.

regards

-- 
Tomas Vondra




Re: No error checking when reading from file using zstd in pg_dump

От
Tomas Vondra
Дата:
On 6/16/25 19:56, Tom Lane wrote:
> ...
> 
> After looking around a bit more I realized that this API is a complete
> disaster: it's not only bug-prone, but there are actual bugs in
> multiple callers, eg _ReadBuf() totally fails to detect early EOF as
> it intends to.  We need to fix it, not slap some band-aids on.
> Draft patch attached.
> 
> The write_func side is a bit of a mess too.  I fixed some obvious API
> violations in the attached, but I think we need to do more, because
> it seems that zero thought was given to reporting errors sanely.
> The callers all assume that strerror() is what to use to report an
> incomplete write, but this is not appropriate in every case, for
> instance we need to pay attention to gzerror() in gzip cases.
> I'm inclined to think that the best answer is to move error reporting
> into the write_funcs, and just make the API spec be "write or die".
> I've not tackled that here though.
> 
> I was depressed to find that Zstd_getc reads one byte into
> an otherwise-uninitialized int and then returns the whole int.
> Even if we're lucky enough for the variable to start off zero,
> this would fail utterly on big-endian machines.  The only
> reason we've not noticed is that the regression tests do not
> reach Zstd_getc, nor most of the other getc_func implementations.
> 
> Now I'm afraid to look at the rest of the compress_io.h API.
> If it's as badly written as these parts, there are more bugs
> to be found.
> 

Well, that doesn't sound great :-( Most of this is likely my fault, as
the one who pushed e9960732a961 (and some commits that built on it).
Sorry about that. I'll take a fresh look at the commits this week, but
considering I missed the issues before commit ...

For a moment I was worried about breaking ABI when fixing this in the
backbranches, but I guess that's not an issue for tools like pg_dump.


regards

-- 
Tomas Vondra




Re: No error checking when reading from file using zstd in pg_dump

От
Tom Lane
Дата:
Tomas Vondra <tomas@vondra.me> writes:
> For a moment I was worried about breaking ABI when fixing this in the
> backbranches, but I guess that's not an issue for tools like pg_dump.

Yeah, I think it'd be okay to change compress_io.h APIs in the back
branches; it's hard to see how anything outside pg_dump+pg_restore
would be depending on that.

            regards, tom lane



Re: No error checking when reading from file using zstd in pg_dump

От
Daniel Gustafsson
Дата:
> On 16 Jun 2025, at 21:45, Tomas Vondra <tomas@vondra.me> wrote:

> Regarding the Z_NULL, I believe it has always been ignored like this, at
> least since 9.1. The code simply returns what gzgets() returns, and then
> compares that to NULL, etc. Is there there's a better way to deal with
> Z_NULL? I suppose we could explicitly check/translate Z_NULL to NULL,
> although Z_NULL is simply defined as 0. I don't recall if NULL has some
> additional magic.

Right, to be clear, I don't think there is a bug here (or the risk one going
forward).  It's just my own preference of not mixing API concepts

--
Daniel Gustafsson




Re: No error checking when reading from file using zstd in pg_dump

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
> I spent a little bit of time reading over all the implementations and cross
> referencing the API for conformity, and came up with the attached.  The 0001
> patch is the one from upstream, and each subsequent commit is fixing one
> function for all the implementations.  Before pushing it should all be squashed
> into a single commit IMHO.

Thanks for tackling this!

I looked over this patchset briefly, and found a couple of nits:

v5-0002, in compress_io.h:

+     * Returns true on success and throws error for all error conditions.

It doesn't return true anymore.  Should be more like

+     * Returns nothing.  Exits via pg_fatal for all error conditions.

In LZ4Stream_write: you dropped the bit about

-            errno = (errno) ? errno : ENOSPC;

but I think that's still necessary: we must assume ENOSPC if fwrite
doesn't set errno.  Other fwrite callers (write_none, Zstd_write) need
this too.  v5-0004 has an instance too, in Zstd_close.  I did not check
to see if other fwrite calls are OK, but it'd be good to verify
that they all follow the pattern of presetting errno to 0 and then
replacing that with ENOSPC.

            regards, tom lane



Re: No error checking when reading from file using zstd in pg_dump

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 25 Jun 2025, at 17:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It doesn't return true anymore.  Should be more like
>> + * Returns nothing.  Exits via pg_fatal for all error conditions.

> Instead of this I changed the write_func signature to return the number of
> bytes written as size_t.  The API is loosely modelled around the libc stream API so
> going to void seemed less justifiable than size_t, even if the actual
> returnvalue is fairly useless due to erroring out via pg_fatal.

Hm.  My one concern about that is that using "void" ensures that the
compiler will catch any write_funcs or callsites that we missed
updating, whereas replacing bool with size_t probably prevents that
detection.

Of course this is now moot for the in-core code since we presumably
caught everything already.  But I wonder about patches (say to
support an additional compression method) that might be in the
pipeline, or in use in some local fork somewhere.  There's no
certainty that they'd get the word, especially since any tests
that didn't exercise failure conditions would still work.

So on the whole I prefer the "void" approach.  I'm not dead
set on that though, it's just a niggling worry.

            regards, tom lane



Re: No error checking when reading from file using zstd in pg_dump

От
Daniel Gustafsson
Дата:
> On 26 Jun 2025, at 20:01, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 26 Jun 2025, at 15:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> So on the whole I prefer the "void" approach.  I'm not dead
>> set on that though, it's just a niggling worry.
>
> I think the likelyhood of it being a problem in practice is pretty slim, but
> it's still a stronger argument than my "match an API we're still not aligned
> with". The attached v7 reverts back to void return.

In preparing for concluding this I've attached a v8 which is the patchset in v7
squashed into a single commit with an attempt at a commit message.

This version has been tested against v17 and v16 where it applies and passes
all tests (the latter isn't as assuring as it should be since there is a lack
of testcoverage).

--
Daniel Gustafsson


Вложения

Re: No error checking when reading from file using zstd in pg_dump

От
Tomas Vondra
Дата:

On 7/1/25 16:24, Daniel Gustafsson wrote:
>> On 26 Jun 2025, at 20:01, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>>> On 26 Jun 2025, at 15:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>>> So on the whole I prefer the "void" approach.  I'm not dead
>>> set on that though, it's just a niggling worry.
>>
>> I think the likelyhood of it being a problem in practice is pretty slim, but
>> it's still a stronger argument than my "match an API we're still not aligned
>> with". The attached v7 reverts back to void return.
> 
> In preparing for concluding this I've attached a v8 which is the patchset in v7
> squashed into a single commit with an attempt at a commit message.
> 

Thanks!

> This version has been tested against v17 and v16 where it applies and passes
> all tests (the latter isn't as assuring as it should be since there is a lack
> of testcoverage).
> 

Could you elaborate what you mean by lack of test coverage? Doesn't
pg_dump have TAP tests exercising all compression methods? Perhaps it
does not exercise all parts of the code, and we could improve that?

regards

-- 
Tomas Vondra




Re: No error checking when reading from file using zstd in pg_dump

От
Daniel Gustafsson
Дата:
> On 1 Jul 2025, at 17:11, Tomas Vondra <tomas@vondra.me> wrote:
> On 7/1/25 16:24, Daniel Gustafsson wrote:

>> This version has been tested against v17 and v16 where it applies and passes
>> all tests (the latter isn't as assuring as it should be since there is a lack
>> of testcoverage).
>
> Could you elaborate what you mean by lack of test coverage? Doesn't
> pg_dump have TAP tests exercising all compression methods? Perhaps it
> does not exercise all parts of the code, and we could improve that?

Sorry, I was unclear.  There are indeed lots of pg_dump tests and all
compression methods are tested (the 0% for Zstd on coverage.pg.org is due to
that instance not being compiled with zstd support) but there are functions
which evade testing like getc_func.  It's also quite hard to test all the error
paths as that would require some sort of fault injection.

I don't think the current coverage is cause for holding back this patch, but
there is room for improvement to made in the coming cycle.

--
Daniel Gustafsson




Re: No error checking when reading from file using zstd in pg_dump

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
> In preparing for concluding this I've attached a v8 which is the patchset in v7
> squashed into a single commit with an attempt at a commit message.

Glancing through this, I observe a couple of minor typos:

+     * Returns number of bytes read (this might be less than 'size' if EOF was
+     * reached).  Exits via pg_fatal all for error conditions.

s/all for/for all/

+        pg_fatal("coud not write to file: %m");

s/coud/could/

There are some minor typos in the proposed commit message, too.

            regards, tom lane



Re: No error checking when reading from file using zstd in pg_dump

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
>> On 1 Jul 2025, at 17:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There are some minor typos in the proposed commit message, too.

> It seems that my grasp of the english language went on holiday before I did.
> The attached v9 is hopefully less terrible.

This thread seems to have faded away, which I fear is my fault.
I took another look through v9, and it seems fine.  Do you want
to deal with pushing it, or shall I?

            regards, tom lane



Re: No error checking when reading from file using zstd in pg_dump

От
Daniel Gustafsson
Дата:
> On 24 Aug 2025, at 23:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>>> On 1 Jul 2025, at 17:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> There are some minor typos in the proposed commit message, too.
>
>> It seems that my grasp of the english language went on holiday before I did.
>> The attached v9 is hopefully less terrible.
>
> This thread seems to have faded away, which I fear is my fault.
> I took another look through v9, and it seems fine.  Do you want
> to deal with pushing it, or shall I?

Thanks for the review, this one had admittedly slipped my mind.  I'll work on
getting it in at the start of the week.

--
Daniel Gustafsson




Re: No error checking when reading from file using zstd in pg_dump

От
Daniel Gustafsson
Дата:
> On 24 Aug 2025, at 23:23, Daniel Gustafsson <daniel@yesql.se> wrote:

> I'll work on getting it in at the start of the week.

While not exactly the start of the week, this is now done.

--
Daniel Gustafsson