Обсуждение: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

Поиск
Список
Период
Сортировка

Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

От
Fujii Masao
Дата:
Hi,

It looks like pg_dump --filter can mistakenly treat invalid object types
in the filter file as valid ones. For example, the invalid type "table-data"
(probably a typo for "table_data") is incorrectly recognized as "table",
and pg_dump runs without error when it should fail.

--------------------------------------------
$ cat filter.txt
exclude table-data one

$ pg_dump --filter filter.txt
--
-- PostgreSQL database dump
--
...

$ echo $?
0
--------------------------------------------

This happens because pg_dump (filter_get_keyword() in pg_dump/filter.c)
identifies tokens as sequences of ASCII alphabetic characters, treating
non-alphabetic characters (like hyphens) as token boundaries. As a result,
"table-data" is parsed as "table".

To fix this, I've attached the patch that updates pg_dump --filter so that
it treats tokens as strings of non-space characters separated by spaces
or line endings, ensuring invalid types like "table-data" are correctly
rejected. Thought?

With the patch:
--------------------------------------------
$ cat filter.txt
exclude table-data one

$ pg_dump --filter filter.txt
pg_dump: error: invalid format in filter read from file "filter.txt"
on line 1: unsupported filter object type: "table-data"
--------------------------------------------

Regards,

-- 
Fujii Masao

Вложения

Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

От
Srinath Reddy Sadipiralla
Дата:


On Sat, Aug 2, 2025 at 3:18 PM Fujii Masao <masao.fujii@gmail.com> wrote:
Hi,

It looks like pg_dump --filter can mistakenly treat invalid object types
in the filter file as valid ones. For example, the invalid type "table-data"
(probably a typo for "table_data") is incorrectly recognized as "table",
and pg_dump runs without error when it should fail.

--------------------------------------------
$ cat filter.txt
exclude table-data one

$ pg_dump --filter filter.txt
--
-- PostgreSQL database dump
--
...

$ echo $?
0
--------------------------------------------

This happens because pg_dump (filter_get_keyword() in pg_dump/filter.c)
identifies tokens as sequences of ASCII alphabetic characters, treating
non-alphabetic characters (like hyphens) as token boundaries. As a result,
"table-data" is parsed as "table".

To fix this, I've attached the patch that updates pg_dump --filter so that
it treats tokens as strings of non-space characters separated by spaces
or line endings, ensuring invalid types like "table-data" are correctly
rejected. Thought?

With the patch:
--------------------------------------------
$ cat filter.txt
exclude table-data one

$ pg_dump --filter filter.txt
pg_dump: error: invalid format in filter read from file "filter.txt"
on line 1: unsupported filter object type: "table-data"
--------------------------------------------


Hi Fujii-san , +1 for the patch , I have reviewed and tested it and LGTM.

--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/

Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

От
Xuneng Zhou
Дата:
Hi Fujii-san,

Thanks for working on this.

On Sat, Aug 2, 2025 at 5:48 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> Hi,
>
> It looks like pg_dump --filter can mistakenly treat invalid object types
> in the filter file as valid ones. For example, the invalid type "table-data"
> (probably a typo for "table_data") is incorrectly recognized as "table",
> and pg_dump runs without error when it should fail.
>
> --------------------------------------------
> $ cat filter.txt
> exclude table-data one
>
> $ pg_dump --filter filter.txt
> --
> -- PostgreSQL database dump
> --
> ...
>
> $ echo $?
> 0
> --------------------------------------------
>
> This happens because pg_dump (filter_get_keyword() in pg_dump/filter.c)
> identifies tokens as sequences of ASCII alphabetic characters, treating
> non-alphabetic characters (like hyphens) as token boundaries. As a result,
> "table-data" is parsed as "table".
>
> To fix this, I've attached the patch that updates pg_dump --filter so that
> it treats tokens as strings of non-space characters separated by spaces
> or line endings, ensuring invalid types like "table-data" are correctly
> rejected. Thought?
>
> With the patch:
> --------------------------------------------
> $ cat filter.txt
> exclude table-data one
>
> $ pg_dump --filter filter.txt
> pg_dump: error: invalid format in filter read from file "filter.txt"
> on line 1: unsupported filter object type: "table-data"
> --------------------------------------------

After testing, the patch LGTM. I noticed two very small possible nits:

1) Comment wording

The loop now calls isspace((unsigned char)*ptr), so a token ends at
any whitespace, not just at ASCII space (0x20). Could we revise the
comment—from
“strings of non-space characters bounded by space characters”
to something like
“strings of non-space characters bounded by whitespace”
—to match the behavior?

2) Variable name

const char *keyword = filter_get_token(&str, &size);
keyword = filter_get_token(&str, &size);

After the patch, filter_get_token() no longer returns a keyword
(letters-only identifier); it now returns any non-whitespace token.
Renaming the variable from keyword to token (or similar) might make
the intent clearer..

Best,
Xuneng



Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

От
Fujii Masao
Дата:
On Sun, Aug 3, 2025 at 3:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> After testing, the patch LGTM. I noticed two very small possible nits:

Thanks for the review!


> 1) Comment wording
>
> The loop now calls isspace((unsigned char)*ptr), so a token ends at
> any whitespace, not just at ASCII space (0x20). Could we revise the
> comment—from
> “strings of non-space characters bounded by space characters”
> to something like
> “strings of non-space characters bounded by whitespace”
> —to match the behavior?

I agree with the change. But the phrase "strings of non-space characters
bounded by whitespace" is a bit redundant, and "strings of non-whitespace
characters" is sufficient, isn't it? So I used that wording in the updated
patch I've attached.


> 2) Variable name
>
> const char *keyword = filter_get_token(&str, &size);
> keyword = filter_get_token(&str, &size);
>
> After the patch, filter_get_token() no longer returns a keyword
> (letters-only identifier); it now returns any non-whitespace token.
> Renaming the variable from keyword to token (or similar) might make
> the intent clearer..

This also got me thinking, if we simply define keywords as strings of
non-whitespace characters, maybe we don't need to change the term "keyword"
to "token" at all. I've updated the patch with that in mind. Thoughts?

Regards,

--
Fujii Masao

Вложения

Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

От
Xuneng Zhou
Дата:
On Mon, Aug 4, 2025 at 11:18 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Sun, Aug 3, 2025 at 3:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > After testing, the patch LGTM. I noticed two very small possible nits:
>
> Thanks for the review!
>
>
> > 1) Comment wording
> >
> > The loop now calls isspace((unsigned char)*ptr), so a token ends at
> > any whitespace, not just at ASCII space (0x20). Could we revise the
> > comment—from
> > “strings of non-space characters bounded by space characters”
> > to something like
> > “strings of non-space characters bounded by whitespace”
> > —to match the behavior?
>
> I agree with the change. But the phrase "strings of non-space characters
> bounded by whitespace" is a bit redundant, and "strings of non-whitespace
> characters" is sufficient, isn't it? So I used that wording in the updated
> patch I've attached.
>
>
> > 2) Variable name
> >
> > const char *keyword = filter_get_token(&str, &size);
> > keyword = filter_get_token(&str, &size);
> >
> > After the patch, filter_get_token() no longer returns a keyword
> > (letters-only identifier); it now returns any non-whitespace token.
> > Renaming the variable from keyword to token (or similar) might make
> > the intent clearer..
>
> This also got me thinking, if we simply define keywords as strings of
> non-whitespace characters, maybe we don't need to change the term "keyword"
> to "token" at all. I've updated the patch with that in mind. Thoughts?
>

+1, this looks more elegant to me.

Best,
Xuneng



Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

От
Daniel Gustafsson
Дата:
> On 4 Aug 2025, at 17:18, Fujii Masao <masao.fujii@gmail.com> wrote:

I missed this thread while being on vacation, thanks for finding and fixing
this!

> This also got me thinking, if we simply define keywords as strings of
> non-whitespace characters, maybe we don't need to change the term "keyword"
> to "token" at all. I've updated the patch with that in mind. Thoughts?

Agreed, this should work fine, and it aligns the code somwhat with read_pattern
which is a good thing.

+ * in line buffer. Returns NULL when the buffer is empty or no keyword exists.
Since "is empty" could be interpreted as being a null pointer, maybe we should
add a if (!*line) check (or an Assert) before we dereference the passed in
buffer?

--
Daniel Gustafsson




Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

От
Fujii Masao
Дата:
On Tue, Aug 5, 2025 at 4:52 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 4 Aug 2025, at 17:18, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> I missed this thread while being on vacation, thanks for finding and fixing
> this!
>
> > This also got me thinking, if we simply define keywords as strings of
> > non-whitespace characters, maybe we don't need to change the term "keyword"
> > to "token" at all. I've updated the patch with that in mind. Thoughts?
>
> Agreed, this should work fine, and it aligns the code somwhat with read_pattern
> which is a good thing.
>
> + * in line buffer. Returns NULL when the buffer is empty or no keyword exists.
> Since "is empty" could be interpreted as being a null pointer, maybe we should
> add a if (!*line) check (or an Assert) before we dereference the passed in
> buffer?

Thanks for the review!

I've added Assert(*line != NULL) at the start of filter_get_keyword().
Updated patch attached.

Regards,

--
Fujii Masao

Вложения

Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

От
Daniel Gustafsson
Дата:
> On 6 Aug 2025, at 06:49, Fujii Masao <masao.fujii@gmail.com> wrote:

> I've added Assert(*line != NULL) at the start of filter_get_keyword().
> Updated patch attached.

LGTM.

--
Daniel Gustafsson




Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

От
Fujii Masao
Дата:
On Thu, Aug 7, 2025 at 9:17 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 6 Aug 2025, at 06:49, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> > I've added Assert(*line != NULL) at the start of filter_get_keyword().
> > Updated patch attached.
>
> LGTM.

I've pushed the patch. Thanks!

Regards,

--
Fujii Masao