Обсуждение: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid
Hi, It looks like pg_dump --filter can mistakenly treat invalid object types in the filter file as valid ones. For example, the invalid type "table-data" (probably a typo for "table_data") is incorrectly recognized as "table", and pg_dump runs without error when it should fail. -------------------------------------------- $ cat filter.txt exclude table-data one $ pg_dump --filter filter.txt -- -- PostgreSQL database dump -- ... $ echo $? 0 -------------------------------------------- This happens because pg_dump (filter_get_keyword() in pg_dump/filter.c) identifies tokens as sequences of ASCII alphabetic characters, treating non-alphabetic characters (like hyphens) as token boundaries. As a result, "table-data" is parsed as "table". To fix this, I've attached the patch that updates pg_dump --filter so that it treats tokens as strings of non-space characters separated by spaces or line endings, ensuring invalid types like "table-data" are correctly rejected. Thought? With the patch: -------------------------------------------- $ cat filter.txt exclude table-data one $ pg_dump --filter filter.txt pg_dump: error: invalid format in filter read from file "filter.txt" on line 1: unsupported filter object type: "table-data" -------------------------------------------- Regards, -- Fujii Masao
Вложения
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid
От
Srinath Reddy Sadipiralla
Дата:
On Sat, Aug 2, 2025 at 3:18 PM Fujii Masao <masao.fujii@gmail.com> wrote:
Hi,
It looks like pg_dump --filter can mistakenly treat invalid object types
in the filter file as valid ones. For example, the invalid type "table-data"
(probably a typo for "table_data") is incorrectly recognized as "table",
and pg_dump runs without error when it should fail.
--------------------------------------------
$ cat filter.txt
exclude table-data one
$ pg_dump --filter filter.txt
--
-- PostgreSQL database dump
--
...
$ echo $?
0
--------------------------------------------
This happens because pg_dump (filter_get_keyword() in pg_dump/filter.c)
identifies tokens as sequences of ASCII alphabetic characters, treating
non-alphabetic characters (like hyphens) as token boundaries. As a result,
"table-data" is parsed as "table".
To fix this, I've attached the patch that updates pg_dump --filter so that
it treats tokens as strings of non-space characters separated by spaces
or line endings, ensuring invalid types like "table-data" are correctly
rejected. Thought?
With the patch:
--------------------------------------------
$ cat filter.txt
exclude table-data one
$ pg_dump --filter filter.txt
pg_dump: error: invalid format in filter read from file "filter.txt"
on line 1: unsupported filter object type: "table-data"
--------------------------------------------
Hi Fujii-san , +1 for the patch , I have reviewed and tested it and LGTM.
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid
От
Xuneng Zhou
Дата:
Hi Fujii-san, Thanks for working on this. On Sat, Aug 2, 2025 at 5:48 PM Fujii Masao <masao.fujii@gmail.com> wrote: > > Hi, > > It looks like pg_dump --filter can mistakenly treat invalid object types > in the filter file as valid ones. For example, the invalid type "table-data" > (probably a typo for "table_data") is incorrectly recognized as "table", > and pg_dump runs without error when it should fail. > > -------------------------------------------- > $ cat filter.txt > exclude table-data one > > $ pg_dump --filter filter.txt > -- > -- PostgreSQL database dump > -- > ... > > $ echo $? > 0 > -------------------------------------------- > > This happens because pg_dump (filter_get_keyword() in pg_dump/filter.c) > identifies tokens as sequences of ASCII alphabetic characters, treating > non-alphabetic characters (like hyphens) as token boundaries. As a result, > "table-data" is parsed as "table". > > To fix this, I've attached the patch that updates pg_dump --filter so that > it treats tokens as strings of non-space characters separated by spaces > or line endings, ensuring invalid types like "table-data" are correctly > rejected. Thought? > > With the patch: > -------------------------------------------- > $ cat filter.txt > exclude table-data one > > $ pg_dump --filter filter.txt > pg_dump: error: invalid format in filter read from file "filter.txt" > on line 1: unsupported filter object type: "table-data" > -------------------------------------------- After testing, the patch LGTM. I noticed two very small possible nits: 1) Comment wording The loop now calls isspace((unsigned char)*ptr), so a token ends at any whitespace, not just at ASCII space (0x20). Could we revise the comment—from “strings of non-space characters bounded by space characters” to something like “strings of non-space characters bounded by whitespace” —to match the behavior? 2) Variable name const char *keyword = filter_get_token(&str, &size); keyword = filter_get_token(&str, &size); After the patch, filter_get_token() no longer returns a keyword (letters-only identifier); it now returns any non-whitespace token. Renaming the variable from keyword to token (or similar) might make the intent clearer.. Best, Xuneng
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid
От
Fujii Masao
Дата:
On Sun, Aug 3, 2025 at 3:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > After testing, the patch LGTM. I noticed two very small possible nits: Thanks for the review! > 1) Comment wording > > The loop now calls isspace((unsigned char)*ptr), so a token ends at > any whitespace, not just at ASCII space (0x20). Could we revise the > comment—from > “strings of non-space characters bounded by space characters” > to something like > “strings of non-space characters bounded by whitespace” > —to match the behavior? I agree with the change. But the phrase "strings of non-space characters bounded by whitespace" is a bit redundant, and "strings of non-whitespace characters" is sufficient, isn't it? So I used that wording in the updated patch I've attached. > 2) Variable name > > const char *keyword = filter_get_token(&str, &size); > keyword = filter_get_token(&str, &size); > > After the patch, filter_get_token() no longer returns a keyword > (letters-only identifier); it now returns any non-whitespace token. > Renaming the variable from keyword to token (or similar) might make > the intent clearer.. This also got me thinking, if we simply define keywords as strings of non-whitespace characters, maybe we don't need to change the term "keyword" to "token" at all. I've updated the patch with that in mind. Thoughts? Regards, -- Fujii Masao
Вложения
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid
От
Xuneng Zhou
Дата:
On Mon, Aug 4, 2025 at 11:18 PM Fujii Masao <masao.fujii@gmail.com> wrote: > > On Sun, Aug 3, 2025 at 3:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > > After testing, the patch LGTM. I noticed two very small possible nits: > > Thanks for the review! > > > > 1) Comment wording > > > > The loop now calls isspace((unsigned char)*ptr), so a token ends at > > any whitespace, not just at ASCII space (0x20). Could we revise the > > comment—from > > “strings of non-space characters bounded by space characters” > > to something like > > “strings of non-space characters bounded by whitespace” > > —to match the behavior? > > I agree with the change. But the phrase "strings of non-space characters > bounded by whitespace" is a bit redundant, and "strings of non-whitespace > characters" is sufficient, isn't it? So I used that wording in the updated > patch I've attached. > > > > 2) Variable name > > > > const char *keyword = filter_get_token(&str, &size); > > keyword = filter_get_token(&str, &size); > > > > After the patch, filter_get_token() no longer returns a keyword > > (letters-only identifier); it now returns any non-whitespace token. > > Renaming the variable from keyword to token (or similar) might make > > the intent clearer.. > > This also got me thinking, if we simply define keywords as strings of > non-whitespace characters, maybe we don't need to change the term "keyword" > to "token" at all. I've updated the patch with that in mind. Thoughts? > +1, this looks more elegant to me. Best, Xuneng
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid
От
Daniel Gustafsson
Дата:
> On 4 Aug 2025, at 17:18, Fujii Masao <masao.fujii@gmail.com> wrote: I missed this thread while being on vacation, thanks for finding and fixing this! > This also got me thinking, if we simply define keywords as strings of > non-whitespace characters, maybe we don't need to change the term "keyword" > to "token" at all. I've updated the patch with that in mind. Thoughts? Agreed, this should work fine, and it aligns the code somwhat with read_pattern which is a good thing. + * in line buffer. Returns NULL when the buffer is empty or no keyword exists. Since "is empty" could be interpreted as being a null pointer, maybe we should add a if (!*line) check (or an Assert) before we dereference the passed in buffer? -- Daniel Gustafsson
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid
От
Fujii Masao
Дата:
On Tue, Aug 5, 2025 at 4:52 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 4 Aug 2025, at 17:18, Fujii Masao <masao.fujii@gmail.com> wrote: > > I missed this thread while being on vacation, thanks for finding and fixing > this! > > > This also got me thinking, if we simply define keywords as strings of > > non-whitespace characters, maybe we don't need to change the term "keyword" > > to "token" at all. I've updated the patch with that in mind. Thoughts? > > Agreed, this should work fine, and it aligns the code somwhat with read_pattern > which is a good thing. > > + * in line buffer. Returns NULL when the buffer is empty or no keyword exists. > Since "is empty" could be interpreted as being a null pointer, maybe we should > add a if (!*line) check (or an Assert) before we dereference the passed in > buffer? Thanks for the review! I've added Assert(*line != NULL) at the start of filter_get_keyword(). Updated patch attached. Regards, -- Fujii Masao
Вложения
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid
От
Daniel Gustafsson
Дата:
> On 6 Aug 2025, at 06:49, Fujii Masao <masao.fujii@gmail.com> wrote: > I've added Assert(*line != NULL) at the start of filter_get_keyword(). > Updated patch attached. LGTM. -- Daniel Gustafsson
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid
От
Fujii Masao
Дата:
On Thu, Aug 7, 2025 at 9:17 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 6 Aug 2025, at 06:49, Fujii Masao <masao.fujii@gmail.com> wrote: > > > I've added Assert(*line != NULL) at the start of filter_get_keyword(). > > Updated patch attached. > > LGTM. I've pushed the patch. Thanks! Regards, -- Fujii Masao