Обсуждение: Tighten up a few overly lax regexes in pg_dump's tap tests
There are a few regexes in pg_dump's tap tests that are neglecting to escape the dot in "schema.table" type expressions. This could result in the test passing when it shouldn't. It seems worth putting that right. Patch attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
> On 4 Feb 2019, at 11:54, David Rowley <david.rowley@2ndquadrant.com> wrote: > > There are a few regexes in pg_dump's tap tests that are neglecting to > escape the dot in "schema.table" type expressions. This could result > in the test passing when it shouldn't. It seems worth putting that > right. +1 for tightening it up, and the patch looks good to me. We may also want to use the + metacharacter instead of * in a few places, since the intent is to always match something, where matching nothing should be considered an error: - qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m, + qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m, cheers ./daniel
On Mon, Feb 04, 2019 at 01:12:48PM +0100, Daniel Gustafsson wrote: > +1 for tightening it up, and the patch looks good to me. > > We may also want to use the + metacharacter instead of * in a few places, since > the intent is to always match something, where matching nothing should be > considered an error: > > - qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m, > + qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m, Some tests are missing the update, and it seems to me that tightening things up is a good thing, still we ought to do it consistently. Some places missing the update: - ALTER OPERATOR FAMILY - ALTER OPERATOR CLASS - ALTER SEQUENCE - ALTER TABLE (ONLY, partitioned table) - BLOB load - COMMENT ON - COPY - INSERT INTO - CREATE COLLATION test_pg_dump's 001_base.pl needs also a refresh. -- Michael
Вложения
On Tue, 5 Feb 2019 at 13:15, Michael Paquier <michael@paquier.xyz> wrote: > Some tests are missing the update, and it seems to me that > tightening things up is a good thing, still we ought to do it > consistently. Some places missing the update: > - ALTER OPERATOR FAMILY > - ALTER OPERATOR CLASS > - ALTER SEQUENCE > - ALTER TABLE (ONLY, partitioned table) > - BLOB load > - COMMENT ON > - COPY > - INSERT INTO > - CREATE COLLATION I think I've done all the ones that use \E. Those \Q ones don't need any escaping. I assume that's the ones you've found. I didn't do an exhaustive check though. > test_pg_dump's 001_base.pl needs also a refresh. Yeah, I thought about looking, but I thought it might be a bottomless pit. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Feb 05, 2019 at 01:50:50PM +1300, David Rowley wrote: > I think I've done all the ones that use \E. Those \Q ones don't need > any escaping. I assume that's the ones you've found. I didn't do an > exhaustive check though. Oh, good point. I didn't know that anything between \Q and \E are interpreted as literal characters. Instead of the approach you are proposing, perhaps it would make sense to extend this way of doing things then? For example some tests with CREATE CONVERSION do so. It looks much more portable than having to escape every dot. >> test_pg_dump's 001_base.pl needs also a refresh. > > Yeah, I thought about looking, but I thought it might be a bottomless pit. I just double-checked this one, and the regex patterns there look all correct. -- Michael
Вложения
On Tue, 5 Feb 2019 at 14:41, Michael Paquier <michael@paquier.xyz> wrote: > Instead of the approach you are > proposing, perhaps it would make sense to extend this way of doing > things then? For example some tests with CREATE CONVERSION do so. It > looks much more portable than having to escape every dot. I'm not particularly excited either way, but here's a patch with it that way. I did leave a couple untouched as there was quite a bit of escaping going on already. I didn't think switching between \Q and \E would have made those ones any more pleasing to the eye. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Tue, 5 Feb 2019 at 01:12, Daniel Gustafsson <daniel@yesql.se> wrote: > We may also want to use the + metacharacter instead of * in a few places, since > the intent is to always match something, where matching nothing should be > considered an error: > > - qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m, > + qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m, I looked for instances of * alone and didn't see any. I only saw ones prefixed with ".", in which case, isn't that matching 1 or more chars already? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Monday, February 4, 2019, David Rowley <david.rowley@2ndquadrant.com> wrote:
On Tue, 5 Feb 2019 at 01:12, Daniel Gustafsson <daniel@yesql.se> wrote:
> We may also want to use the + metacharacter instead of * in a few places, since
> the intent is to always match something, where matching nothing should be
> considered an error:
>
> - qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m,
> + qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m,
I looked for instances of * alone and didn't see any. I only saw ones
prefixed with ".", in which case, isn't that matching 1 or more chars
already?
No. In Regex the following are equivalent:
.* == .{0,}
.+ == .{1,}
. == .{1}
A “*” by itself would either be an error or, assuming the preceding character is a space (so it visually looks alone) would be zero or more consecutive spaces.
In the above “...OWNER TO<space>;” is a valid match.
David J.
On Tue, Feb 05, 2019 at 05:46:55PM +1300, David Rowley wrote: > I did leave a couple untouched as there was quite a bit of escaping > going on already. I didn't think switching between \Q and \E would > have made those ones any more pleasing to the eye. - qr/^ALTER TABLE dump_test.test_table ENABLE ROW LEVEL SECURITY;/m, + qr/^\QALTER TABLE dump_test.test_table ENABLE ROW LEVEL SECURITY;/m, I would have just appended the \E for consistency at the end of the strings. Except that it looks fine. No need to send an updated version, it seems that you have all the spots. I'll do an extra pass on it tomorrow and see if I can commit it. -- Michael
Вложения
> On 5 Feb 2019, at 06:55, David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Monday, February 4, 2019, David Rowley <david.rowley@2ndquadrant.com <mailto:david.rowley@2ndquadrant.com>> wrote: > On Tue, 5 Feb 2019 at 01:12, Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> wrote: > > We may also want to use the + metacharacter instead of * in a few places, since > > the intent is to always match something, where matching nothing should be > > considered an error: > > > > - qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m, > > + qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m, > > I looked for instances of * alone and didn't see any. I only saw ones > prefixed with ".", in which case, isn't that matching 1 or more chars > already? > > No. In Regex the following are equivalent: > > .* == .{0,} > .+ == .{1,} > . == .{1} > > A “*” by itself would either be an error or, assuming the preceding character is a space (so it visually looks alone) wouldbe zero or more consecutive spaces. Sorry for being a bit unclear in my original email, it’s as David says above: .* matches zero or more characters and .+ matches 1 or more characters. > In the above “...OWNER TO<space>;” is a valid match. Indeed, so we should move to matching with .+ to force an owner. cheers ./daniel
On Tue, Feb 05, 2019 at 04:26:18PM +0900, Michael Paquier wrote: > On Tue, Feb 05, 2019 at 05:46:55PM +1300, David Rowley wrote: >> I did leave a couple untouched as there was quite a bit of escaping >> going on already. I didn't think switching between \Q and \E would >> have made those ones any more pleasing to the eye. Indeed. I have stuck with your version here. > - qr/^ALTER TABLE dump_test.test_table ENABLE ROW LEVEL SECURITY;/m, > + qr/^\QALTER TABLE dump_test.test_table ENABLE ROW LEVEL SECURITY;/m, > I would have just appended the \E for consistency at the end of the > strings. Except that it looks fine. No need to send an updated > version, it seems that you have all the spots. I'll do an extra pass > on it tomorrow and see if I can commit it. And done after checking the whole set. -- Michael
Вложения
> On 6 Feb 2019, at 09:39, Michael Paquier <michael@paquier.xyz> wrote: > And done after checking the whole set. I still think we should enforce one-or-more matching on the OWNER part as well, since matching zero would be a syntax error. There are more .* matches but I’ve only touched the ones which match SQL, since there is a defined grammar to rely on there. The attached patch does that on top of your commit. cheers ./daniel
Вложения
On Wed, 6 Feb 2019 at 21:39, Michael Paquier <michael@paquier.xyz> wrote: > And done after checking the whole set. Thanks for pushing. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Feb 06, 2019 at 10:50:27AM +0100, Daniel Gustafsson wrote: > I still think we should enforce one-or-more matching on the OWNER part as well, > since matching zero would be a syntax error. There are more .* matches but > I’ve only touched the ones which match SQL, since there is a defined grammar to > rely on there. The attached patch does that on top of your commit. - regexp => qr/^COMMENT ON DATABASE postgres IS .*;/m, + regexp => qr/^COMMENT ON DATABASE postgres IS .+;/m, So... With what's on HEAD the current regex means that all these are correct commands: COMMENT ON DATABASE postgres is ; COMMENT ON DATABASE postgres is foo; COMMENT ON DATABASE postgres is foo; And for the first one that's obviously wrong. So what you are suggesting is to actually make the first pattern something to complain about, right? And ".+" this makes sure that at least one character is present, while for ".*" it is fine to have zero characters. What you are suggesting looks right if I understood that right. -- Michael
Вложения
> On 6 Feb 2019, at 12:37, Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Feb 06, 2019 at 10:50:27AM +0100, Daniel Gustafsson wrote: >> I still think we should enforce one-or-more matching on the OWNER part as well, >> since matching zero would be a syntax error. There are more .* matches but >> I’ve only touched the ones which match SQL, since there is a defined grammar to >> rely on there. The attached patch does that on top of your commit. > > - regexp => qr/^COMMENT ON DATABASE postgres IS .*;/m, > + regexp => qr/^COMMENT ON DATABASE postgres IS .+;/m, > So... With what's on HEAD the current regex means that all these are > correct commands: > COMMENT ON DATABASE postgres is ; > COMMENT ON DATABASE postgres is foo; > COMMENT ON DATABASE postgres is foo; > And for the first one that's obviously wrong. > > So what you are suggesting is to actually make the first pattern > something to complain about, right? And ".+" this makes sure that at > least one character is present, while for ".*" it is fine to have zero > characters. What you are suggesting looks right if I understood that > right. Correct. One could argue that the regex is still suboptimal since “COMMENT ON DATABASE postgres IS ;” will be matched as well, but there I think the tradeoff for readability wins. cheers ./daniel
On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote: > Correct. One could argue that the regex is still suboptimal since “COMMENT ON > DATABASE postgres IS ;” will be matched as well, but there I think the tradeoff > for readability wins. Okay, that looks like an improvement anyway, so committed after going over the tests for similar problems, and there was one for CREATE DATABASE and DROP ROLE. It is possible to have a regex which tells as at least one non-whitespace character, but from what I can see these don't really improve the readability. -- Michael
Вложения
Moin, On Wed, February 6, 2019 8:10 pm, Michael Paquier wrote: > On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote: >> Correct. One could argue that the regex is still suboptimal since >> “COMMENT ON >> DATABASE postgres IS ;” will be matched as well, but there I think the >> tradeoff >> for readability wins. > > Okay, that looks like an improvement anyway, so committed after going > over the tests for similar problems, and there was one for CREATE > DATABASE and DROP ROLE. It is possible to have a regex which tells as > at least one non-whitespace character, but from what I can see these > don't really improve the readability. Sorry for being late to the party, but it just occured to me that while ".+" is definitely an improvement over ".*", it isn't foolproof either, as it also matches ";". Thus: regexp => qr/^COMMENT ON DATABASE postgres IS .+;/m, matches things like: 'COMMENT ON DATABASE postgres IS ;;' 'COMMENT ON DATABASE postgres IS ;' 'COMMENT ON DATABASE postgres IS --;' etc. I'm not sure it is really necessary to deal with these cases, but one possibility would be to pre-compute regexps: $QR_COMMENT = qr/[^ ;]+/; $QR_IDENTIFIER = qr/[^ ;]+/; # etc or whataver is the thing that should actually be matched here and use them like so: regexp => qr/^COMMENT ON DATABASE postgres IS $QR_COMMENT;/m, That way it is easily changable and quite readable. Oh, one more question. Shouldn't these regexps that start with "^" also end with "$"? Or can there be output like: 'COMMENT ON DATABASE postgres IS $QR_IDENTIFIER; SELECT 1;' ? Best regards, Tels
> On 7 Feb 2019, at 13:55, Tels <nospam-pg-abuse@bloodgate.com> wrote: > On Wed, February 6, 2019 8:10 pm, Michael Paquier wrote: >> On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote: >>> Correct. One could argue that the regex is still suboptimal since >>> “COMMENT ON >>> DATABASE postgres IS ;” will be matched as well, but there I think the >>> tradeoff >>> for readability wins. >> >> Okay, that looks like an improvement anyway, so committed after going >> over the tests for similar problems, and there was one for CREATE >> DATABASE and DROP ROLE. It is possible to have a regex which tells as >> at least one non-whitespace character, but from what I can see these >> don't really improve the readability. > > Sorry for being late to the party, but it just occured to me that while > ".+" is definitely an improvement over ".*", it isn't foolproof either, > as it also matches ";”. Correct. The idea was to maintain readability while making the regex a bit better, without any claims to make it perfect. One can argue whether or not that’s enough, but IMO keeping the tests as readable as possible is preferrable when it comes to the tests in question, as they aren’t matching against user supplied SQL but machine generated. cheers ./daniel
On Thu, Feb 07, 2019 at 10:10:32AM +0900, Michael Paquier wrote: > On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote: > > Correct. One could argue that the regex is still suboptimal since “COMMENT ON > > DATABASE postgres IS ;” will be matched as well, but there I think the tradeoff > > for readability wins. > > Okay, that looks like an improvement anyway, so committed after going > over the tests for similar problems, and there was one for CREATE > DATABASE and DROP ROLE. It is possible to have a regex which tells as > at least one non-whitespace character, but from what I can see these > don't really improve the readability. Are you talking about \w+, or [^[:space:]]+, [^[:blank:]]+, or...? Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
> On 7 Feb 2019, at 18:20, David Fetter <david@fetter.org> wrote: > > On Thu, Feb 07, 2019 at 10:10:32AM +0900, Michael Paquier wrote: >> On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote: >>> Correct. One could argue that the regex is still suboptimal since “COMMENT ON >>> DATABASE postgres IS ;” will be matched as well, but there I think the tradeoff >>> for readability wins. >> >> Okay, that looks like an improvement anyway, so committed after going >> over the tests for similar problems, and there was one for CREATE >> DATABASE and DROP ROLE. It is possible to have a regex which tells as >> at least one non-whitespace character, but from what I can see these >> don't really improve the readability. > > Are you talking about \w+, or [^[:space:]]+, [^[:blank:]]+, or…? Personally I feel that expanding these test regexes to catch more low-risk bugs, at the cost of readability, is a slippery slope towards implementing a SQL parser in regexes. That was my $0.02 for not attempting to make these bulletproof. cheers ./daniel
On Thu, Feb 07, 2019 at 03:33:54PM +0100, Daniel Gustafsson wrote: > Correct. The idea was to maintain readability while making the regex a bit > better, without any claims to make it perfect. Agreed with your position. I see no point is making all the expressions unreadable for little gain. What Daniel proposed upthread had a better balance in my opinion than the previous behavior, without sacrifying the readability of anything and still improving the error detection. -- Michael
Вложения
On Fri, 8 Feb 2019 at 13:04, Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Feb 07, 2019 at 03:33:54PM +0100, Daniel Gustafsson wrote: > > Correct. The idea was to maintain readability while making the regex a bit > > better, without any claims to make it perfect. > > Agreed with your position. I see no point is making all the > expressions unreadable for little gain. What Daniel proposed upthread > had a better balance in my opinion than the previous behavior, without > sacrifying the readability of anything and still improving the error > detection. FWIW, this was the first time I'd really worked with TAP tests. I had been looking into it because I needed to add a new test. I was surprised to see it working this way and not just doing a diff with some known good output. I'm maybe just missing the reason that's not possible, but to me, it seems a bit less error-prone than trying to ensure an exact match with a bunch of regular expressions. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services