Обсуждение: Tighten up a few overly lax regexes in pg_dump's tap tests

Поиск
Список
Период
Сортировка

Tighten up a few overly lax regexes in pg_dump's tap tests

От
David Rowley
Дата:
There are a few regexes in pg_dump's tap tests that are neglecting to
escape the dot in "schema.table" type expressions. This could result
in the test passing when it shouldn't.  It seems worth putting that
right.

Patch attached.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Daniel Gustafsson
Дата:
> On 4 Feb 2019, at 11:54, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> There are a few regexes in pg_dump's tap tests that are neglecting to
> escape the dot in "schema.table" type expressions. This could result
> in the test passing when it shouldn't.  It seems worth putting that
> right.

+1 for tightening it up, and the patch looks good to me.

We may also want to use the + metacharacter instead of * in a few places, since
the intent is to always match something, where matching nothing should be
considered an error:

-          qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m,
+          qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m,

cheers ./daniel

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Michael Paquier
Дата:
On Mon, Feb 04, 2019 at 01:12:48PM +0100, Daniel Gustafsson wrote:
> +1 for tightening it up, and the patch looks good to me.
>
> We may also want to use the + metacharacter instead of * in a few places, since
> the intent is to always match something, where matching nothing should be
> considered an error:
>
> -          qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m,
> +          qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m,

Some tests are missing the update, and it seems to me that
tightening things up is a good thing, still we ought to do it
consistently.  Some places missing the update:
- ALTER OPERATOR FAMILY
- ALTER OPERATOR CLASS
- ALTER SEQUENCE
- ALTER TABLE (ONLY, partitioned table)
- BLOB load
- COMMENT ON
- COPY
- INSERT INTO
- CREATE COLLATION

test_pg_dump's 001_base.pl needs also a refresh.
--
Michael

Вложения

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
David Rowley
Дата:
On Tue, 5 Feb 2019 at 13:15, Michael Paquier <michael@paquier.xyz> wrote:
> Some tests are missing the update, and it seems to me that
> tightening things up is a good thing, still we ought to do it
> consistently.  Some places missing the update:
> - ALTER OPERATOR FAMILY
> - ALTER OPERATOR CLASS
> - ALTER SEQUENCE
> - ALTER TABLE (ONLY, partitioned table)
> - BLOB load
> - COMMENT ON
> - COPY
> - INSERT INTO
> - CREATE COLLATION

I think I've done all the ones that use \E.  Those \Q ones don't need
any escaping. I assume that's the ones you've found.  I didn't do an
exhaustive check though.

> test_pg_dump's 001_base.pl needs also a refresh.

Yeah, I thought about looking, but I thought it might be a bottomless pit.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Michael Paquier
Дата:
On Tue, Feb 05, 2019 at 01:50:50PM +1300, David Rowley wrote:
> I think I've done all the ones that use \E.  Those \Q ones don't need
> any escaping. I assume that's the ones you've found.  I didn't do an
> exhaustive check though.

Oh, good point.  I didn't know that anything between \Q and \E are
interpreted as literal characters.  Instead of the approach you are
proposing, perhaps it would make sense to extend this way of doing
things then?  For example some tests with CREATE CONVERSION do so.  It
looks much more portable than having to escape every dot.

>> test_pg_dump's 001_base.pl needs also a refresh.
>
> Yeah, I thought about looking, but I thought it might be a bottomless pit.

I just double-checked this one, and the regex patterns there look
all correct.
--
Michael

Вложения

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
David Rowley
Дата:
On Tue, 5 Feb 2019 at 14:41, Michael Paquier <michael@paquier.xyz> wrote:
>  Instead of the approach you are
> proposing, perhaps it would make sense to extend this way of doing
> things then?  For example some tests with CREATE CONVERSION do so.  It
> looks much more portable than having to escape every dot.

I'm not particularly excited either way, but here's a patch with it that way.

I did leave a couple untouched as there was quite a bit of escaping
going on already. I didn't think switching between \Q and \E would
have made those ones any more pleasing to the eye.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
David Rowley
Дата:
On Tue, 5 Feb 2019 at 01:12, Daniel Gustafsson <daniel@yesql.se> wrote:
> We may also want to use the + metacharacter instead of * in a few places, since
> the intent is to always match something, where matching nothing should be
> considered an error:
>
> -                 qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m,
> +                 qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m,

I looked for instances of * alone and didn't see any. I only saw ones
prefixed with ".", in which case, isn't that matching 1 or more chars
already?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
"David G. Johnston"
Дата:
On Monday, February 4, 2019, David Rowley <david.rowley@2ndquadrant.com> wrote:
On Tue, 5 Feb 2019 at 01:12, Daniel Gustafsson <daniel@yesql.se> wrote:
> We may also want to use the + metacharacter instead of * in a few places, since
> the intent is to always match something, where matching nothing should be
> considered an error:
>
> -                 qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m,
> +                 qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m,

I looked for instances of * alone and didn't see any. I only saw ones
prefixed with ".", in which case, isn't that matching 1 or more chars
already?

No.  In Regex the following are equivalent:

.* == .{0,}
.+ == .{1,}
. == .{1}

A “*” by itself would either be an error or, assuming the preceding character is a space (so it visually looks alone) would be zero or more consecutive spaces.

In the above “...OWNER TO<space>;” is a valid match.

David J.
 

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Michael Paquier
Дата:
On Tue, Feb 05, 2019 at 05:46:55PM +1300, David Rowley wrote:
> I did leave a couple untouched as there was quite a bit of escaping
> going on already. I didn't think switching between \Q and \E would
> have made those ones any more pleasing to the eye.

-    qr/^ALTER TABLE dump_test.test_table ENABLE ROW LEVEL SECURITY;/m,
+    qr/^\QALTER TABLE dump_test.test_table ENABLE ROW LEVEL SECURITY;/m,
I would have just appended the \E for consistency at the end of the
strings.  Except that it looks fine.  No need to send an updated
version, it seems that you have all the spots.  I'll do an extra pass
on it tomorrow and see if I can commit it.
--
Michael

Вложения

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Daniel Gustafsson
Дата:
> On 5 Feb 2019, at 06:55, David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> On Monday, February 4, 2019, David Rowley <david.rowley@2ndquadrant.com <mailto:david.rowley@2ndquadrant.com>> wrote:
> On Tue, 5 Feb 2019 at 01:12, Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> wrote:
> > We may also want to use the + metacharacter instead of * in a few places, since
> > the intent is to always match something, where matching nothing should be
> > considered an error:
> >
> > -                 qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m,
> > +                 qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m,
>
> I looked for instances of * alone and didn't see any. I only saw ones
> prefixed with ".", in which case, isn't that matching 1 or more chars
> already?
>
> No.  In Regex the following are equivalent:
>
> .* == .{0,}
> .+ == .{1,}
> . == .{1}
>
> A “*” by itself would either be an error or, assuming the preceding character is a space (so it visually looks alone)
wouldbe zero or more consecutive spaces. 

Sorry for being a bit unclear in my original email, it’s as David says above:
.* matches zero or more characters and .+ matches 1 or more characters.

> In the above “...OWNER TO<space>;” is a valid match.

Indeed, so we should move to matching with .+ to force an owner.

cheers ./daniel

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Michael Paquier
Дата:
On Tue, Feb 05, 2019 at 04:26:18PM +0900, Michael Paquier wrote:
> On Tue, Feb 05, 2019 at 05:46:55PM +1300, David Rowley wrote:
>> I did leave a couple untouched as there was quite a bit of escaping
>> going on already. I didn't think switching between \Q and \E would
>> have made those ones any more pleasing to the eye.

Indeed.  I have stuck with your version here.

> -    qr/^ALTER TABLE dump_test.test_table ENABLE ROW LEVEL SECURITY;/m,
> +    qr/^\QALTER TABLE dump_test.test_table ENABLE ROW LEVEL SECURITY;/m,
> I would have just appended the \E for consistency at the end of the
> strings.  Except that it looks fine.  No need to send an updated
> version, it seems that you have all the spots.  I'll do an extra pass
> on it tomorrow and see if I can commit it.

And done after checking the whole set.
--
Michael

Вложения

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Daniel Gustafsson
Дата:
> On 6 Feb 2019, at 09:39, Michael Paquier <michael@paquier.xyz> wrote:

> And done after checking the whole set.

I still think we should enforce one-or-more matching on the OWNER part as well,
since matching zero would be a syntax error.  There are more .* matches but
I’ve only touched the ones which match SQL, since there is a defined grammar to
rely on there.  The attached patch does that on top of your commit.

cheers ./daniel


Вложения

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
David Rowley
Дата:
On Wed, 6 Feb 2019 at 21:39, Michael Paquier <michael@paquier.xyz> wrote:
> And done after checking the whole set.

Thanks for pushing.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Michael Paquier
Дата:
On Wed, Feb 06, 2019 at 10:50:27AM +0100, Daniel Gustafsson wrote:
> I still think we should enforce one-or-more matching on the OWNER part as well,
> since matching zero would be a syntax error.  There are more .* matches but
> I’ve only touched the ones which match SQL, since there is a defined grammar to
> rely on there.  The attached patch does that on top of your commit.

-       regexp => qr/^COMMENT ON DATABASE postgres IS .*;/m,
+       regexp => qr/^COMMENT ON DATABASE postgres IS .+;/m,
So...  With what's on HEAD the current regex means that all these are
correct commands:
COMMENT ON DATABASE postgres is ;
COMMENT ON DATABASE postgres is  foo;
COMMENT ON DATABASE postgres is foo;
And for the first one that's obviously wrong.

So what you are suggesting is to actually make the first pattern
something to complain about, right?  And ".+" this makes sure that at
least one character is present, while for ".*" it is fine to have zero
characters.  What you are suggesting looks right if I understood that
right.
--
Michael

Вложения

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Daniel Gustafsson
Дата:

> On 6 Feb 2019, at 12:37, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Feb 06, 2019 at 10:50:27AM +0100, Daniel Gustafsson wrote:
>> I still think we should enforce one-or-more matching on the OWNER part as well,
>> since matching zero would be a syntax error.  There are more .* matches but
>> I’ve only touched the ones which match SQL, since there is a defined grammar to
>> rely on there.  The attached patch does that on top of your commit.
>
> -       regexp => qr/^COMMENT ON DATABASE postgres IS .*;/m,
> +       regexp => qr/^COMMENT ON DATABASE postgres IS .+;/m,
> So...  With what's on HEAD the current regex means that all these are
> correct commands:
> COMMENT ON DATABASE postgres is ;
> COMMENT ON DATABASE postgres is  foo;
> COMMENT ON DATABASE postgres is foo;
> And for the first one that's obviously wrong.
>
> So what you are suggesting is to actually make the first pattern
> something to complain about, right?  And ".+" this makes sure that at
> least one character is present, while for ".*" it is fine to have zero
> characters.  What you are suggesting looks right if I understood that
> right.

Correct.  One could argue that the regex is still suboptimal since “COMMENT ON
DATABASE postgres IS  ;” will be matched as well, but there I think the tradeoff
for readability wins.

cheers ./daniel

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Michael Paquier
Дата:
On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
> Correct.  One could argue that the regex is still suboptimal since “COMMENT ON
> DATABASE postgres IS  ;” will be matched as well, but there I think the tradeoff
> for readability wins.

Okay, that looks like an improvement anyway, so committed after going
over the tests for similar problems, and there was one for CREATE
DATABASE and DROP ROLE.  It is possible to have a regex which tells as
at least one non-whitespace character, but from what I can see these
don't really improve the readability.
--
Michael

Вложения

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
"Tels"
Дата:
Moin,

On Wed, February 6, 2019 8:10 pm, Michael Paquier wrote:
> On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
>> Correct.  One could argue that the regex is still suboptimal since
>> “COMMENT ON
>> DATABASE postgres IS  ;” will be matched as well, but there I think the
>> tradeoff
>> for readability wins.
>
> Okay, that looks like an improvement anyway, so committed after going
> over the tests for similar problems, and there was one for CREATE
> DATABASE and DROP ROLE.  It is possible to have a regex which tells as
> at least one non-whitespace character, but from what I can see these
> don't really improve the readability.

Sorry for being late to the party, but it just occured to me that while
".+"  is definitely an improvement over ".*", it isn't foolproof either,
as it also matches ";".

Thus:

  regexp => qr/^COMMENT ON DATABASE postgres IS .+;/m,

matches things like:

  'COMMENT ON DATABASE postgres IS ;;'
  'COMMENT ON DATABASE postgres IS  ;'
  'COMMENT ON DATABASE postgres IS --;'

etc.

I'm not sure it is really necessary to deal with these cases, but one
possibility would be to pre-compute regexps:

  $QR_COMMENT = qr/[^ ;]+/;
  $QR_IDENTIFIER = qr/[^ ;]+/; # etc

or whataver is the thing that should actually be matched here and use them
like so:

  regexp => qr/^COMMENT ON DATABASE postgres IS $QR_COMMENT;/m,

That way it is easily changable and quite readable.

Oh, one more question. Shouldn't these regexps that start with "^" also
end with "$"? Or can there be output like:

  'COMMENT ON DATABASE postgres IS $QR_IDENTIFIER; SELECT 1;'

?

Best regards,

Tels



Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Daniel Gustafsson
Дата:
> On 7 Feb 2019, at 13:55, Tels <nospam-pg-abuse@bloodgate.com> wrote:
> On Wed, February 6, 2019 8:10 pm, Michael Paquier wrote:
>> On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
>>> Correct.  One could argue that the regex is still suboptimal since
>>> “COMMENT ON
>>> DATABASE postgres IS  ;” will be matched as well, but there I think the
>>> tradeoff
>>> for readability wins.
>>
>> Okay, that looks like an improvement anyway, so committed after going
>> over the tests for similar problems, and there was one for CREATE
>> DATABASE and DROP ROLE.  It is possible to have a regex which tells as
>> at least one non-whitespace character, but from what I can see these
>> don't really improve the readability.
>
> Sorry for being late to the party, but it just occured to me that while
> ".+"  is definitely an improvement over ".*", it isn't foolproof either,
> as it also matches ";”.

Correct.  The idea was to maintain readability while making the regex a bit
better, without any claims to make it perfect.  One can argue whether or not
that’s enough, but IMO keeping the tests as readable as possible is preferrable
when it comes to the tests in question, as they aren’t matching against user
supplied SQL but machine generated.

cheers ./daniel

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
David Fetter
Дата:
On Thu, Feb 07, 2019 at 10:10:32AM +0900, Michael Paquier wrote:
> On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
> > Correct.  One could argue that the regex is still suboptimal since “COMMENT ON
> > DATABASE postgres IS  ;” will be matched as well, but there I think the tradeoff
> > for readability wins.
> 
> Okay, that looks like an improvement anyway, so committed after going
> over the tests for similar problems, and there was one for CREATE
> DATABASE and DROP ROLE.  It is possible to have a regex which tells as
> at least one non-whitespace character, but from what I can see these
> don't really improve the readability.

Are you talking about \w+, or [^[:space:]]+, [^[:blank:]]+, or...?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Daniel Gustafsson
Дата:
> On 7 Feb 2019, at 18:20, David Fetter <david@fetter.org> wrote:
>
> On Thu, Feb 07, 2019 at 10:10:32AM +0900, Michael Paquier wrote:
>> On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
>>> Correct.  One could argue that the regex is still suboptimal since “COMMENT ON
>>> DATABASE postgres IS  ;” will be matched as well, but there I think the tradeoff
>>> for readability wins.
>>
>> Okay, that looks like an improvement anyway, so committed after going
>> over the tests for similar problems, and there was one for CREATE
>> DATABASE and DROP ROLE.  It is possible to have a regex which tells as
>> at least one non-whitespace character, but from what I can see these
>> don't really improve the readability.
>
> Are you talking about \w+, or [^[:space:]]+, [^[:blank:]]+, or…?

Personally I feel that expanding these test regexes to catch more low-risk
bugs, at the cost of readability, is a slippery slope towards implementing a
SQL parser in regexes.  That was my $0.02 for not attempting to make these
bulletproof.

cheers ./daniel

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
Michael Paquier
Дата:
On Thu, Feb 07, 2019 at 03:33:54PM +0100, Daniel Gustafsson wrote:
> Correct.  The idea was to maintain readability while making the regex a bit
> better, without any claims to make it perfect.

Agreed with your position.  I see no point is making all the
expressions unreadable for little gain.  What Daniel proposed upthread
had a better balance in my opinion than the previous behavior, without
sacrifying the readability of anything and still improving the error
detection.
--
Michael

Вложения

Re: Tighten up a few overly lax regexes in pg_dump's tap tests

От
David Rowley
Дата:
On Fri, 8 Feb 2019 at 13:04, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Feb 07, 2019 at 03:33:54PM +0100, Daniel Gustafsson wrote:
> > Correct.  The idea was to maintain readability while making the regex a bit
> > better, without any claims to make it perfect.
>
> Agreed with your position.  I see no point is making all the
> expressions unreadable for little gain.  What Daniel proposed upthread
> had a better balance in my opinion than the previous behavior, without
> sacrifying the readability of anything and still improving the error
> detection.

FWIW, this was the first time I'd really worked with TAP tests.  I had
been looking into it because I needed to add a new test.

I was surprised to see it working this way and not just doing a diff
with some known good output.  I'm maybe just missing the reason that's
not possible, but to me, it seems a bit less error-prone than trying
to ensure an exact match with a bunch of regular expressions.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services