Обсуждение: perl: unsafe empty pattern behavior

Поиск
Список
Период
Сортировка

perl: unsafe empty pattern behavior

От
Jeff Davis
Дата:
Moved from discussion on -committers:

https://postgr.es/m/0ef325fa06e7a1605c4e119c4ecb637c67e5fb4e.camel@j-davis.com

Summary:

Do not use perl empty patterns like // or qr// or s//.../, the behavior
is too surprising for perl non-experts. There are a few such uses in
our tests; patch attached. Unfortunately, there is no obvious way to
automatically detect them so I am just relying on grep. I'm sure there
are others here who know more about perl than I do, so
suggestions/corrections are welcome.

Long version:

Some may know this already, but we just discovered the dangers of using
empty patterns in perl:

"If the PATTERN evaluates to the empty string, the last successfully
matched regular expression is used instead... If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match)."

https://perldoc.perl.org/perlop#The-empty-pattern-//

In other words, if you have code like:

   if ('xyz' =~ //)
   {
       print "'xyz' matches //\n";
   }

The match will succeed and print, because there's no previous pattern,
so // is a "genuine" empty pattern, which is treated like /.*/ (I
think?). Then, if you add some other code before it:

   if ('abc' =~ /abc/)
   {
       print "'abc' matches /abc/\n";
   }

   if ('xyz' =~ //)
   {
       print "'xyz' matches //\n";
   }

The first match will succeed, but the second match will fail, because
// is treated like /abc/.

On reflection, that does seem very perl-like. But it can cause
surprising action-at-a-distance if not used carefully, especially for
those who aren't experts in perl. It's much safer to just not use the
empty pattern.

If you use qr// instead:

https://perldoc.perl.org/perlop#qr/STRING/msixpodualn

like:

   if ('abc' =~ qr/abc/)
   {
       print "'abc' matches /abc/\n";
   }

   if ('xyz' =~ qr//)
   {
       print "'xyz' matches //\n";
   }

Then the second match may succeed or may fail, and it's not clear from
the documentation what precise circumstances matter. It seems to fail
on older versions of perl (like 5.16.3) and succeed on newer versions
(5.38.2). However, it may also depend on when the qr// is [re]compiled,
or regex flags, or locale, or may just be undefined.

Regards,
    Jeff Davis



Вложения

Re: perl: unsafe empty pattern behavior

От
Alvaro Herrera
Дата:
On 2024-Mar-12, Jeff Davis wrote:

> Do not use perl empty patterns like // or qr// or s//.../, the behavior
> is too surprising for perl non-experts.

Yeah, nasty.

> There are a few such uses in
> our tests; patch attached. Unfortunately, there is no obvious way to
> automatically detect them so I am just relying on grep. I'm sure there
> are others here who know more about perl than I do, so
> suggestions/corrections are welcome.

I suggest that pg_dump/t/002_pg_dump.pl could use a verification that
the ->{regexp} thing is not empty.  I also tried grepping (for things
like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you
have ... but I only looked for the "qr" literal, not other ways to get
regexes.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green
stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'.
After collecting 500 such letters, he mused, a university somewhere in
Arizona would probably grant him a degree.              (Don Knuth)



Re: perl: unsafe empty pattern behavior

От
Jeff Davis
Дата:
On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote:
> I suggest that pg_dump/t/002_pg_dump.pl could use a verification that
> the ->{regexp} thing is not empty.

I'm not sure how exactly to test for an empty pattern. The problem is,
we don't really want to test for an empty pattern, because /(?^:)/ is
fine. The problem is //, which gets turned into an actual pattern
(perhaps empty or perhaps not), and by the time it's in the %tests
hash, I think it's too late to distinguish.

Again, I'm not a perl expert, so suggestions welcome.

>   I also tried grepping (for things
> like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you
> have ... but I only looked for the "qr" literal, not other ways to
> get
> regexes.

I think that's fine. qr// seems the most dangerous, because it seems to
behave differently in different versions of perl.

Grepping for regexes in perl code is an "interesting" exercise.

Regards,
    Jeff Davis




Re: perl: unsafe empty pattern behavior

От
Tom Lane
Дата:
Jeff Davis <pgsql@j-davis.com> writes:
> On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote:
>> I also tried grepping (for things
>> like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you
>> have ... but I only looked for the "qr" literal, not other ways to
>> get regexes.

> I think that's fine. qr// seems the most dangerous, because it seems to
> behave differently in different versions of perl.

I wonder whether perlcritic has sufficiently deep understanding of
Perl code that it could find these hazards.  I already checked,
and found that there's no built-in filter for this (at least not
in the perlcritic version I have), but maybe we could write one?
The rules seem to be plug-in modules, so you can make your own
in principle.

            regards, tom lane



Re: perl: unsafe empty pattern behavior

От
Andrew Dunstan
Дата:
On 2024-03-12 Tu 18:59, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
>> On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote:
>>> I also tried grepping (for things
>>> like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you
>>> have ... but I only looked for the "qr" literal, not other ways to
>>> get regexes.
>> I think that's fine. qr// seems the most dangerous, because it seems to
>> behave differently in different versions of perl.
> I wonder whether perlcritic has sufficiently deep understanding of
> Perl code that it could find these hazards.  I already checked,
> and found that there's no built-in filter for this (at least not
> in the perlcritic version I have), but maybe we could write one?
> The rules seem to be plug-in modules, so you can make your own
> in principle.



Yeah, that was my thought too. I'd start with ProhibitComplexRegexes.pm 
as a template.

If nobody else does it I'll have a go, but it might take a while.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: perl: unsafe empty pattern behavior

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 2024-03-12 Tu 18:59, Tom Lane wrote:
>> I wonder whether perlcritic has sufficiently deep understanding of
>> Perl code that it could find these hazards.

> Yeah, that was my thought too. I'd start with ProhibitComplexRegexes.pm
> as a template.

Oooh.  Taking a quick look at the source code:

https://metacpan.org/dist/Perl-Critic/source/lib/Perl/Critic/Policy/RegularExpressions/ProhibitComplexRegexes.pm

it seems like it'd be pretty trivial to convert that from "complain if
regex contains more than N characters" to "complain if regex contains
zero characters".

            regards, tom lane