Re: Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes?
From | Francisco Olarte |
---|---|
Subject | Re: Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes? |
Date | |
Msg-id | CA+bJJbzNHEqufUh=SUGJ_zSXU5TEAgdTgHqpzv_UZ9SVgg6KUg@mail.gmail.com |
In response to | Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes? ("David G. Johnston" <david.g.johnston@gmail.com>) |
Responses | Re: Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes? |
List | pgsql-general |
Hi David:

On Sun, Oct 18, 2015 at 7:49 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
> Other implementation of regular expressions handle "newline" mechanics
> related to "^" and "$" semantically instead of literally. By that I mean
> that both "\r\n" and "\n" are considered "newlines" instead of just "\n".

Which ones? AFAIK this kind of thing is usually done by C (and related) runtimes when reading text files. At least on my machine perl does not do it:

censored:~$ perl -e 'print( ("A\r\n" =~ /A$/) ? "matched\n" : "NO MATCH\n");'
NO MATCH
censored:~$ perl -e 'print( ("A\r\n" =~ /A.$/) ? "matched\n" : "NO MATCH\n");'
matched
censored:~$ perl -e 'print( ("A\r\n" =~ /A\s$/) ? "matched\n" : "NO MATCH\n");'
matched

Normally when reading lines in CP/M and related (MSDOS, Windows) the CRT does collapse them (and sometimes just zaps \r, or collapses any run, or considers [\r*]\n[\r*] or....). But I normally do not see that behaviour in regexes.

> If changing behavior is not desirable I would be content with another flag
> that would toggle such behavior.
> In code - both of these subqueries should match whereas presently only the
> first one does.
> SELECT regexp_matches(E'123\n', E'123$', 'w');
> SELECT regexp_matches(E'123\r\n', E'123$', 'w');
> I don't know if this is server O/S dependent...but I would not expect it to
> be so.

Neither do I (expect it to be OS dependent), but I find the current behaviour correct. I mean, newline stuff is OS dependent, and you should convert when ingesting data; when matching, the data should already have been converted to whatever the language uses for newlines (in C and perl that means \n, which need not be \012, BTW. In unix \n=\012 on disk, on CP/M it's \015\012, and when I worked with Mac (before the unixy OS X they use now) it was \015, and I cannot think of what they can use on EBCDIC machines).

Francisco Olarte.
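
A minimal sketch of the convert-on-ingest approach described above, written in Python rather than Perl or PostgreSQL only to keep it self-contained. The helper name `normalize_newlines` is hypothetical, and the choice to treat a lone `\r` as a line ending is an assumption, not something the message specifies. It also shows the alternative of tolerating an optional `\r` in the pattern itself.

```python
import re


def normalize_newlines(text: str) -> str:
    """Hypothetical helper: convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so the pair does not turn into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = "123\r\n"

# Unconverted data: "$" sits before "\n", but "\r" stands between "123" and it.
unconverted = re.search(r"123$", raw, re.MULTILINE)

# Converted on ingest: the same pattern now matches.
converted = re.search(r"123$", normalize_newlines(raw), re.MULTILINE)

# Alternative: leave the data alone and accept an optional "\r" in the pattern.
tolerant = re.search(r"123\r?$", raw, re.MULTILINE)

print(unconverted is not None)
print(converted is not None)
print(tolerant is not None)
```

Normalizing once at ingestion keeps every later pattern simple; the `\r?` variant is the fallback when the stored data cannot be changed.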