Обсуждение: prevent invalidly encoded input

Поиск
Список
Период
Сортировка

prevent invalidly encoded input

От
Andrew Dunstan
Дата:
Attached is a patch to the scanner and the COPY code that checks for
invalidly encoded data that can currently leak into our system via \
escapes in quoted literals or text mode copy fields, as recently
discussed. That would still leave holes via chr(), convert() and
possibly other functions, but these two paths are the biggest holes that
need plugging.

cheers

andrew
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.286
diff -c -r1.286 copy.c
*** src/backend/commands/copy.c    7 Sep 2007 20:59:26 -0000    1.286
--- src/backend/commands/copy.c    11 Sep 2007 16:33:38 -0000
***************
*** 2685,2690 ****
--- 2685,2691 ----
          char       *start_ptr;
          char       *end_ptr;
          int            input_len;
+         bool        saw_high_bit = false;

          /* Make sure space remains in fieldvals[] */
          if (fieldno >= maxfields)
***************
*** 2749,2754 ****
--- 2750,2757 ----
                                  }
                              }
                              c = val & 0377;
+                             if (IS_HIGHBIT_SET(c))
+                                 saw_high_bit = true;
                          }
                          break;
                      case 'x':
***************
*** 2772,2777 ****
--- 2775,2782 ----
                                      }
                                  }
                                  c = val & 0xff;
+                                 if (IS_HIGHBIT_SET(c))
+                                     saw_high_bit = true;
                              }
                          }
                          break;
***************
*** 2799,2805 ****
                           * literally
                           */
                  }
!             }

              /* Add c to output string */
              *output_ptr++ = c;
--- 2804,2810 ----
                           * literally
                           */
                  }
!             }

              /* Add c to output string */
              *output_ptr++ = c;
***************
*** 2808,2813 ****
--- 2813,2828 ----
          /* Terminate attribute value in output area */
          *output_ptr++ = '\0';

+         /* If we de-escaped a char with the high bit set, make sure
+          * we still have valid data for the db encoding. Avoid calling strlen
+          * here for the sake of efficiency.
+          */
+         if (saw_high_bit)
+         {
+             char *fld = fieldvals[fieldno];
+             pg_verifymbstr(fld, output_ptr - (fld + 1), false);
+         }
+
          /* Check whether raw input matched null marker */
          input_len = end_ptr - start_ptr;
          if (input_len == cstate->null_print_len &&
Index: src/backend/parser/scan.l
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/scan.l,v
retrieving revision 1.140
diff -c -r1.140 scan.l
*** src/backend/parser/scan.l    12 Aug 2007 20:18:06 -0000    1.140
--- src/backend/parser/scan.l    11 Sep 2007 16:33:38 -0000
***************
*** 443,448 ****
--- 443,449 ----
  <xq,xe>{quotefail} {
                      yyless(1);
                      BEGIN(INITIAL);
+                     pg_verifymbstr(literalbuf, literallen, false);
                      yylval.str = litbufdup();
                      return SCONST;
                  }
***************
*** 508,513 ****
--- 509,515 ----
                      {
                          pfree(dolqstart);
                          BEGIN(INITIAL);
+                         pg_verifymbstr(literalbuf, literallen, false);
                          yylval.str = litbufdup();
                          return SCONST;
                      }
***************
*** 545,550 ****
--- 547,553 ----
                      BEGIN(INITIAL);
                      if (literallen == 0)
                          yyerror("zero-length delimited identifier");
+                     pg_verifymbstr(literalbuf, literallen, false);
                      ident = litbufdup();
                      if (literallen >= NAMEDATALEN)
                          truncate_identifier(ident, literallen, true);

Re: prevent invalidly encoded input

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> Attached is a patch to the scanner and the COPY code that checks for
> invalidly encoded data that can currently leak into our system via \
> escapes in quoted literals or text mode copy fields, as recently
> discussed. That would still leave holes via chr(), convert() and
> possibly other functions, but these two paths are the biggest holes that
> need plugging.

The COPY code looks sane.  On the scan.l change, I believe two out of
three of those calls are useless, because we do not do backslash
processing in dollar-quoted strings nor in quoted identifiers.
Also, I'd kinda like to have the check-for-high-bit optimization in
scan.l too --- some people do throw big literals at the thing.

            regards, tom lane

Re: prevent invalidly encoded input

От
Andrew Dunstan
Дата:

Tom Lane wrote:
> Also, I'd kinda like to have the check-for-high-bit optimization in
> scan.l too --- some people do throw big literals at the thing.
>
>
>
OK, will do. Am I correct in thinking I don't need to worry about the
<xeescape> case, only the <xeoctesc> and <xehexesc> cases?

cheers

andrew

Re: prevent invalidly encoded input

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Also, I'd kinda like to have the check-for-high-bit optimization in
>> scan.l too --- some people do throw big literals at the thing.
>>
> OK, will do. Am I correct in thinking I don't need to worry about the
> <xeescape> case, only the <xeoctesc> and <xehexesc> cases?

[ squint ... ]  Hm, wouldn't bet on it.  That leads to
unescape_single_char(), which is fine for the cases that it explicitly
knows about (\b and so on), but what if the following byte has the
high bit set?  Not only would that pass through a high bit to the
output, but very possibly this results in disassembling a multibyte
input character.

So it looks like you need to recheck if unescape_single_char sees a
high-bit-set char.

You should take a second look at the COPY code to see if there's a
similar case there --- I forget what it does with backslash followed
by non-digit.

            regards, tom lane

Re: prevent invalidly encoded input

От
Andrew Dunstan
Дата:

Tom Lane wrote:
>
> So it looks like you need to recheck if unescape_single_char sees a
> high-bit-set char.
>
> You should take a second look at the COPY code to see if there's a
> similar case there --- I forget what it does with backslash followed
> by non-digit.
>
>

It's covered. Revised patch attached. I'll probably apply this some time
tomorrow.

cheers

andrew
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.286
diff -c -r1.286 copy.c
*** src/backend/commands/copy.c    7 Sep 2007 20:59:26 -0000    1.286
--- src/backend/commands/copy.c    12 Sep 2007 03:21:25 -0000
***************
*** 2685,2690 ****
--- 2685,2691 ----
          char       *start_ptr;
          char       *end_ptr;
          int            input_len;
+         bool        saw_high_bit = false;

          /* Make sure space remains in fieldvals[] */
          if (fieldno >= maxfields)
***************
*** 2749,2754 ****
--- 2750,2757 ----
                                  }
                              }
                              c = val & 0377;
+                             if (IS_HIGHBIT_SET(c))
+                                 saw_high_bit = true;
                          }
                          break;
                      case 'x':
***************
*** 2772,2777 ****
--- 2775,2782 ----
                                      }
                                  }
                                  c = val & 0xff;
+                                 if (IS_HIGHBIT_SET(c))
+                                     saw_high_bit = true;
                              }
                          }
                          break;
***************
*** 2799,2805 ****
                           * literally
                           */
                  }
!             }

              /* Add c to output string */
              *output_ptr++ = c;
--- 2804,2810 ----
                           * literally
                           */
                  }
!             }

              /* Add c to output string */
              *output_ptr++ = c;
***************
*** 2808,2813 ****
--- 2813,2828 ----
          /* Terminate attribute value in output area */
          *output_ptr++ = '\0';

+         /* If we de-escaped a char with the high bit set, make sure
+          * we still have valid data for the db encoding. Avoid calling strlen
+          * here for the sake of efficiency.
+          */
+         if (saw_high_bit)
+         {
+             char *fld = fieldvals[fieldno];
+             pg_verifymbstr(fld, output_ptr - (fld + 1), false);
+         }
+
          /* Check whether raw input matched null marker */
          input_len = end_ptr - start_ptr;
          if (input_len == cstate->null_print_len &&
Index: src/backend/parser/scan.l
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/scan.l,v
retrieving revision 1.140
diff -c -r1.140 scan.l
*** src/backend/parser/scan.l    12 Aug 2007 20:18:06 -0000    1.140
--- src/backend/parser/scan.l    12 Sep 2007 03:21:26 -0000
***************
*** 60,65 ****
--- 60,66 ----
  bool            standard_conforming_strings = false;

  static bool        warn_on_first_escape;
+ static bool     saw_high_bit = false;

  /*
   * literalbuf is used to accumulate literal values when multiple rules
***************
*** 426,431 ****
--- 427,433 ----

  {xqstart}        {
                      warn_on_first_escape = true;
+                     saw_high_bit = false;
                      SET_YYLLOC();
                      if (standard_conforming_strings)
                          BEGIN(xq);
***************
*** 435,440 ****
--- 437,443 ----
                  }
  {xestart}        {
                      warn_on_first_escape = false;
+                     saw_high_bit = false;
                      SET_YYLLOC();
                      BEGIN(xe);
                      startlit();
***************
*** 443,448 ****
--- 446,453 ----
  <xq,xe>{quotefail} {
                      yyless(1);
                      BEGIN(INITIAL);
+                     if (saw_high_bit)
+                         pg_verifymbstr(literalbuf, literallen, false);
                      yylval.str = litbufdup();
                      return SCONST;
                  }
***************
*** 469,486 ****
--- 474,497 ----
                      }
                      check_string_escape_warning(yytext[1]);
                      addlitchar(unescape_single_char(yytext[1]));
+                     if (IS_HIGHBIT_SET(literalbuf[literallen]))
+                         saw_high_bit = true;
                  }
  <xe>{xeoctesc}  {
                      unsigned char c = strtoul(yytext+1, NULL, 8);

                      check_escape_warning();
                      addlitchar(c);
+                     if (IS_HIGHBIT_SET(c))
+                         saw_high_bit = true;
                  }
  <xe>{xehexesc}  {
                      unsigned char c = strtoul(yytext+2, NULL, 16);

                      check_escape_warning();
                      addlitchar(c);
+                     if (IS_HIGHBIT_SET(c))
+                         saw_high_bit = true;
                  }
  <xq,xe>{quotecontinue} {
                      /* ignore */

Re: prevent invalidly encoded input

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
>                       addlitchar(unescape_single_char(yytext[1]));
> +                     if (IS_HIGHBIT_SET(literalbuf[literallen]))
> +                         saw_high_bit = true;

Isn't that array subscript off-by-one?  Probably better to put the test
inside unescape_single_char(), anyway.

Otherwise it looks sane, though maybe shy a comment or so.

            regards, tom lane