Обсуждение: BUG #16804: substring() function returns "negative substring length" error when using a large length argument

Поиск
Список
Период
Сортировка

BUG #16804: substring() function returns "negative substring length" error when using a large length argument

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      16804
Logged by:          Rafi Shamim
Email address:      rafiss@gmail.com
PostgreSQL version: 13.1
Operating system:   MacOS 10.15.7
Description:

Reproduction steps:

1. Connect to any PostgreSQL database.
2. Run this query:
> select substring('string' from 2 for 2147483646);

Actual result:

2021-01-04 12:43:13.145 EST [85734] ERROR:  negative substring length not
allowed
2021-01-04 12:43:13.145 EST [85734] STATEMENT:  select substring('string'
from 2 for 2147483646)
negative substring length not allowed


Expected result:

I don't know if this is covered by these docs:
https://www.postgresql.org/docs/13/functions-string.html
But I would expect one of the following:
1. There should be no error message, and the result should be 'tring'.
(Meaning it just goes to the end of the string.)
2. An error message that says that the length argument is too long.

I don't know what the SQL standard says, but when I try similar queries with
SQLite and MSSQL, the result is 'tring'.




po 4. 1. 2021 v 19:44 odesílatel PG Bug reporting form <noreply@postgresql.org> napsal:
The following bug has been logged on the website:

Bug reference:      16804
Logged by:          Rafi Shamim
Email address:      rafiss@gmail.com
PostgreSQL version: 13.1
Operating system:   MacOS 10.15.7
Description:       

Reproduction steps:

1. Connect to any PostgreSQL database.
2. Run this query:
> select substring('string' from 2 for 2147483646);

Actual result:

2021-01-04 12:43:13.145 EST [85734] ERROR:  negative substring length not
allowed
2021-01-04 12:43:13.145 EST [85734] STATEMENT:s
negative substring length not allowed


Expected result:

I don't know if this is covered by these docs:
https://www.postgresql.org/docs/13/functions-string.html
But I would expect one of the following:
1. There should be no error message, and the result should be 'tring'.
(Meaning it just goes to the end of the string.)
2. An error message that says that the length argument is too long.

I don't know what the SQL standard says, but when I try similar queries with
SQLite and MSSQL, the result is 'tring'.

Minimally this is a bug and it should raise an error "integer out of range". Probably in this case we can use MAX_INT as a special value of unlimited length, although it is a little bit scary, because length is an optional value. The attached patch should fix this issue. I do not have access to Oracle to check the behaviour of this case there.

Regards

Pavel

Вложения
Hi,

On Jan 4, 2021, at 11:21 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:



po 4. 1. 2021 v 19:44 odesílatel PG Bug reporting form <noreply@postgresql.org> napsal:
> select substring('string' from 2 for 2147483646);

Actual result:

2021-01-04 12:43:13.145 EST [85734] ERROR:  negative substring length not
allowed
2021-01-04 12:43:13.145 EST [85734] STATEMENT:s
negative substring length not allowed

Minimally this is a bug and it should raise an error "integer out of range". Probably in this case we can use MAX_INT as a special value of unlimited length, although it is a little bit scary, because length is an optional value. The attached patch should fix this issue. I do not have access to Oracle to check the behaviour of this case there.

Except according to the pg docs, it’s not out of range, it’s one less than MAX_INT.

The manual calls for it to be an integer, which is defined as:

integer4 bytestypical choice for integer-2147483648 to +2147483647

The original bug report is one less than +2147483647, and thus should be a valid value, no?


Вложения


po 4. 1. 2021 v 20:25 odesílatel Jerry Sievert <jerry@legitimatesounding.com> napsal:
Hi,

On Jan 4, 2021, at 11:21 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:



po 4. 1. 2021 v 19:44 odesílatel PG Bug reporting form <noreply@postgresql.org> napsal:
> select substring('string' from 2 for 2147483646);

Actual result:

2021-01-04 12:43:13.145 EST [85734] ERROR:  negative substring length not
allowed
2021-01-04 12:43:13.145 EST [85734] STATEMENT:s
negative substring length not allowed

Minimally this is a bug and it should raise an error "integer out of range". Probably in this case we can use MAX_INT as a special value of unlimited length, although it is a little bit scary, because length is an optional value. The attached patch should fix this issue. I do not have access to Oracle to check the behaviour of this case there.

Except according to the pg docs, it’s not out of range, it’s one less than MAX_INT.

The manual calls for it to be an integer, which is defined as:

integer4 bytestypical choice for integer-2147483648 to +2147483647

The original bug report is one less than +2147483647, and thus should be a valid value, no?

yes, so the implementation patch is really correct.

Regards

Pavel


Pavel Stehule <pavel.stehule@gmail.com> writes:
> po 4. 1. 2021 v 20:25 odesilatel Jerry Sievert <jerry@legitimatesounding.com>
> napsal:
>> The original bug report is one less than +2147483647, and thus should be a
>> valid value, no?

> yes, so the implementation patch is really correct.

I agree that this is a bug, and that what we should do in case of integer
overflow is return all the rest of the string.  But this patch doesn't
really get the job done, because you haven't accounted for *negative*
overflow, ie, S+L < INT_MIN.  I think the best way to fix that is to
explicitly check for L < 0 rather than trying to wait till after the
addition.

Looking around, I notice that there's an unprotected multiplication
further down.  Also, having seen this, I feel very uncomfortable about
the fact that detoast_attr_slice and cohorts aren't guarding against
the same sort of integer overflow.  I don't think this is the only
caller that might pass out-of-range values.

In short, I think we need something more like the attached.

            regards, tom lane

diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
index b118f4a714..d1cdbaf648 100644
--- a/src/backend/access/common/detoast.c
+++ b/src/backend/access/common/detoast.c
@@ -17,6 +17,7 @@
 #include "access/table.h"
 #include "access/tableam.h"
 #include "access/toast_internals.h"
+#include "common/int.h"
 #include "common/pg_lzcompress.h"
 #include "utils/expandeddatum.h"
 #include "utils/rel.h"
@@ -196,7 +197,8 @@ detoast_attr(struct varlena *attr)
  *        Public entry point to get back part of a toasted value
  *        from compression or external storage.
  *
- * Note: When slicelength is negative, return suffix of the value.
+ * sliceoffset is where to start (zero or more)
+ * If slicelength < 0, return everything beyond sliceoffset
  * ----------
  */
 struct varlena *
@@ -206,8 +208,21 @@ detoast_attr_slice(struct varlena *attr,
     struct varlena *preslice;
     struct varlena *result;
     char       *attrdata;
+    int32        slicelimit;
     int32        attrsize;

+    if (sliceoffset < 0)
+        elog(ERROR, "invalid sliceoffset: %d", sliceoffset);
+
+    /*
+     * Compute slicelimit = offset + length, or -1 if we must fetch all of the
+     * value.  In case of integer overflow, we must fetch all.
+     */
+    if (slicelength < 0)
+        slicelimit = -1;
+    else if (pg_add_s32_overflow(sliceoffset, slicelength, &slicelimit))
+        slicelength = slicelimit = -1;
+
     if (VARATT_IS_EXTERNAL_ONDISK(attr))
     {
         struct varatt_external toast_pointer;
@@ -223,7 +238,7 @@ detoast_attr_slice(struct varlena *attr,
          * at least the requested part (when a prefix is requested).
          * Otherwise, just fetch all slices.
          */
-        if (slicelength > 0 && sliceoffset >= 0)
+        if (slicelimit >= 0)
         {
             int32        max_size;

@@ -231,7 +246,7 @@ detoast_attr_slice(struct varlena *attr,
              * Determine maximum amount of compressed data needed for a prefix
              * of a given length (after decompression).
              */
-            max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
+            max_size = pglz_maximum_compressed_size(slicelimit,
                                                     toast_pointer.va_extsize);

             /*
@@ -270,8 +285,8 @@ detoast_attr_slice(struct varlena *attr,
         struct varlena *tmp = preslice;

         /* Decompress enough to encompass the slice and the offset */
-        if (slicelength > 0 && sliceoffset >= 0)
-            preslice = toast_decompress_datum_slice(tmp, slicelength + sliceoffset);
+        if (slicelimit >= 0)
+            preslice = toast_decompress_datum_slice(tmp, slicelimit);
         else
             preslice = toast_decompress_datum(tmp);

@@ -297,8 +312,7 @@ detoast_attr_slice(struct varlena *attr,
         sliceoffset = 0;
         slicelength = 0;
     }
-
-    if (((sliceoffset + slicelength) > attrsize) || slicelength < 0)
+    else if (slicelength < 0 || slicelimit > attrsize)
         slicelength = attrsize - sliceoffset;

     result = (struct varlena *) palloc(slicelength + VARHDRSZ);
@@ -410,6 +424,11 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset,
     if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) && slicelength > 0)
         slicelength = slicelength + sizeof(int32);

+    /*
+     * Adjust length request if needed.  (Note: our sole caller,
+     * detoast_attr_slice, protects us against sliceoffset + slicelength
+     * overflowing.)
+     */
     if (((sliceoffset + slicelength) > attrsize) || slicelength < 0)
         slicelength = attrsize - sliceoffset;

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index ec1fb4d1b9..a36dcc4035 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -868,29 +868,38 @@ text_substring(Datum str, int32 start, int32 length, bool length_not_specified)
     int32        S = start;        /* start position */
     int32        S1;                /* adjusted start position */
     int32        L1;                /* adjusted substring length */
+    int32        E;                /* end position */
+
+    /*
+     * SQL99 says S can be zero or negative, but we still must fetch from the
+     * start of the string.
+     */
+    S1 = Max(S, 1);

     /* life is easy if the encoding max length is 1 */
     if (eml == 1)
     {
-        S1 = Max(S, 1);
-
         if (length_not_specified)    /* special case - get length to end of
                                      * string */
             L1 = -1;
-        else
+        else if (length < 0)
+        {
+            /* SQL99 says to throw an error for E < S, i.e., negative length */
+            ereport(ERROR,
+                    (errcode(ERRCODE_SUBSTRING_ERROR),
+                     errmsg("negative substring length not allowed")));
+            L1 = -1;            /* silence stupider compilers */
+        }
+        else if (pg_add_s32_overflow(S, length, &E))
         {
-            /* end position */
-            int            E = S + length;
-
             /*
-             * A negative value for L is the only way for the end position to
-             * be before the start. SQL99 says to throw an error.
+             * L could be large enough for S + L to overflow, in which case
+             * the substring must run to end of string.
              */
-            if (E < S)
-                ereport(ERROR,
-                        (errcode(ERRCODE_SUBSTRING_ERROR),
-                         errmsg("negative substring length not allowed")));
-
+            L1 = -1;
+        }
+        else
+        {
             /*
              * A zero or negative value for the end position can happen if the
              * start was negative or one. SQL99 says to return a zero-length
@@ -904,8 +913,8 @@ text_substring(Datum str, int32 start, int32 length, bool length_not_specified)

         /*
          * If the start position is past the end of the string, SQL99 says to
-         * return a zero-length string -- PG_GETARG_TEXT_P_SLICE() will do
-         * that for us. Convert to zero-based starting position
+         * return a zero-length string -- DatumGetTextPSlice() will do that
+         * for us.  We need only convert S1 to zero-based starting position.
          */
         return DatumGetTextPSlice(str, S1 - 1, L1);
     }
@@ -926,12 +935,6 @@ text_substring(Datum str, int32 start, int32 length, bool length_not_specified)
         char       *s;
         text       *ret;

-        /*
-         * if S is past the end of the string, the tuple toaster will return a
-         * zero-length string to us
-         */
-        S1 = Max(S, 1);
-
         /*
          * We need to start at position zero because there is no way to know
          * in advance which byte offset corresponds to the supplied start
@@ -942,19 +945,24 @@ text_substring(Datum str, int32 start, int32 length, bool length_not_specified)
         if (length_not_specified)    /* special case - get length to end of
                                      * string */
             slice_size = L1 = -1;
-        else
+        else if (length < 0)
+        {
+            /* SQL99 says to throw an error for E < S, i.e., negative length */
+            ereport(ERROR,
+                    (errcode(ERRCODE_SUBSTRING_ERROR),
+                     errmsg("negative substring length not allowed")));
+            slice_size = L1 = -1;    /* silence stupider compilers */
+        }
+        else if (pg_add_s32_overflow(S, length, &E))
         {
-            int            E = S + length;
-
             /*
-             * A negative value for L is the only way for the end position to
-             * be before the start. SQL99 says to throw an error.
+             * L could be large enough for S + L to overflow, in which case
+             * the substring must run to end of string.
              */
-            if (E < S)
-                ereport(ERROR,
-                        (errcode(ERRCODE_SUBSTRING_ERROR),
-                         errmsg("negative substring length not allowed")));
-
+            slice_size = L1 = -1;
+        }
+        else
+        {
             /*
              * A zero or negative value for the end position can happen if the
              * start was negative or one. SQL99 says to return a zero-length
@@ -972,8 +980,10 @@ text_substring(Datum str, int32 start, int32 length, bool length_not_specified)
             /*
              * Total slice size in bytes can't be any longer than the start
              * position plus substring length times the encoding max length.
+             * If that overflows, we can just use -1.
              */
-            slice_size = (S1 + L1) * eml;
+            if (pg_mul_s32_overflow(E, eml, &slice_size))
+                slice_size = -1;
         }

         /*
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 298b6c48c2..9ac35e226b 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -396,6 +396,21 @@ SELECT SUBSTRING('1234567890' FROM 4 FOR 3) = '456' AS "456";
  t
 (1 row)

+-- test overflow cases
+SELECT SUBSTRING('string' FROM 2 FOR 2147483646) AS "tring";
+ tring
+-------
+ tring
+(1 row)
+
+SELECT SUBSTRING('string' FROM -10 FOR 2147483646) AS "string";
+ string
+--------
+ string
+(1 row)
+
+SELECT SUBSTRING('string' FROM -10 FOR -2147483646) AS "error";
+ERROR:  negative substring length not allowed
 -- T581 regular expression substring (with SQL's bizarre regexp syntax)
 SELECT SUBSTRING('abcdefg' SIMILAR 'a#"(b_d)#"%' ESCAPE '#') AS "bcd";
  bcd
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index ad5221ab6b..cf7bd74de0 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -131,6 +131,11 @@ SELECT SUBSTRING('1234567890' FROM 3) = '34567890' AS "34567890";

 SELECT SUBSTRING('1234567890' FROM 4 FOR 3) = '456' AS "456";

+-- test overflow cases
+SELECT SUBSTRING('string' FROM 2 FOR 2147483646) AS "tring";
+SELECT SUBSTRING('string' FROM -10 FOR 2147483646) AS "string";
+SELECT SUBSTRING('string' FROM -10 FOR -2147483646) AS "error";
+
 -- T581 regular expression substring (with SQL's bizarre regexp syntax)
 SELECT SUBSTRING('abcdefg' SIMILAR 'a#"(b_d)#"%' ESCAPE '#') AS "bcd";
 -- obsolete SQL99 syntax

I forgot to include this in my original bug report, but the bytea substring function (which has a separate implementation in varlena.c) is also affected.

> SELECT encode(substring(decode('010203', 'hex'), 2, 2147483646), 'hex');
2021-01-04 16:50:22.020 EST [85734] ERROR:  negative substring length not allowed
2021-01-04 16:50:22.020 EST [85734] STATEMENT:  SELECT encode(substring(decode('010203', 'hex'), 2, 2147483646), 'hex')
negative substring length not allowed


On Mon, Jan 4, 2021 at 4:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> po 4. 1. 2021 v 20:25 odesilatel Jerry Sievert <jerry@legitimatesounding.com>
> napsal:
>> The original bug report is one less than +2147483647, and thus should be a
>> valid value, no?

> yes, so the implementation patch is really correct.

I agree that this is a bug, and that what we should do in case of integer
overflow is return all the rest of the string.  But this patch doesn't
really get the job done, because you haven't accounted for *negative*
overflow, ie, S+L < INT_MIN.  I think the best way to fix that is to
explicitly check for L < 0 rather than trying to wait till after the
addition.

Looking around, I notice that there's an unprotected multiplication
further down.  Also, having seen this, I feel very uncomfortable about
the fact that detoast_attr_slice and cohorts aren't guarding against
the same sort of integer overflow.  I don't think this is the only
caller that might pass out-of-range values.

In short, I think we need something more like the attached.

                        regards, tom lane

Rafi Shamim <rafiss@gmail.com> writes:
> I forgot to include this in my original bug report, but the bytea substring
> function (which has a separate implementation in varlena.c) is also
> affected.

Um.  And I bet the "bit" variant as well ...

            regards, tom lane