Re: Stack overflow issue

From Tom Lane
Subject Re: Stack overflow issue
Date
Msg-id 3661156.1661871758@sss.pgh.pa.us
In response to Re: Stack overflow issue  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Stack overflow issue  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
>> I think most likely we should report this to Snowball upstream
>> and see what they think is an appropriate fix.

> Done at [1], and I pushed the other fixes.  Thanks again for the report!

The upstream recommendation, which seems pretty sane to me, is to
simply reject any string exceeding some threshold length as not
possibly being a word.  Apparently it's common to use thresholds
as small as 64 bytes, but in the attached I used 1000 bytes.

            regards, tom lane

diff --git a/src/backend/snowball/dict_snowball.c b/src/backend/snowball/dict_snowball.c
index 68c9213f69..aaf4ff72b6 100644
--- a/src/backend/snowball/dict_snowball.c
+++ b/src/backend/snowball/dict_snowball.c
@@ -272,11 +272,25 @@ dsnowball_lexize(PG_FUNCTION_ARGS)
     DictSnowball *d = (DictSnowball *) PG_GETARG_POINTER(0);
     char       *in = (char *) PG_GETARG_POINTER(1);
     int32        len = PG_GETARG_INT32(2);
-    char       *txt = lowerstr_with_len(in, len);
     TSLexeme   *res = palloc0(sizeof(TSLexeme) * 2);
+    char       *txt;

+    /*
+     * Reject strings exceeding 1000 bytes, as they're surely not words in any
+     * human language.  This restriction avoids wasting cycles on stuff like
+     * base64-encoded data, and it protects us against possible inefficiency
+     * or misbehavior in the stemmers (for example, the Turkish stemmer has an
+     * indefinite recursion so it can crash on long-enough strings).
+     */
+    if (len <= 0 || len > 1000)
+        PG_RETURN_POINTER(res);
+
+    txt = lowerstr_with_len(in, len);
+
+    /* txt is probably not zero-length now, but we'll check anyway */
     if (*txt == '\0' || searchstoplist(&(d->stoplist), txt))
     {
+        /* empty or stopword, so reject */
         pfree(txt);
     }
     else
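
The length check in the patch above can be sketched in isolation as follows. This is only an illustration and not part of the patch: the names MAX_WORD_BYTES and worth_stemming are hypothetical, and the sketch leaves out everything PostgreSQL-specific (lowerstr_with_len, the stoplist lookup, the TSLexeme result). It mirrors only the condition "len <= 0 || len > 1000", inverted so the function says whether a string should go on to the stemmer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical name for the threshold; the patch hard-codes 1000.
 * Per the message above, thresholds as small as 64 bytes are
 * apparently common elsewhere.
 */
#define MAX_WORD_BYTES 1000

/*
 * Return true if a string of "len" bytes is short enough to plausibly
 * be a word, i.e. worth handing to a stemmer.  Empty or negative
 * lengths and anything longer than MAX_WORD_BYTES are rejected, which
 * matches the early return the patch adds to dsnowball_lexize().
 */
static bool
worth_stemming(int32_t len)
{
    return len > 0 && len <= MAX_WORD_BYTES;
}
```

Under these assumptions, a caller would test worth_stemming(len) before doing any other work on the input, so that over-long strings such as base64 data never reach the stemmer at all.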
