Обсуждение: Memory bug in dsnowball_lexize
Hackers, In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses malloc (not palloc) to allocate memory, and on memory exhaustion returns NULL rather than throwing an exception. In this same file, 'replace_s' calls 'create_s' and if it gets back NULL, returns the error code -1. Otherwise, it sets z->p to the allocated memory. In src/backend/snowball/libstemmer/api.c, 'SN_set_current' calls 'replace_s' and returns whatever 'replace_s' returned, which in the case of memory exhaustion will be -1. In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize' calls 'SN_set_current' and ignores the return value, thereby failing to notice the error, if any. I checked one of the stemmers, stem_ISO_8859_1_english.c, and it treats z->p as an array without checking whether it is NULL. This will crash the backend in the above error case. There is something else weird here, though. The call to 'SN_set_current' is wrapped in a memory context switch, along with a call to the stemmer, as if the caller expects any allocated memory to be palloc'd, which it is not, given the underlying code's use of malloc and calloc. There is a comment higher up in dict_snowball.c that seems to use some handwaving about all this, or perhaps it is documenting something else entirely. In any event, I find the documentation about dictCtx insufficient to explain why this memory handling is correct. mark
Mark Dilger <hornschnorter@gmail.com> writes: > In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses > malloc (not palloc) to allocate memory, and on memory exhaustion > returns NULL rather than throwing an exception. Actually not, see macros in src/include/snowball/header.h. > In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize' > calls 'SN_set_current' and ignores the return value, thereby > failing to notice the error, if any. Hm. This seems like possibly a bug, in that even if we cover the malloc issue, there's no API guarantee that OOM is the only possible reason for reporting failure. > There is a comment higher up in dict_snowball.c that seems to > use some handwaving about all this, or perhaps it is documenting > something else entirely. In any event, I find the documentation > about dictCtx insufficient to explain why this memory handling > is correct. Fair complaint --- do you want to propose some new wording that references what header.h does? regards, tom lane
On Thu, May 23, 2019 at 8:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Mark Dilger <hornschnorter@gmail.com> writes: > > In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses > > malloc (not palloc) to allocate memory, and on memory exhaustion > > returns NULL rather than throwing an exception. > > Actually not, see macros in src/include/snowball/header.h. You are correct. Thanks for the pointer. > > In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize' > > calls 'SN_set_current' and ignores the return value, thereby > > failing to notice the error, if any. > > Hm. This seems like possibly a bug, in that even if we cover the > malloc issue, there's no API guarantee that OOM is the only possible > reason for reporting failure. Ok, that sounds fair. Since the memory is being palloc'd, I suppose it would be safe to just ereport when the return value is -1? > > There is a comment higher up in dict_snowball.c that seems to > > use some handwaving about all this, or perhaps it is documenting > > something else entirely. In any event, I find the documentation > > about dictCtx insufficient to explain why this memory handling > > is correct. > > Fair complaint --- do you want to propose some new wording that > references what header.h does? Perhaps something along these lines? /* - * snowball saves alloced memory between calls, so we should run it in our - * private memory context. Note, init function is executed in long lived - * context, so we just remember CurrentMemoryContext + * snowball saves alloced memory between calls, which we force to be + * allocated using palloc and friends via preprocessing macros (see + * snowball/header.h), so we should run snowball in our private memory + * context. Note, init function is executed in long lived context, so we + * just remember CurrentMemoryContext. */
Mark Dilger <hornschnorter@gmail.com> writes: > On Thu, May 23, 2019 at 8:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Mark Dilger <hornschnorter@gmail.com> writes: >>> In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize' >>> calls 'SN_set_current' and ignores the return value, thereby >>> failing to notice the error, if any. >> Hm. This seems like possibly a bug, in that even if we cover the >> malloc issue, there's no API guarantee that OOM is the only possible >> reason for reporting failure. > Ok, that sounds fair. Since the memory is being palloc'd, I suppose > it would be safe to just ereport when the return value is -1? Yeah ... I'd just make it an elog really, since whatever it is would presumably not be a user-facing error. >> Fair complaint --- do you want to propose some new wording that >> references what header.h does? > Perhaps something along these lines? Seems reasonable, please include in patch covering the other thing. regards, tom lane