Re: Regex with > 32k different chars causes a backend crash
| From | Heikki Linnakangas |
|---|---|
| Subject | Re: Regex with > 32k different chars causes a backend crash |
| Date | |
| Msg-id | 515C623B.6070000@vmware.com |
| In response to | Re: Regex with > 32k different chars causes a backend crash (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: Regex with > 32k different chars causes a backend crash; Re: Regex with > 32k different chars causes a backend crash |
| List | pgsql-hackers |
On 03.04.2013 18:41, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> On 03.04.2013 18:21, Tom Lane wrote:
>>> Obviously Henry didn't think that far ahead. I agree that throwing
>>> an error is the best solution, and that widening "color" is probably
>>> not what we want to do. You want to fix that, or shall I?
>
>> I can do it. I assume that Tcl has the same bug, so I'll submit a report
>> there, too.
>
> Yes, definitely.
>
> It occurs to me that at some point it might be useful to convert "color"
> to unsigned short, so that you could have 64K of them --- but we'd still
> need the error check anyway, and there's no reason to tackle such a
> change today.

I was just thinking the same. In practice, expanding it to 64k doesn't get you much farther. There is this in newdfa():

    d->incarea = (struct arcp *) MALLOC(nss * cnfa->ncolors * sizeof(struct arcp));

That's (number of states) * (number of colors) * (constant). The test case I posted earlier would require about 40 GB of RAM for that allocation alone, and fails with an "out of memory" error. Maybe it would be possible to construct a regexp that has a lot of colors but few states, but that's an even more marginal use case.

Attached is a patch to add the overflow check. I used the error message "too many distinct characters in regex". That's not totally accurate, because there isn't a limit on distinct characters per se, but on the number of colors. Conceivably, you could have a regexp with more than 32k different characters, but where most of them are mapped to the same color. In practice, it's not helpful to the user to say "too many colors"; he will have no clue what a color is.

PS. I was mistaken when I said that this causes an assertion failure; it segfaults even with assertions enabled.

- Heikki
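For readers following along, here is a minimal sketch of the two points in the message above: refusing to hand out a color that no longer fits in a signed short, and computing the (states) * (colors) * (constant) allocation size without overflowing. It is not the attached patch and not the real regex code. The names `colormap_sketch`, `new_color_checked`, `count_colors_until_limit` and `arc_area_bytes` are hypothetical, and the ceiling is assumed to be `SHRT_MAX`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the actual regex structures. */
typedef short color;            /* "color" as a signed short, per the thread */
#define MAX_COLOR SHRT_MAX      /* assumed ceiling for a color number */

struct colormap_sketch
{
    size_t ncolors;             /* colors handed out so far */
    int    err;                 /* nonzero once the limit has been hit */
};

/*
 * Hand out the next color number, or return -1 and set cm->err if the
 * number would no longer fit in "color". Reporting an error here is the
 * alternative to silently overflowing and crashing later.
 */
color
new_color_checked(struct colormap_sketch *cm)
{
    assert(cm != NULL);
    if (cm->err)
        return -1;
    if (cm->ncolors > (size_t) MAX_COLOR)
    {
        cm->err = 1;            /* caller would report "too many ..." here */
        return -1;
    }
    return (color) cm->ncolors++;
}

/* Allocate colors until the check trips; returns how many succeeded. */
size_t
count_colors_until_limit(void)
{
    struct colormap_sketch cm = {0, 0};
    size_t n = 0;

    while (new_color_checked(&cm) >= 0)
        n++;
    return n;
}

/*
 * Compute nss * ncolors * elem_size into *out.
 * Returns 1 on success, 0 if the product would overflow size_t.
 */
int
arc_area_bytes(size_t nss, size_t ncolors, size_t elem_size, size_t *out)
{
    size_t n;

    if (nss != 0 && ncolors > SIZE_MAX / nss)
        return 0;
    n = nss * ncolors;
    if (elem_size != 0 && n > SIZE_MAX / elem_size)
        return 0;
    *out = n * elem_size;
    return 1;
}
```

With these assumptions, `count_colors_until_limit()` stops after `SHRT_MAX + 1` colors (numbers 0 through `SHRT_MAX`), and `arc_area_bytes()` makes it easy to see how quickly the product grows: the element size passed in is whatever `sizeof(struct arcp)` happens to be on the platform, which this sketch does not assume.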