Re: BUG #18080: to_tsvector fails for long text input
From | Alvaro Herrera |
---|---|
Subject | Re: BUG #18080: to_tsvector fails for long text input |
Date | |
Msg-id | 202309151141.pq2zpi5kxdvn@alvherre.pgsql |
In reply to | BUG #18080: to_tsvector fails for long text input (PG Bug reporting form <noreply@postgresql.org>) |
Responses | Re: BUG #18080: to_tsvector fails for long text input |
List | pgsql-bugs |
On 2023-Sep-04, PG Bug reporting form wrote:

> SELECT to_tsvector('english'::regconfig, (REPEAT('<Long123456789/>'::text,
> 20000000)));
> results in
> ERROR: invalid memory alloc request size 2133333320

This is because to_tsvector_byid does this:

    prs.lenwords = VARSIZE_ANY_EXHDR(in) / 6;   /* just estimation of word's
                                                 * number */
    if (prs.lenwords < 2)
        prs.lenwords = 2;
    prs.curwords = 0;
    prs.pos = 0;
    prs.words = (ParsedWord *) palloc(sizeof(ParsedWord) * prs.lenwords);

where sizeof(ParsedWord) is 40 (on my laptop). So this tries to allocate
more memory than palloc() is willing to give it.

The attached patch fixes just the query you supplied and nothing else.
I wonder if we want to support this kind of thing; I suspect we don't.
Other parts of text search would fail in the same way and would also need
similar fixes. However, the real problem comes when we try to store such
huge tsvectors, because that means we end up with "huge" tuples on disk
that need I/O support. Eventually, AFAIR, you run into the size limit in
the FE/BE protocol, and it all crashes and burns, because that limit
cannot be changed without bumping the protocol version.

So I don't think this patch actually does you any good.

-- 
Álvaro Herrera               Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
Attachments