Re: BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB
From: James Inform
Subject: Re: BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB
Date:
Msg-id: 89216301-1ce5-6696-636c-456d6a53bd1a@pharmapp.de
In reply to: Re: BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB (Julien Rouhaud <rjuju123@gmail.com>)
List: pgsql-bugs
Well, I understand. But in my scenario I am using ASCII characters, which are just 1-byte UTF-8, and the system should be able to handle that. Maybe we could implement a different approach for text column content > 256 MB, something like a "streaming" read where memory is not allocated at a fixed 4 bytes per character, but based on the true length of the characters found in the text column or returned by a function like lower(). As a user I would expect to be able to store as many characters as possible in a text field without getting errors of any kind.

The following example gives an error, although the amount of text easily fits into the column, because 'x' is ASCII and takes 1 byte:

create temp table mytest(mydata text);
create index on mytest(lower(mydata));
insert into mytest select repeat('x',300*1024*1024) as mydata;

Without the index I could easily store 800 MB in the text column:

select repeat('x',800*1024*1024) as mydata;

So from what I see we should at least extend the documentation and tell people that although a text field can hold up to 1 GB of string data, only 256 MB can safely be used if one wants to use function-based indexes or functions on the column inside a WHERE clause. In other words, for everyone using text columns in more than a basic way, we should simply say: do not store more than 256 * 1024 * 1024 characters in a text column.

Julien Rouhaud wrote:
> On Mon, Aug 16, 2021 at 4:46 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
>> When I try to search a text field in a where clause that holds more than 250
>> MB of text, PostgreSQL runs out of memory for the request.
>>
>> You can reproduce this behaviour with the following sql statement:
>>
>> with
>> q1 as
>> (
>> -- 260*1024*1024 = 260MB
>> select repeat('x',260*1024*1024) as mydata
>> )
>> select count(*) from q1 where lower(mydata) like '%a%'
>> ;
>>
>> ERROR: invalid memory alloc request size 1090519044
>>
>> Using just a 250MB string:
>>
>> with
>> q1 as
>> (
>> -- 250*1024*1024 = 250MB
>> select repeat('x',250*1024*1024) as mydata
>> )
>> select count(*) from q1 where lower(mydata) like '%a%'
>> ;
>>
>> Everything is fine!
>>
>> The alloc request size seems to be 4 times the length of the text field.
>> [...]
>> Is this an expected behaviour?
>> Why is PostgreSQL allocating 4 times the column's memory when doing a
>> search like above?
>
> This is unfortunately the expected behavior, assuming that you're not
> dealing with C/POSIX encoding.
>
> This is because in multibyte encodings each character can occupy up to
> 4 bytes. Postgres needs to allocate a single chunk of memory to hold the
> resulting text, and it has no way to know how many multibyte
> characters are present in the input string or how many characters will
> have a different size when down-cased, so it has to allocate the
> maximum size that may be needed, which is 4 times the size of the
> input string. And there's a strict 1 GB limitation for a single field
> size, thus the ~256 MB limit.
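The worst-case arithmetic described above can be checked directly. The following sketch assumes the allocation request is 4 bytes per input character (the UTF-8 maximum) plus a 4-byte varlena header, capped by PostgreSQL's MaxAllocSize of 0x3fffffff bytes; the exact constants here are taken from the error message in this thread and are an illustration, not a quote of the server code.

```python
# Reproduce the worst-case allocation arithmetic behind the reported error.
MAX_ALLOC = 0x3FFFFFFF       # PostgreSQL's MaxAllocSize, just under 1 GB
MAX_BYTES_PER_CHAR = 4       # UTF-8 worst case per character
VARHDRSZ = 4                 # varlena header size (assumed here)

def lower_alloc_request(input_chars: int) -> int:
    """Worst-case allocation lower() must request for the given input length."""
    return input_chars * MAX_BYTES_PER_CHAR + VARHDRSZ

for mb in (250, 256, 260):
    request = lower_alloc_request(mb * 1024 * 1024)
    status = "OK" if request <= MAX_ALLOC else "invalid memory alloc request"
    print(f"{mb} MB input -> request {request} bytes: {status}")
```

For the 260 MB case this yields a request of 1090519044 bytes, matching the error in the original report, while 250 MB stays under the cap; 256 MB already exceeds it by a few bytes, which is why the practical limit is slightly below 256 * 1024 * 1024 characters.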