Discussion: BUG #18775: PQgetCopyData always has an out-of-memory error if the table field stores bytea ~700 MB

The following bug has been logged on the website:

Bug reference:      18775
Logged by:          Ilya
Email address:      knuazev@gmail.com
PostgreSQL version: 16.0
Operating system:   Windows 10
Description:

Hello, I need your help. The essence of the problem is that I am trying to
download a bytea field from a table row. The size of the data in the field
is about 700 MB. In response, I receive an out-of-memory error.

More details:
1) I execute the query
COPY (SELECT %s FROM %s.%s WHERE %s=%s) TO STDOUT BINARY
2) I call the PQgetCopyData function.

This function works with the pg_conn.inBuffer buffer.
I did not go into detail on how memory is allocated for it, but in my case,
about 300 MB is allocated. And the data from the database is successfully
accumulated in this buffer.

And when the buffer is almost full, the function
int pqGetCopyData3(PGconn *conn, char **buffer, int async)
tries to allocate a buffer almost the same size as pg_conn.inBuffer in order
to give it to the user:
*buffer = (char *) malloc(msgLength + 1);  (fe-protocol3.c, line 1767).
In my case, malloc consistently fails to do this: it returns NULL and the
library reports an "out of memory" error.
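
A minimal sketch of the call sequence described above (the table, column, and
key here are placeholders, connection setup and most error handling are
omitted):

    #include <stdio.h>
    #include <libpq-fe.h>

    int dump_bytea(PGconn *conn)
    {
        PGresult *res = PQexec(conn,
            "COPY (SELECT payload FROM myschema.blobs WHERE id = 1) "
            "TO STDOUT BINARY");
        if (PQresultStatus(res) != PGRES_COPY_OUT)
        {
            fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
            PQclear(res);
            return -1;
        }
        PQclear(res);

        char *buf;
        int   len;
        while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
        {
            /* each call hands back one whole data row -- with a single
             * ~300 MB bytea value that is one ~300 MB malloc() */
            fwrite(buf, 1, len, stdout);
            PQfreemem(buf);
        }
        if (len == -2)
            fprintf(stderr, "PQgetCopyData: %s", PQerrorMessage(conn));

        /* collect the final COPY command status */
        res = PQgetResult(conn);
        PQclear(res);
        return 0;
    }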

p.s.

It seems to me that the only way to make PQgetCopyData work correctly
without changing the library interface is to somehow reduce the size of the
pg_conn.inBuffer buffer to a small value.

And ideally, it would be good to add a PQgetCopyData2 function that accepts a
caller-supplied buffer, simply fills it with data, and returns the number of
bytes written.
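
A hypothetical prototype of the suggested function, purely to illustrate the
idea (no such function exists in libpq):

    /* Hypothetical only -- not part of libpq.  The caller owns the buffer,
     * so the library never has to malloc() a second copy of a whole
     * multi-hundred-megabyte row; the return value would be the number of
     * bytes actually written into buffer. */
    int PQgetCopyData2(PGconn *conn, char *buffer, size_t bufsize, int async);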

I also tried to get around this problem using the deprecated PQgetline
function, but it does not work with binary data. And if you issue a COPY
(SELECT %s FROM %s.%s WHERE %s=%s) TO STDOUT query, an error occurs when
trying to execute it, perhaps because the library tries to read everything at
once into pg_conn.inBuffer, which is too small for that.

And yes, I know that for storing large data there is the large object
facility. But my method should also work for files up to 1 GB.


PG Bug reporting form <noreply@postgresql.org> writes:
> Hello, I need your help. The essence of the problem is that I am trying to
> download a bytea field from a table row. The size of the data in the field
> is about 700 MB. In response, I receive an out-of-memory error.

I don't see this as a Postgres bug.  If you want to work with values
that large, you'd better have plenty of memory available.

            regards, tom lane



Thanks for the answer. I agree that if you want to work with large data, you should make sure you have enough memory. If you call PQexec with a regular SELECT and then PQgetvalue to get the value, and you get an error, then it is your own fault.
But I know that there may not be enough memory, so I use the "copy" keyword in the query and the PQgetCopyData function. I thought that this function was designed for portioned (chunked) retrieval, by analogy with the PQputCopyData function, which works fine.

I found a way out. I use "SELECT substring(%s from %d for %d) as chunk from %s.%s WHERE %s = %s" with PQunescapeBytea in a loop. But it looks strange and takes longer.
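
A sketch of that chunked-substring loop (table, column, and key are
placeholders, error handling is trimmed; each round trip fetches CHUNK bytes
of the bytea column as text, which PQunescapeBytea decodes back to binary):

    #include <stdio.h>
    #include <libpq-fe.h>

    #define CHUNK (8 * 1024 * 1024)   /* 8 MB per round trip */

    int fetch_bytea_chunked(PGconn *conn, FILE *out)
    {
        long offset = 1;              /* substring() is 1-based */

        for (;;)
        {
            char sql[256];
            snprintf(sql, sizeof(sql),
                     "SELECT substring(payload from %ld for %d) "
                     "FROM myschema.blobs WHERE id = 1",
                     offset, CHUNK);

            PGresult *res = PQexec(conn, sql);
            if (PQresultStatus(res) != PGRES_TUPLES_OK)
            {
                fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
                PQclear(res);
                return -1;
            }

            size_t         rawlen;
            unsigned char *raw = PQunescapeBytea(
                (const unsigned char *) PQgetvalue(res, 0, 0), &rawlen);
            PQclear(res);

            if (raw == NULL)
                return -1;
            if (rawlen == 0)          /* past the end of the value: done */
            {
                PQfreemem(raw);
                break;
            }

            fwrite(raw, 1, rawlen, out);
            PQfreemem(raw);
            offset += rawlen;
        }
        return 0;
    }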

P.S.
A correction: my file is 300+ MB, not 700 MB.

On Thu, Jan 16, 2025 at 19:03 Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> Hello, I need your help. The essence of the problem is that I am trying to
> download a bytea field from a table row. The size of the data in the field
> is about 700 MB. In response, I receive an out-of-memory error.

I don't see this as a Postgres bug.  If you want to work with values
that large, you'd better have plenty of memory available.

                        regards, tom lane


--
Best regards,
Knyazev I. N.
  
Ilya Knyazev <knuazev@gmail.com> writes:
> But I know that there may not be enough memory, so I use the "copy" keyword
> in the query and the PQgetCopyData function. I thought that this function
> was designed for portioned (chunked) retrieval, by analogy with the
> PQputCopyData function, which works fine.

Its documentation is fairly clear, I thought:

       Attempts to obtain another row of data from the server during a
       <command>COPY</command>.  Data is always returned one data row at
       a time; if only a partial row is available, it is not returned.

If you need to work with data values that are large enough to risk
memory problems, I think "large objects" are the best answer.  Their
interface is a bit clunky, but it's at least designed to let you
both read and write by chunks.

            regards, tom lane
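

For illustration, a minimal sketch of chunked reads through the large-object
interface (this assumes the data is stored as a large object identified by an
OID rather than in a bytea column; error handling is trimmed):

    #include <stdio.h>
    #include <libpq-fe.h>
    #include <libpq/libpq-fs.h>   /* INV_READ */

    int read_large_object(PGconn *conn, Oid loid, FILE *out)
    {
        char buf[64 * 1024];
        int  fd, nread;

        /* large-object operations must run inside a transaction */
        PQclear(PQexec(conn, "BEGIN"));

        fd = lo_open(conn, loid, INV_READ);
        if (fd < 0)
        {
            fprintf(stderr, "lo_open failed: %s", PQerrorMessage(conn));
            PQclear(PQexec(conn, "ROLLBACK"));
            return -1;
        }

        /* read 64 kB at a time; memory use stays bounded regardless of
         * how large the stored object is */
        while ((nread = lo_read(conn, fd, buf, sizeof(buf))) > 0)
            fwrite(buf, 1, nread, out);

        lo_close(conn, fd);
        PQclear(PQexec(conn, "COMMIT"));
        return nread < 0 ? -1 : 0;
    }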