Discussion: JDBC driver's (non-)handling of InputStream:s

JDBC driver's (non-)handling of InputStream:s

From:
Peter Schuller
Date:
Hello,

Tonight I rewrote a part of an application that deals with HTTP uploads,
because it turned out it had to handle larger files than originally intended
- and I was getting OutOfMemory errors.

So I rewrote everything so that an InputStream is passed to the JDBC driver
and the files are never completely loaded into memory. However I am still
getting an OutOfMemory error for large files. While it is difficult to
pinpoint exactly where due to the lack of a stack trace, it does look like
the driver is causing it.

Does the JDBC driver handle InputStream:s intelligently at all? If so, does it
do so under all circumstances? In this case I am putting data into a column
of type 'bytea' and am using PreparedStatement.setBinaryStream().
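
For concreteness, the pattern I'm using looks roughly like this (table and
column names are made up for this example):

    import java.io.InputStream;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class UploadDao {
        // 'uploads', 'name' and 'data' are illustrative names, not my real
        // schema; 'data' is a bytea column.
        public void storeUpload(Connection conn, String name,
                                InputStream body, int length) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO uploads (name, data) VALUES (?, ?)");
            try {
                ps.setString(1, name);
                // The hope was that the driver would stream this to the
                // backend instead of materializing the whole file in memory.
                ps.setBinaryStream(2, body, length);
                ps.executeUpdate();
            } finally {
                ps.close();
            }
        }
    }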

The backend is PostgreSQL 7.4.1, and I am using the driver for 7.4.1
(pg74.1jdbc3.jar). Running under JDK 1.4.2.

Do I need to use some other type in the database in order for input streams to
be handled properly? Do I have to use some PostgreSQL specific API? Does the
JDBC driver need to be changed to support this?

I can always fall back to using files on the filesystem, but then I will lose
all the niceties that come with ACID transactions which I automatically get
if I keep it all in the database.

Thanks!

--
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org

Re: JDBC driver's (non-)handling of InputStream:s

From:
Dave Cramer
Date:
Peter,

It would be great if you could supply a test case which exhibits this
behaviour.

Dave
On Mon, 2004-03-29 at 20:10, Peter Schuller wrote:
> Hello,
>
> Tonight I rewrote a part of an application that deals with HTTP uploads,
> because it turned out it had to handle larger files than originally intended
> - and I was getting OutOfMemory errors.
>
> So I rewrote everything so that an InputStream is passed to the JDBC driver
> and the files are never completely loaded into memory. However I am still
> getting an OutOfMemory error for large files. While it is difficult to
> pinpoint exactly where due to the lack of a stack trace, it does look like
> the driver is causing it.
>
> Does the JDBC driver handle InputStream:s intelligently at all? If so, does it
> do so under all circumstances? In this case I am putting data into a column
> of type 'bytea' and am using PreparedStatement.setBinaryStream().
>
> The backend is PostgreSQL 7.4.1, and I am using the driver for 7.4.1
> (pg74.1jdbc3.jar). Running under JDK 1.4.2.
>
> Do I need to use some other type in the database in order for input streams to
> be handled properly? Do I have to use some PostgreSQL specific API? Does the
> JDBC driver need to be changed to support this?
>
> I can always fall back to using files on the filesystem, but then I will lose
> all the niceties that come with ACID transactions which I automatically get
> if I keep it all in the database.
>
> Thanks!
--
Dave Cramer
519 939 0336
ICQ # 14675561


Re: JDBC driver's (non-)handling of InputStream:s

From:
Oliver Jowett
Date:
Peter Schuller wrote:

> Does the JDBC driver handle InputStream:s intelligently at all? If so, does it
> do so under all circumstances? In this case I am putting data into a column
> of type 'bytea' and am using PreparedStatement.setBinaryStream().

The short answer is no, it's not smart about InputStreams. It treats the
stream essentially the same as if you'd read the entire stream into a
byte array then called setBytes().
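
i.e. conceptually it ends up doing something like this (a simplified sketch,
not the actual driver source):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    class Sketch {
        // The whole stream is drained into a byte[] up front, which is
        // where a large upload can exhaust the heap.
        static byte[] drain(InputStream in) throws IOException {
            ByteArrayOutputStream tmp = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                tmp.write(buf, 0, n);
            }
            return tmp.toByteArray(); // second full-size copy of the data
        }
        // The result is then handled exactly as if you had called
        // setBytes(drain(in)) yourself, escaping included.
    }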

> The backend is PostgreSQL 7.4.1, and I am using the driver for 7.4.1
> (pg74.1jdbc3.jar). Running under JDK 1.4.2.
>
> Do I need to use some other type in the database in order for input streams to
> be handled properly? Do I have to use some PostgreSQL specific API? Does the
> JDBC driver need to be changed to support this?

It'll need non-trivial driver modifications. It's on my list of things
to do, but I doubt I'm going to get to any of the postgresql work on
that list for quite some time now :(

The "right way" to do it is to expand the driver's use of the V3
protocol to use the extended query protocol; then the stream can be
directly streamed to the backend without further
translation/escaping/etc using a binary Bind parameter. But there's some
infrastructure work to do before that can happen.

I think there was a patch floating around on the list some time ago that
got a similar effect by using the LO interface to stream the data to a
temporary LO, then inserting into the bytea from the LO. I can't find it
right now, though; the archives.postgresql.org search page seems to be
broken again (it's returning 0 results for many searches -- e.g. searching
for "inputstream" in pgsql-jdbc in the last year returns 0 results, which is
obviously wrong).

You could use LOs instead of bytea if you have some flexibility in your
schema; the LO interface should stream nicely. It's nastier to deal with
LOs on the JDBC side, though: you'll need to drop the driver into a
previous-version-compatibility mode (7.1?) or use the LO manager
directly. Also LOs don't play well with autocommit.
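
Using the LO manager directly would look roughly like this (an untested
sketch; the table and column names are made up, and note that autocommit
must be off):

    import java.io.InputStream;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import org.postgresql.PGConnection;
    import org.postgresql.largeobject.LargeObject;
    import org.postgresql.largeobject.LargeObjectManager;

    public class LoSketch {
        // Stream 'in' into a new large object and record its OID in an
        // 'oid' column ('files', 'name', 'data' are illustrative names).
        public static void store(Connection conn, String name, InputStream in)
                throws Exception {
            conn.setAutoCommit(false); // LOs must be used inside a transaction

            LargeObjectManager lom = ((PGConnection) conn).getLargeObjectAPI();
            int oid = lom.create(LargeObjectManager.READ | LargeObjectManager.WRITE);
            LargeObject lo = lom.open(oid, LargeObjectManager.WRITE);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    lo.write(buf, 0, n); // streamed to the backend in chunks
                }
            } finally {
                lo.close();
            }

            PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO files (name, data) VALUES (?, ?)"); // 'data' is of type oid
            ps.setString(1, name);
            ps.setInt(2, oid);
            ps.executeUpdate();
            ps.close();

            conn.commit();
        }
    }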

-O

Re: JDBC driver's (non-)handling of InputStream:s

From:
Peter Schuller
Date:
Hello,

> It would be great if you could supply a test case which exhibits this
> behaviour.

I can probably do that relatively easily (though of course it is subject to
the behavior of the VM memory manager). But given Oliver Jowett's response it
seems pretty obvious what is happening: allocating a 5 MB byte array is
failing, and/or the byte array is constructed in a tight loop that allocates
incrementally larger arrays until the data fits.

Allocating such large arrays is bound to be unreliable in any case, so as long
as that is what's happening I fully understand it and there's probably not
much to be done...

--
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org


Re: JDBC driver's (non-)handling of InputStream:s

From:
Peter Schuller
Date:
Hello,

> The short answer is no, it's not smart about InputStreams. It treats the
> stream essentially the same as if you'd read the entire stream into a
> byte array then called setBytes().

Ok. That's what I suspected. Thanks!

[ lots of interesting stuff snipped]

> You could use LOs instead of bytea if you have some flexibility in your
> schema; the LO interface should stream nicely. It's nastier to deal with
> LOs on the JDBC side, though: you'll need to drop the driver into a
> previous-version-compatibility mode (7.1?) or use the LO manager
> directly. Also LOs don't play well with autocommit.

Hmm, okay. So in versions greater than 7.1, I can not simply use a normal
PreparedStatement and setBinaryStream(), as suggested in the documentation?
Is this for some technical reason or is this method now deprecated?

I am only getting timeouts on postgresql.org at the moment, so perhaps this
is explicitly stated in the docs, but can I safely assume that any usage of
the Large Object Manager as part of a transaction whose queries are otherwise
done through normal JDBC channels will not cause any interference between the
JDBC API and the LO manager? I.e., is it "safe" to intermix usage of large
objects with normal statements through JDBC?

--
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org


Re: JDBC driver's (non-)handling of InputStream:s

From:
Kris Jurka
Date:

On Tue, 30 Mar 2004, Oliver Jowett wrote:

> The "right way" to do it is to expand the driver's use of the V3
> protocol to use the extended query protocol; then the stream can be
> directly streamed to the backend without further
> translation/escaping/etc using a binary Bind parameter. But there's some
> infrastructure work to do before that can happen.
>

Can we really stream an InputStream directly to the backend? Don't we need
to prefix the message with a length argument? And the only way to get that
length from the InputStream API is to read the whole thing. Yes, we can
avoid the translation/escaping, but I don't think we can avoid persisting
the InputStream at some point.

Kris Jurka

Re: JDBC driver's (non-)handling of InputStream:s

From:
Oliver Jowett
Date:
Kris Jurka wrote:
>
> On Tue, 30 Mar 2004, Oliver Jowett wrote:
>
>
>>The "right way" to do it is to expand the driver's use of the V3
>>protocol to use the extended query protocol; then the stream can be
>>directly streamed to the backend without further
>>translation/escaping/etc using a binary Bind parameter. But there's some
>>infrastructure work to do before that can happen.
>>
>
>
> Can we really stream an InputStream directly to the backend? Don't we need
> to prefix the message with a length argument? And the only way to get that
> length from the InputStream API is to read the whole thing. Yes, we can
> avoid the translation/escaping, but I don't think we can avoid persisting
> the InputStream at some point.

setBinaryStream() provides a length in addition to the stream. If the
length is correct, we can calculate the query size ahead of time without
reading the input stream. The spec is not entirely clear about whether the
declared length or the actual stream length governs; it implies it's an
error to have a stream that doesn't match the provided length, but it's not
explicit about what happens in that case.

If the length doesn't match the stream, we can deal with it: we can
truncate to the specified length easily (emit a SQLWarning perhaps?),
and an unexpected EOF can be treated like an IOException.
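
In other words the copy loop can be bounded by the declared length; a
hypothetical sketch:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    class CopySketch {
        // Exactly 'length' bytes go on the wire: extra data in the stream is
        // simply not sent (perhaps worth a SQLWarning), and an early EOF is
        // reported as an error, like any other IOException.
        static void copyExactly(InputStream in, OutputStream out, int length)
                throws IOException {
            byte[] buf = new byte[8192];
            int remaining = length;
            while (remaining > 0) {
                int n = in.read(buf, 0, Math.min(buf.length, remaining));
                if (n == -1) {
                    throw new IOException("stream ended after "
                        + (length - remaining) + " bytes; " + length
                        + " were declared");
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
    }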

-O

Re: JDBC driver's (non-)handling of InputStream:s

From:
Oliver Jowett
Date:
Peter Schuller wrote:

>>You could use LOs instead of bytea if you have some flexibility in your
>>schema; the LO interface should stream nicely. It's nastier to deal with
>>LOs on the JDBC side, though: you'll need to drop the driver into a
>>previous-version-compatibility mode (7.1?) or use the LO manager
>>directly. Also LOs don't play well with autocommit.
>
>
> Hmm, okay. So in versions greater than 7.1, I can not simply use a normal
> PreparedStatement and setBinaryStream(), as suggested in the documentation?
> Is this for some technical reason or is this method now deprecated?

It's a technical reason. Some background...

In 7.1, calling setBinaryStream()/setBytes()/etc (and the equivalent
ResultSet methods) would assume the field contained a Large Object OID;
the driver would talk to the LO API to handle retrieving/storing the
actual LO data as needed. This worked but had some odd quirks, since a
field containing a LO OID is not the same as a field containing the data
itself -- you had to be careful with transaction demarcation etc.

In 7.2+, calling those methods assumed the field was a bytea and the
data could be directly stored/retrieved as part of the query without a
separate operation via the LO API. 'bytea' is a much better match to
JDBC's LONGVARBINARY, so (at least to me) this change makes sense.

Since the driver doesn't know much about the context of a parameter in a
query, and has to pick one approach or the other at the point where
setBinaryStream() is called, the only way to switch behaviours was to
globally change the driver's behaviour by telling it "behave like you're
a 7.1 driver" (a 'compatible=7.1' URL parameter).

> I am only getting timeouts on postgresql.org at the moment, so perhaps this
> is explicitly stated in the docs, but can I safely assume that any usage of
> the Large Object Manager as part of a transaction whose queries are otherwise
> done through normal JDBC channels will not cause any interference between
> the JDBC API and the LO manager? I.e., is it "safe" to intermix usage of
> large objects with normal statements through JDBC?

Yes, it's safe.

-O

Re: JDBC driver's (non-)handling of InputStream:s

From:
Guido Fiala
Date:
>> It would be great if you could supply a test case which exhibits this
>> behaviour.
>
> I can probably do that relatively easily (though of course it is subject
> to the behavior of the VM memory manager). But given Oliver Jowett's
> response it seems pretty obvious what is happening: allocating a 5 MB
> byte array is failing, and/or the byte array is constructed in a tight
> loop that allocates incrementally larger arrays until the data fits.

It's slightly more "intelligent" - see code...

> Allocating such large arrays is bound to be unreliable in any case, so
> as long as that is what's happening I fully understand it and there's
> probably not much to be done...

That's true as long as the consumer needs the data fully in RAM, e.g. an
Image-Label or similar. If the consumer is also stream-based, it could be
avoided. Unfortunately the backend would still hold the results of the query
fully in memory, which is also not always desired...

Guido