ResultSet construction and byte arrays
From: Jan Oksfeldt Jonasen
Subject: ResultSet construction and byte arrays
Date:
Msg-id: 5.2.1.1.0.20040206112237.02b99a60@mail.northmann.com
Responses:
  Re: ResultSet construction and byte arrays
  Re: ResultSet construction and byte arrays
List: pgsql-jdbc
I can see from the archives that this has come up more than once, but I'd still like to take a stab at it. Constructing ResultSets from searches with many results, or from large tables, has some issues, the most noticeable being that everything found is loaded into memory as byte arrays.

I'm currently using PostgreSQL in the version distributed by dbExperts (7.3.2) with a Win2k database server and a Java client application. I experienced some serious performance issues when running these searches with somewhat large result sets, so I decided to use a profiler together with the latest driver source from CVS. The testing showed two major pain points: garbage collection and data loading from the server. Looking at the source, it turns out the problems come from these PGStream methods:

    public byte[][] ReceiveTupleV2(int nf, boolean bin)
    public byte[] Receive(int siz)

The first creates two-dimensional byte arrays, the second one-dimensional. A huge number of them are created: in a test on a table with 67 columns and ~15000 tuples, nearly 2 million byte arrays are allocated (the operation takes 20+ seconds). All of these are put into a Vector, which in turn ends up in a ResultSet implementation, I believe an AbstractJdbc1ResultSet. This is a lot of data to keep in memory. Since it's possible to have multiple search results open at once (e.g. in a Swing client view), these arrays become a major problem until ResultSet.close() is called, which nulls out the Vector. Once that happens the garbage collector kicks in and again lowers performance, albeit for a short while.

Based on the source I've tried a few things to improve this, for instance adding a byte array pool for both the one- and two-dimensional arrays. While reuse certainly speeds up loading compared to the new operator, and reduces GC activity, it doesn't help with memory consumption, which eventually crashes the VM.

Are there any ideas on how to fix this? Are there plans to look into it?
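As a rough sanity check on the figures above (my own arithmetic, not taken from the driver source): one byte[][] per tuple plus at least one byte[] per field gives a lower bound on the allocation count. The reported figure of nearly 2 million for ~1 million fields would suggest roughly two allocations per field in practice.

```java
// Back-of-envelope lower bound for the allocation pattern described above.
// The row/column counts are the figures quoted in the post.
public class AllocationEstimate {
    public static void main(String[] args) {
        int rows = 15_000;
        int cols = 67;
        long rowArrays = rows;                // one byte[][] per tuple (ReceiveTupleV2)
        long cellArrays = (long) rows * cols; // at least one byte[] per field (Receive)
        long lowerBound = rowArrays + cellArrays;
        System.out.println("lower bound on byte arrays: " + lowerBound);
        // prints "lower bound on byte arrays: 1020000"
    }
}
```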
On Oracle (same client, slightly different code) I had a major problem using ResultSet.absolute(), since Oracle natively supports only forward cursors: behind the scenes the cursor would be rolled forward, wrapping around and scrolling forward again if you went backwards with absolute(). For instance, being at row 10,000 and going back to 9,999 would make the cursor scroll forward, wrap, and scroll forward again. This created a huge amount of temporary object creation, besides the operation just being painfully slow.

To fix this I created wrappers for several of the java.sql interfaces, making sure I had full control over which ResultSet implementation was being used. The idea was to use an expression similar to this one (taken from an article at OTN):

    select * from
      ( select name.*, rownum rnum from
          ( <everything in a query, including order by clause> ) name
        where rownum <= MAX_ROWS )
    where rnum >= MIN_ROWS

With the wrappers I could make sure only a certain number of rows was fetched, by maintaining state on the current min and max rows. The downside to this solution is the additional queries, but it is still a lot faster than the cursor movement.

If it's possible to write an expression like the above in Postgres, I think that could also be the solution to not loading everything in a result set: the result set could re-query the database once a row above its current max is requested. So, how would I write such an expression?

As a last alternative, are there any commercial drivers available which aren't just wrappers around the standard one? The ones I can find at Sun all seem to be.

Best regards
Jan Oksfeldt Jonasen
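For anyone reading this in the archives: PostgreSQL supports LIMIT/OFFSET in SELECT statements, which is the direct equivalent of the Oracle rownum wrapper above. A minimal sketch of building such a paged query on the client (the class and method names here are my own, not part of any driver):

```java
public class PageQuery {
    // Wraps an arbitrary inner query in PostgreSQL's LIMIT/OFFSET clause.
    // minRow and maxRow are 1-based and inclusive, matching the rownum idiom.
    static String paged(String innerQuery, int minRow, int maxRow) {
        int limit = maxRow - minRow + 1;
        int offset = minRow - 1; // OFFSET counts skipped rows, zero-based
        return "SELECT * FROM (" + innerQuery + ") AS sub"
             + " LIMIT " + limit + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        System.out.println(paged("SELECT * FROM t ORDER BY id", 9999, 10000));
        // prints "SELECT * FROM (SELECT * FROM t ORDER BY id) AS sub LIMIT 2 OFFSET 9998"
    }
}
```

As with the Oracle variant, the inner query needs a stable ORDER BY for the page boundaries to be meaningful between requests.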