Обсуждение: Inserting 'large' amounts of data

Поиск
Список
Период
Сортировка

Inserting 'large' amounts of data

От
Mario Splivalo
Дата:
I have a web application which allows users to upload a lot of phone
numbers. I need to store those numbers to a database. Usualy, one would
upload around 70k-100k of records, totaling around 2 MB in size.

I'm using tomcat as an application server, and JDBC to connect to pg8.3
database.

I will have around 20-50 concurent users in peek hours, and even that is
quite overestimated.

I could create the temporary file on the filesystem where database
cluster is located and then execute COPY mytable FROM
'/tmp/upload-data/uuidofsomesort.csv' WITH CSV', but the 'problem' is
that database server and tomcat reside on different physical machines.

What would one recommend as the best way to insert those data?

    Mario

Re: Inserting 'large' amounts of data

От
John R Pierce
Дата:
Mario Splivalo wrote:
> I have a web application which allows users to upload a lot of phone
> numbers. I need to store those numbers to a database. Usualy, one would
> upload around 70k-100k of records, totaling around 2 MB in size.
>
> ...
> t would one recommend as the best way to insert those data?
>


I believe you can use org.postgresql.copy.CopyIn() ...  there are
variants that use a writeToCopy() call to send the data, or a
java.io.InputStream, or a java.io.Reader ...



Re: Inserting 'large' amounts of data

От
Maciek Sakrejda
Дата:
Right. As of the 8.4 release, you have a jdbc API to the postgresql
COPY functionality. Cast your Connection to a PGConnection and call
.getCopyAPI(). You can then use the CopyManager[1] to copy data in
from your tomcat servlet.

[1] http://jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/CopyManager.html

--
Maciek Sakrejda | Software Engineer | Truviso
(650) 242-3500 Main
(650) 242-3501 F
msakrejda@truviso.com
www.truviso.com

Re: Inserting 'large' amounts of data

От
dmp
Дата:
>
>
>I have a web application which allows users to upload a lot of phone
>numbers. I need to store those numbers to a database. Usualy, one would
>upload around 70k-100k of records, totaling around 2 MB in size.
>
>I'm using tomcat as an application server, and JDBC to connect to pg8.3
>database.
>
>I will have around 20-50 concurent users in peek hours, and even that is
>quite overestimated.
>
>I could create the temporary file on the filesystem where database
>cluster is located and then execute COPY mytable FROM
>'/tmp/upload-data/uuidofsomesort.csv' WITH CSV', but the 'problem' is
>that database server and tomcat reside on different physical machines.
>
>What would one recommend as the best way to insert those data?
>
>    Mario
>
Hello Mario,
If the users already have the data in CSV format why not let them do it
via the
app. server? The connection can be made across machines if setup properly.

http://dandymadeproductions.com/projects/MyJSQLView/docs/javadocs/index.html
CSVDataImportThread.java

This class could be used as a basis, with some work. I have a bug with
data that has
semicolons, could be more robust also, but could be used for a start.

John wrote:

> I believe you can use org.postgresql.copy.CopyIn() ...  there are
> variants that use a writeToCopy() call to send the data, or a
> java.io.InputStream, or a java.io.Reader ...

This sounds a lot cleaner.

danap.

Re: Inserting 'large' amounts of data

От
Mario Splivalo
Дата:
John R Pierce wrote:
> Mario Splivalo wrote:
>> I have a web application which allows users to upload a lot of phone
>> numbers. I need to store those numbers to a database. Usualy, one would
>> upload around 70k-100k of records, totaling around 2 MB in size.
>>
>> ...
>> t would one recommend as the best way to insert those data?
>
> I believe you can use org.postgresql.copy.CopyIn() ...  there are
> variants that use a writeToCopy() call to send the data, or a
> java.io.InputStream, or a java.io.Reader ...

Yes, I gues this is the way to go. I'll have to upgrade to 8.4, but
somehow I'd like to wait for a month or so before making the switch.

Thank you all.


    Mike

Re: Inserting 'large' amounts of data

От
Kris Jurka
Дата:

On Thu, 27 Aug 2009, Mario Splivalo wrote:

> Yes, I gues this is the way to go. I'll have to upgrade to 8.4, but somehow
> I'd like to wait for a month or so before making the switch.
>

You only need the 8.4 JDBC driver, not a 8.4 server.  Copy support should
work for all 7.4 and later servers.

Kris Jurka

Re: Inserting 'large' amounts of data

От
John R Pierce
Дата:
Kris Jurka wrote:
>
>
> On Thu, 27 Aug 2009, Mario Splivalo wrote:
>
>> Yes, I gues this is the way to go. I'll have to upgrade to 8.4, but
>> somehow I'd like to wait for a month or so before making the switch.
>>
>
> You only need the 8.4 JDBC driver, not a 8.4 server.  Copy support
> should work for all 7.4 and later servers.

speaking of using JDBC COPY FROM STDIN via this CopyManager
interface...  how does that handle constraint violations, like duplicate
primary key?



Re: Inserting 'large' amounts of data

От
Maciek Sakrejda
Дата:
In the same manner as regular DML--you'll get a constraint violation:

cqdb=# create table foo(a int unique);
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "foo_a_key"
for table "foo"
CREATE TABLE
cqdb=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 1
>> 1
>> \.
ERROR:  duplicate key value violates unique constraint "foo_a_key"
CONTEXT:  COPY foo, line 2: "1"

(This is straight through psql, but essentially the same thing would
happen through jdbc).

--
Maciek Sakrejda | Software Engineer | Truviso
(650) 242-3500 Main
(650) 242-3501 F
msakrejda@truviso.com
www.truviso.com

Re: Inserting 'large' amounts of data

От
Kris Jurka
Дата:

John R Pierce wrote:
>
> speaking of using JDBC COPY FROM STDIN via this CopyManager
> interface...  how does that handle constraint violations, like duplicate
> primary key?
>

Just like a regular copy failure via psql or constraint violation.  An
Exception is thrown and the transaction is aborted (which you must
rollback).

Kris Jurka

Re: Inserting 'large' amounts of data

От
Mario Splivalo
Дата:
Kris Jurka wrote:
>
>
> On Thu, 27 Aug 2009, Mario Splivalo wrote:
>
>> Yes, I gues this is the way to go. I'll have to upgrade to 8.4, but
>> somehow I'd like to wait for a month or so before making the switch.
>>
>
> You only need the 8.4 JDBC driver, not a 8.4 server.  Copy support
> should work for all 7.4 and later servers.

Works like a charm! :)

    Mike