Re: COPY v. java performance comparison
От | Rob Sargent |
---|---|
Тема | Re: COPY v. java performance comparison |
Дата | |
Msg-id | 533D867D.6020603@gmail.com обсуждение исходный текст |
Ответ на | Re: COPY v. java performance comparison (Thomas Kellerer <spam_eater@gmx.net>) |
Ответы |
Re: COPY v. java performance comparison
|
Список | pgsql-general |
On 04/03/2014 09:01 AM, Thomas Kellerer wrote:
Thomas thanks for these numbers.Rob Sargent, 02.04.2014 21:37:I loaded 37M+ records using jOOQ (batching every 1000 lines) in 12+ hours (800+ records/sec). Then I tried COPY and killed that after 11.25 hours when I realised that I had added on non-unque index on the name fields after the first load. By that point is was on line 28301887, so ~0.75 done which implies it would have take ~15hours to complete. Would the overhead of the index likely explain this decrease in throughput? Impatience got the better of me and I killed the second COPY. This time it had done 54% of the file in 6.75 hours, extrapolating to roughly 12 hours to do the whole thing. That matches up with the java speed. Not sure if I should be elated with jOOQ or disappointed with COPY.This is not what I see with COPY FROM STDIN When I load 2million rows using a batch size of 1000 with plain JDBC that takes about 4 minutes Loading the same file through Java and COPY FROM STDIN takes about 4 seconds The table looks like this: Table "public.products" Column | Type | Modifiers -------------------+------------------------+-----------product_id | integer | not nullean_code | bigint | not nullproduct_name | character varying(100) | not nullmanufacturer_name | character varying | not nullprice | numeric(10,2) | not nullpublish_date | date | not null Indexes: "products_pkey" PRIMARY KEY, btree (product_id) "idx_publish_date" btree (publish_date, product_id) During the load both indexes are present. Regards Thomas
I have to straighten out my environment, which I admit I was hoping to avoid. I reset checkpoint_segments to 12 and restarted my server.
I kicked of the COPY at 19:00. That generated a couple of the "too frequent" statements but 52 "WARNING: pgstat wait timeout" lines during the next 8 hours starting at 00:37 (5 hours in) 'til finally keeling over at 03:04 on line 37363768. That's the last line of the input so obviously I didn't flush my last println properly. I'm beyond getting embarrassed at this point.
Is turning auto-vacuum off a reasonable way through this?
В списке pgsql-general по дате отправления: