Re: generic copy options

From: Dan Colish
Subject: Re: generic copy options
Date:
Msg-id: 20090917232119.GF13715@funkstrom.spiretech.com
In reply to: Re: generic copy options  (Andrew Dunstan <andrew@dunslane.net>)
Responses: Re: generic copy options  (Emmanuel Cecchet <manu@asterdata.com>)
           Re: generic copy options  (Andrew Dunstan <andrew@dunslane.net>)
List: pgsql-hackers
On Thu, Sep 17, 2009 at 07:10:35PM -0400, Andrew Dunstan wrote:
>
>
> Greg Smith wrote:
>> On Thu, 17 Sep 2009, Dan Colish wrote:
>>
>>>     - Performance appears to be the same, although I don't have a good
>>>       way of testing this at the moment
>>
>> Here's what I do to generate simple COPY performance test cases:
>>
>> CREATE TABLE t (i integer);
>> INSERT INTO t SELECT x FROM generate_series(1,100000) AS x;
>> \timing
>> COPY t TO '/some/file' WITH [options];
>> BEGIN;
>> TRUNCATE TABLE t;
>> COPY t FROM '/some/file' WITH [options];
>> COMMIT;
>>
>> You can adjust the size of the generated table depending on whether you
>> want to minimize (small number) or maximize (big number) the impact of
>> the setup overhead relative to actual processing time.  Big numbers
>> make sense if there's a per-row change; small ones if it's mainly COPY
>> setup that's changed and you want a small bit of data to test against.
>>
>> An example with one column in it is a good test case for seeing  
>> whether per-row impact has gone up.  You'd want something with a wider  
>> row for other types of performance tests.
>>
>> The reason for the BEGIN/COMMIT there is that form utilizes an  
>> optimization that lowers WAL volume when doing the COPY insertion,  
>> which makes it more likely you'll be testing performance of the right  
>> thing.
>>
>>
>
> I usually prefer to test with a table that is more varied than anything  
> you can make with generate_series. When I tested my ragged copy patch  
> the other day I copied 1,000,000 rows out of a large table with a  
> mixture of dates, strings, numbers and nulls.
>
> But then, that patch adds a (tiny) per-field overhead, so I wanted to make
> sure it was well represented in the test.
>
> You are certainly right about wrapping it in begin/truncate/commit (and  
> when you do make sure that archiving is not on).
>
> You probably want to make sure that the file is not on the same disk as  
> the database, to avoid disk contention. Or, better, make sure that it is  
> in OS file system cache, or on a RAM disk.
>
> cheers
>
> andrew
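For anyone who wants to repeat this at different scales, the quoted recipe is easy to stamp out mechanically. A throwaway sketch (the `make_copy_benchmark` helper, its defaults, and the /tmp path are my own additions, not from the thread):

```python
# Hypothetical helper -- not from this thread: it just stamps out Greg's
# benchmark recipe for an arbitrary row count and file path, so small and
# large tables are easy to compare.
def make_copy_benchmark(rows, path="/tmp/copy_test.dat", options=""):
    with_clause = f" WITH {options}" if options else ""
    return "\n".join([
        "CREATE TABLE t (i integer);",
        f"INSERT INTO t SELECT x FROM generate_series(1,{rows}) AS x;",
        "\\timing",
        f"COPY t TO '{path}'{with_clause};",
        "BEGIN;",
        "TRUNCATE TABLE t;",
        f"COPY t FROM '{path}'{with_clause};",
        "COMMIT;",
    ])

print(make_copy_benchmark(100000))
```

The generated script can be fed straight to psql (e.g. `psql -f`) against a scratch database.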

If someone with a more significant setup could run these tests, that would
be ideal. I only have my laptop, which has a single disk and is fairly
underpowered.

That said, here are my results from running the script above; it looks like
the patch improves performance. I would be really interested to see results
on a larger data set and heavier iron.

--
--Dan

Without Patch:
CREATE TABLE
INSERT 0 100000
Timing is on.
COPY 100000
Time: 83.273 ms
BEGIN
Time: 0.412 ms
TRUNCATE TABLE
Time: 0.357 ms
COPY 100000
Time: 140.911 ms
COMMIT
Time: 4.909 ms

With Patch:
CREATE TABLE
INSERT 0 100000
Timing is on.
COPY 100000
Time: 80.205 ms
BEGIN
Time: 0.351 ms
TRUNCATE TABLE
Time: 0.346 ms
COPY 100000
Time: 124.303 ms
COMMIT
Time: 4.130 ms
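Plugging the two COPY FROM timings above into a quick calculation (a Python one-liner of my own, not part of the thread) gives the rough size of the win:

```python
# Quick sanity check on the numbers reported above: COPY FROM inside the
# BEGIN/TRUNCATE/COMMIT block, timings copied from this message.
without_patch_ms = 140.911
with_patch_ms = 124.303

improvement = (without_patch_ms - with_patch_ms) / without_patch_ms * 100
print(f"COPY FROM: about {improvement:.1f}% faster with the patch")
# prints: COPY FROM: about 11.8% faster with the patch
```

On a single-disk laptop that's within the range where disk and cache noise matter, which is another reason to rerun this on heavier iron.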
 




