Re: generic copy options
От | Dan Colish |
---|---|
Тема | Re: generic copy options |
Дата | |
Msg-id | 20090917232119.GF13715@funkstrom.spiretech.com обсуждение исходный текст |
Ответ на | Re: generic copy options (Andrew Dunstan <andrew@dunslane.net>) |
Ответы |
Re: generic copy options
(Emmanuel Cecchet <manu@asterdata.com>)
Re: generic copy options (Andrew Dunstan <andrew@dunslane.net>) |
Список | pgsql-hackers |
On Thu, Sep 17, 2009 at 07:10:35PM -0400, Andrew Dunstan wrote: > > > Greg Smith wrote: >> On Thu, 17 Sep 2009, Dan Colish wrote: >> >>> - Performance appears to be the same although I don't have a good >>> way for >>> testing this at the moment >> >> Here's what I do to generate simple COPY performance test cases: >> >> CREATE TABLE t (i integer); >> INSERT INTO t SELECT x FROM generate_series(1,100000) AS x; >> \timing >> COPY t TO '/some/file' WITH [options]; >> BEGIN; >> TRUNCATE TABLE t; >> COPY t FROM '/some/file' WITH [options]; >> COMMIT; >> >> You can adjust the size of the generated table based on whether you >> want to minimize (small number) or maximize (big number) the impact of >> the setup overhead relative to actual processing time. Big numbers >> make sense if there's a per-row change, small ones if it's mainly COPY >> setup that's been changed if you want a small bit of data to test >> against. >> >> An example with one column in it is a good test case for seeing >> whether per-row impact has gone up. You'd want something with a wider >> row for other types of performance tests. >> >> The reason for the BEGIN/COMMIT there is that form utilizes an >> optimization that lowers WAL volume when doing the COPY insertion, >> which makes it more likely you'll be testing performance of the right >> thing. >> >> > > I usually prefer to test with a table that is more varied than anything > you can make with generate_series. When I tested my ragged copy patch > the other day I copied 1,000,000 rows out of a large table with a > mixture of dates, strings, numbers and nulls. > > But then, it has a (tiny) per field overhead so I wanted to make sure > that was well represented in the test. > > You are certainly right about wrapping it in begin/truncate/commit (and > when you do make sure that archiving is not on). > > You probably want to make sure that the file is not on the same disk as > the database, to avoid disk contention. Or, better, make sure that it is > in OS file system cache, or on a RAM disk. > > cheers > > andrew If someone with a more significant setup can run tests that would ideal. I only have my laptop which is a single disk and fairly underpowered. That said, here are my results running the script above, it looks like the pach improves performance. I would really interested to see results on a larger data set and heavier iron. -- --Dan Without Patch: CREATE TABLEINSERT 0 100000Timing is on.COPY 100000Time: 83.273 msBEGINTime: 0.412 msTRUNCATE TABLETime: 0.357 msCOPY 100000Time:140.911 msCOMMITTime: 4.909 ms With Patch: CREATE TABLEINSERT 0 100000Timing is on.COPY 100000Time: 80.205 msBEGINTime: 0.351 msTRUNCATE TABLETime: 0.346 msCOPY 100000Time:124.303 msCOMMITTime: 4.130 ms
В списке pgsql-hackers по дате отправления: