Re: COPY FROM performance improvements
От | Tom Lane |
---|---|
Тема | Re: COPY FROM performance improvements |
Дата | |
Msg-id | 29007.1123387696@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: COPY FROM performance improvements ("Luke Lonergan" <llonergan@greenplum.com>) |
Ответы |
Re: COPY FROM performance improvements
Re: COPY FROM performance improvements |
Список | pgsql-patches |
"Luke Lonergan" <llonergan@greenplum.com> writes: >> I had some difficulty in generating test cases that weren't largely >> I/O-bound, but AFAICT the patch as applied is about the same speed >> as what you submitted. > You achieve the important objective of knocking the parsing stage down a > lot, but your parsing code is actually about 20% slower than Alon's. I would like to see the exact test case you are using to make this claim; the tests I did suggested my code is the same speed or faster. The particular test case I was using was the "tenk1" data from the regression database, duplicated out to about 600K rows so as to run long enough to measure with some degree of repeatability. As best I can tell, my version of CopyReadAttributes is significantly quicker than Alon's, approximately balancing out the fact that my version of CopyReadLine is slower. I did the latter first, and would now be tempted to rewrite it in the same style as CopyReadAttributes, ie one pass of memory-to-memory copy using pointers rather than buffer indexes. BTW, late today I figured out a way to get fairly reproducible non-I/O-bound numbers about COPY FROM: use a trigger that suppresses the actual inserts, thus: create table foo ... create function noway() returns trigger as 'begin return null; end' language plpgsql; create trigger noway before insert on foo for each row execute procedure noway(); then repeat: copy foo from '/tmp/foo.data'; If the source file is not too large to fit in kernel disk cache, then after the first iteration there is no I/O at all. I got numbers that were reproducible within less than 1%, as opposed to 5% or more variation when the thing was partially I/O bound. Pretty useless in the real world, of course, but great for timing COPY's data-pushing. regards, tom lane
В списке pgsql-patches по дате отправления: