Re: [psycopg] speed concerns with executemany()
От | Aryeh Leib Taurog |
---|---|
Тема | Re: [psycopg] speed concerns with executemany() |
Дата | |
Msg-id | 20170201133219.GA2766@deb76.aryehleib.com обсуждение исходный текст |
Ответ на | Re: [psycopg] speed concerns with executemany() (Daniele Varrazzo <daniele.varrazzo@gmail.com>) |
Список | psycopg |
On Wed, Feb 01, 2017 at 02:21:37AM +0000, Daniele Varrazzo wrote: > On Mon, Jan 30, 2017 at 9:51 PM, Aryeh Leib Taurog <python@aryehleib.com> wrote: > > I haven't dug into PQexecParams, but it seems to me that these > > alternate strategies could be made available outside of psycopg2. > > I got to the same conclusion, so I've implemented functions to > implement the functions execute_batch() and execute_values() (resp. > what you call "joined" and "folded" in your benchmarks) in the extras > module, and leave the semantics of executemany() untouched (no extra > parameters or rowcount breaking only if going batched... just too > confusing). > > Implementation, docs, tests in `this commit`__. > > .. __: https://github.com/psycopg/psycopg2/commit/a95fd3df1abc0282f1c47fa2170191f037c3c8de > > I also thought about implementing an execute_prepared() function, > which would have run PREPARE, then EXECUTE in a loop (possibly batched > as in execute_batch), finally DEALLOCATE. Implementation doesn't seem > trivial because: > > - parameters %s and %(name)s should be replaced with $1, $2, ..., with > matching unescaping of literal % and escaping of literal $; > - in case of error, calling DEALLOCATE is tricky: if the connection is > not autocommit the transaction is failed, the function could use a > savepoint but then it would clear the error state too; not calling > DEALLOCATE would leave the prepared statement there and according to > the name chosen for the statement (e.g. 'psycopg_%s' % id(cursor)) > would make another execute_prepared() fail... it should be possible to > wrap the PREPARE in a savepoint to deal with this problem; > - if the connection is autocommit all the above is not needed. > > ISTM that it's easier to leave the users to call > PREPARE/execute_batch('EXECUTE')/DEALLOCATE: knowing the context in > which the sequence is called wouldn't require parameters mangling and > the error management would be simpler for them. > > Thoughts? Shall we merge this stuff? Any feedback is welcome. I think it looks great. I agree that execute_prepared() would be tricky, and it doesn't seem to provide much performance benefit over the methods you've already implemented. I have run the following benchmarks for UPDATE across the Atlantic: 1. classic executemany 2. "joined" - statements executed in batches 3. "prepared/joined" - as you suggested above 4. pgcopy - copy into a temp table, then UPDATE FROM temp table. <https://gist.github.com/altaurog/84668e034646fb354b5de81bb86a580d#file-updatemany-py-L69> With NRECS = 5000 classic: 794.658465862 sec joined: 11.6829760075 sec prepared: 10.1489500999 sec pgcopy: 2.68695497513 sec
В списке psycopg по дате отправления: