Fastest way to insert/update many rows
From | pascal@ensieve.org |
---|---|
Subject | Fastest way to insert/update many rows |
Date | |
Msg-id | A54AEDB1-20E3-4B18-A91B-F9F3C2A603B7@ensieve.org |
Responses | Re: Fastest way to insert/update many rows |
 | Re: Fastest way to insert/update many rows |
List | psycopg |
Hi,

I'd like to use psycopg2 to fetch a large number of rows (hundreds of millions), perform some computations, and put them back into the database.

I can fetch about 130k rows/sec with

    cur.execute('select * from stuff')
    keyvals = list(cur)

and 100k rows/sec with

    f = io.StringIO()
    cur.copy_to(f, 'stuff')
    f.seek(0)
    keyvals = list(tuple(map(int, l.split('\t'))) for l in f)

but inserting using

    cur.executemany('insert into stuff values (%s, %s)', keyvals)

only has a throughput of 23k rows/sec, with ca. 20% CPU used by Python and 80% by Postgres, while

    cur.copy_from(io.StringIO('\n'.join('{}\t{}'.format(*r) for r in keyvals)), 'stuff')

manages to insert 1.8M rows/sec. I can't quite believe that generating a string should be the fastest method; am I missing something?

What I'd really like to do is

    cur.executemany('update stuff set value = %s where key = %s', ...)

but that was orders of magnitude slower still, probably because the keys arrive in random order, so it performs an index lookup for each key. Populating a temporary table and using 'update stuff ... from temptable ...' is quicker.

I have to set one column in each row. Is there a way to update through a cursor, like PL/pgSQL's

    update <table> set ... where current of <cursor>

i.e. iterate through the rows in the most efficient way for the database? Or would it be wiser to use PL/Python for this kind of task instead?

--
Pascal Germroth
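On the insert side, if your psycopg2 is new enough (2.7+), psycopg2.extras.execute_values batches many rows into a single multi-VALUES statement and usually lands between executemany and COPY in throughput. A minimal sketch, assuming the same two-integer-column stuff table as in the examples above; connection details and sample data are placeholders:

    import psycopg2
    from psycopg2.extras import execute_values

    conn = psycopg2.connect("dbname=test")  # placeholder connection string
    cur = conn.cursor()

    keyvals = [(1, 10), (2, 20)]  # stand-in for the computed (key, value) pairs

    # execute_values interpolates page_size rows into one VALUES list per
    # statement, avoiding executemany's per-statement round trip.
    execute_values(cur,
                   "insert into stuff (key, value) values %s",
                   keyvals, page_size=10000)
    conn.commit()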
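On the update side, a minimal sketch of the temporary-table approach mentioned in the post: COPY the computed pairs into a temp table (the fast path measured above), then update in one statement so Postgres can choose a join plan instead of one index probe per key. The stuff(key, value) schema is assumed from the examples:

    import io
    import psycopg2

    conn = psycopg2.connect("dbname=test")  # placeholder connection string
    cur = conn.cursor()

    keyvals = [(1, 10), (2, 20)]  # stand-in for the computed (key, value) pairs

    # Stage the new values with COPY; the temp table vanishes at commit.
    cur.execute("create temp table stuff_new (key integer, value integer)"
                " on commit drop")
    buf = io.StringIO('\n'.join('{}\t{}'.format(k, v) for k, v in keyvals))
    cur.copy_from(buf, 'stuff_new')

    # One set-based update instead of millions of single-row statements.
    cur.execute("update stuff set value = stuff_new.value"
                " from stuff_new where stuff.key = stuff_new.key")
    conn.commit()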