Thread: execute many for each commit

execute many for each commit

From
Alessandro Gagliardi
Date:
This is really more of a psycopg2 than a PostgreSQL question per se, but hopefully there are a few Pythonistas on this list who can help me out. At a recent PUG meeting I was admonished on the folly of committing after every execute statement (especially when I'm executing hundreds of inserts per second). I was thinking of batching a bunch of execute statements (say, 1000) before running a commit, but the problem is that if any one of those inserts fails (say, because of a unique_violation, which happens quite frequently) then I have to roll back the whole batch. Then I'd have to come up with some logic to retry each one individually, or something similarly complicated.

I look at this problem and I think, "I must be doing something wrong." But I can't figure out what it is. The closest thing to an answer I could find using Google was http://stackoverflow.com/questions/396455/python-postgresql-psycopg2-interface-executemany which didn't really provide any good solution at all. Perhaps someone here knows better? Or perhaps that advice was wrong and I simply do have to do a commit after each insert?

Re: execute many for each commit

From
Tom Lane
Date:
Alessandro Gagliardi <alessandro@path.com> writes:
> This is really more of a psycopg2 than a PostgreSQL question per se, but
> hopefully there are a few Pythonistas on this list who can help me out. At
> a recent PUG meeting I was admonished on the folly of committing after
> every execute statement (especially when I'm executing hundreds of inserts
> per second). I was thinking of batching a bunch of execute statements (say,
> 1000) before running a commit but the problem is that if any one of those
> inserts fails (say, because of a unique_violation, which happens quite
> frequently) then I have to roll back the whole batch. Then I'd have to come
> up with some logic to retry each one individually or something similarly
> complicated.

Subtransactions (savepoints) are considerably cheaper than full
transactions.  Alternatively you could consider turning off
synchronous_commit, if you don't need a guarantee that COMMIT means "it's
already safely on disk".

            regards, tom lane
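
As an illustration of the savepoint suggestion above, here is a minimal sketch, not a definitive implementation. It wraps each insert in its own savepoint so that a failing row is rolled back on its own while the rest of the batch is committed once at the end. The function name insert_batch, the savepoint name row_sp, and the parameter names are hypothetical; the function assumes a DB-API connection such as the one returned by psycopg2.connect, and the caller passes in the exception class to tolerate (with psycopg2 that would presumably be psycopg2.IntegrityError).

```python
def insert_batch(conn, sql, rows, unique_error):
    """Insert rows in a single transaction, skipping rows that raise unique_error.

    conn:         DB-API connection (assumed: e.g. from psycopg2.connect)
    sql:          parameterized INSERT statement
    rows:         iterable of parameter tuples
    unique_error: exception class to tolerate (assumed: psycopg2.IntegrityError)
    Returns a tuple (inserted, skipped).
    """
    inserted = skipped = 0
    cur = conn.cursor()
    for row in rows:
        # Mark a point we can return to if this one row fails.
        cur.execute("SAVEPOINT row_sp")
        try:
            cur.execute(sql, row)
            inserted += 1
        except unique_error:
            # Undo only the failed insert; earlier rows in the batch survive.
            cur.execute("ROLLBACK TO SAVEPOINT row_sp")
            skipped += 1
        # Discard the savepoint so they do not pile up over a long batch.
        cur.execute("RELEASE SAVEPOINT row_sp")
    # One commit for the whole batch instead of one per row.
    conn.commit()
    return inserted, skipped
```

With psycopg2 the call might look like insert_batch(conn, "INSERT INTO events (id) VALUES (%s)", rows, psycopg2.IntegrityError), where the table events is made up for the example. For the other suggestion, synchronous_commit can be changed for a single session with SET synchronous_commit TO off, at the cost of the durability guarantee described above.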