Re: INSERTing lots of data
From: Dimitri Fontaine
Subject: Re: INSERTing lots of data
Date:
Msg-id: 8739x7m55i.fsf@hi-media-techno.com
In reply to: Re: INSERTing lots of data (Greg Smith <greg@2ndquadrant.com>)
List: pgsql-general
Greg Smith <greg@2ndquadrant.com> writes:
> Joachim Worringen wrote:
>> my Python application (http://perfbase.tigris.org) repeatedly needs to
>> insert lots of data into an existing, non-empty, potentially large
>> table. Currently, the bottleneck is with the Python application, so I
>> intend to multi-thread it. Each thread should work on a part of the
>> input file.
>
> You are wandering down a path followed by pgloader at one point:
> http://pgloader.projects.postgresql.org/#toc6 and one that I fought with
> briefly as well. Simple multi-threading can be of minimal help in
> scaling up insert performance here, due to the Python issues involved
> with the GIL. Maybe we get Dimitri to chime in here, he did more of this
> than I did.

In my case pgloader is using COPY, not INSERT, which would mean that
while one Python thread is blocked on network IO the others have a
chance of using the CPU. That should be a case where the GIL works ok.
My tests show that it's not. (The shape of what I tested is in the first
sketch below.)

> Two thoughts. First, build a test performance case assuming it will
> fail to scale upwards, looking for problems. If you get lucky, great,
> but don't assume this will work--it's proven more difficult than is
> obvious in the past for others.
>
> Second, if you do end up being throttled by the GIL, you can probably
> build a solution for Python 2.6/3.0 using the multiprocessing module
> for your use case: http://docs.python.org/library/multiprocessing.html

My plan was to go with http://docs.python.org/library/subprocess.html
but it seems multiprocessing is easier to use when you want to port
existing threaded code (second sketch below).

Thanks Greg!
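For reference, a minimal sketch of the threaded-COPY pattern in
question. This is not pgloader's actual code: it assumes the psycopg2
driver, and the DSN, the "measurements" table and its columns are
made-up examples. Each thread opens its own connection and COPYs one
chunk of the input:

    import threading
    from io import StringIO

    import psycopg2  # assumed driver; anything exposing COPY works similarly

    def copy_chunk(rows):
        # One connection per thread: a connection must not be shared
        # across threads while a COPY is in flight.
        conn = psycopg2.connect("dbname=perfbase")  # made-up DSN
        buf = StringIO("".join("%s\t%s\n" % row for row in rows))
        cur = conn.cursor()
        # Ship the whole chunk in one COPY; tab-separated input matches
        # the copy_from() default separator.
        cur.copy_from(buf, "measurements", columns=("run_id", "value"))
        conn.commit()
        conn.close()

    chunks = [[(n, i) for i in range(100000)] for n in range(4)]
    threads = [threading.Thread(target=copy_chunk, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

The socket IO inside copy_from() can overlap between threads, but the
Python-side work (building the buffer, driver bookkeeping) still
serializes on the GIL, which is presumably where the scaling goes away.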
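And the same sketch ported to multiprocessing, which shows why it is
the easier target for existing threaded code: the Thread objects map
almost one-to-one onto a Pool. Same made-up names as above:

    import multiprocessing
    from io import StringIO

    import psycopg2

    def copy_chunk(rows):
        # Each worker is a separate process with its own interpreter
        # (and its own GIL) and its own connection, so the COPYs really
        # run in parallel.
        conn = psycopg2.connect("dbname=perfbase")  # made-up DSN
        buf = StringIO("".join("%s\t%s\n" % row for row in rows))
        conn.cursor().copy_from(buf, "measurements",
                                columns=("run_id", "value"))
        conn.commit()
        conn.close()
        return len(rows)

    if __name__ == "__main__":
        chunks = [[(n, i) for i in range(100000)] for n in range(4)]
        pool = multiprocessing.Pool(processes=4)
        print(pool.map(copy_chunk, chunks))  # one COPY per worker process
        pool.close()
        pool.join()

Note that each connection is opened inside the worker, never passed
between processes; only the row chunks get pickled across.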
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support