Discussion: connection lost with concurrent transactions


connection lost with concurrent transactions

From:
Otto Vazquez
Date:
Hi all,

We are using Django (1.3) with django-celery (2.2.4) and a PostgreSQL (9.0.1) database accessed through the psycopg2 (2.2.2) connector, for a large project (at a large company).

Executing Celery tasks one by one works fine. When requested, Django inserts a new row in the db (Django starts a transaction) and two tasks are invoked; both tasks write to the same row, but in different fields. With CELERY_ALWAYS_EAGER set, the tasks execute sequentially.
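The scenario described above can be reduced to a minimal stand-in: one shared "row" and two tasks that each write a different field of it. All names here (resize, annotate, the dict row) are hypothetical; the real code would be Django ORM updates inside Celery tasks.

```python
import threading

# Stand-in for the freshly inserted db row.
row = {"id": 1, "thumbnail": None, "metadata": None}

def resize(row):
    row["thumbnail"] = "thumb.png"   # task 1 writes one field

def annotate(row):
    row["metadata"] = "exif data"    # task 2 writes another field

# CELERY_ALWAYS_EAGER is equivalent to running the tasks in sequence:
resize(row)
annotate(row)

# The asynchronous case is closer to two workers running concurrently:
t1 = threading.Thread(target=resize, args=(row,))
t2 = threading.Thread(target=annotate, args=(row,))
t1.start(); t2.start(); t1.join(); t2.join()
print(row)
```

In the eager case the second write simply follows the first inside one process; in the asynchronous case each worker holds its own database connection and transaction, which is where the trouble starts.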

The problem comes when switching Celery from synchronous to asynchronous execution. Running the same code, the celery daemon raises this exception:

[2011-05-13 15:37:51,981: WARNING/PoolWorker-1] /prj/env/lib/python2.6/site-packages/celery/worker/job.py:114: UserWarning: Exception outside body: <class 'psycopg2.OperationalError'>: no connection to the server
[2011-05-13 15:37:51,982: ERROR/MainProcess] Task cds.tasks.resize[2f198cdd-a1d5-4e61-b4cd-a69a8e6586f4] raised exception: OperationalError('no connection to the server\n',)
Traceback (most recent call last):
  File "/prj/env/lib/python2.6/site-packages/celery/worker/job.py", line 108, in execute_safe
    return self.execute(*args, **kwargs)
  File "/prj/env/lib/python2.6/site-packages/celery/worker/job.py", line 126, in execute
    return super(WorkerTaskTrace, self).execute()
  File "/prj/env/lib/python2.6/site-packages/celery/execute/trace.py", line 76, in execute
    retval = self._trace()
  File "/prj/env/lib/python2.6/site-packages/celery/execute/trace.py", line 92, in _trace
    return handler(trace.retval, trace.exc_type, trace.tb, trace.strtb)
  File "/prj/env/lib/python2.6/site-packages/celery/worker/job.py", line 147, in handle_failure
    exc = self.task.backend.mark_as_failure(self.task_id, exc, strtb)
  File "/prj/env/lib/python2.6/site-packages/celery/backends/base.py", line 45, in mark_as_failure
    traceback=traceback)
  File "/prj/env/lib/python2.6/site-packages/celery/backends/base.py", line 157, in store_result
    return self._store_result(task_id, result, status, traceback, **kwargs)
  File "/prj/env/lib/python2.6/site-packages/djcelery/backends/database.py", line 20, in _store_result
    traceback=traceback)
  File "/prj/env/lib/python2.6/site-packages/djcelery/managers.py", line 46, in _inner
    transaction.rollback_unless_managed()
  File "/prj/env/lib/python2.6/site-packages/django/db/transaction.py", line 133, in rollback_unless_managed
    connection.rollback_unless_managed()
  File "/prj/env/lib/python2.6/site-packages/django/db/backends/__init__.py", line 193, in rollback_unless_managed
    self._rollback()
  File "/prj/env/lib/python2.6/site-packages/django/db/backends/__init__.py", line 50, in _rollback
    return self.connection.rollback()
OperationalError: no connection to the server

To test and isolate the problem, we tried MySQL (5.1.49) with the MySQLdb connector (1.2.3). We changed only the db connector and port, and it worked like a charm (no failures over ~400 tasks).
We have also tried modifying Celery parameters (mainly CELERY_DB_REUSE_MAX and CELERY_TASK_PUBLISH_RETRY) along with some code tweaks.
After a couple of hours we found the approach that minimized the number of exceptions, but it still fails.

It still fails even with autocommit enabled. In any case, we don't want to change Django's code.

We believe it's a connector problem; google a little and you will find lots of posts with the same or a similar problem.
Some other useful info/samples:

Can you shed any light on this? If we don't find a solution, we'll move to MySQL, which for now seems to be the best (if extreme) option.

Otto Vazquez



Re: connection lost with concurrent transactions

From:
Daniele Varrazzo
Date:
On Fri, May 13, 2011 at 5:12 PM, Otto Vazquez <otto.vazquez@gmail.com> wrote:

> OperationalError: no connection to the server

> We believe it's a connector problem, just google a little and you will find
> lots a posts with same/similar problem.
> Some other useful info/samples:
>
http://stackoverflow.com/questions/1303654/threaded-django-task-doesnt-automatically-handle-transactions-or-db-connections
> http://groups.google.com/group/django-developers/browse_frm/thread/5249b9ba993431ca/4d1b9d65329c8b75
> http://code.djangoproject.com/ticket/9964

Actually it doesn't seem the same issue to me. IIRC with the django
issue you get long running transactions. A possible consequence may be
getting errors like "current transaction is aborted...". But "no
connection to the server" is an error message I have never seen. I
wouldn't even know how to reproduce it just using psycopg: if you
issue a rollback() on a closed connection you don't get that error,
but rather a clean "InterfaceError: connection already closed".
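The DBAPI behavior Daniele describes can be demonstrated without a Postgres server by using the stdlib sqlite3 driver (chosen here purely because it needs no server; psycopg2 raises InterfaceError("connection already closed") in the same situation, sqlite3 raises its ProgrammingError):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.close()
try:
    # Rolling back a closed connection gives a clean DBAPI error,
    # not a "no connection to the server" OperationalError.
    conn.rollback()
except sqlite3.ProgrammingError as e:
    print("clean DBAPI error:", e)
```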

Do you have any middleware software (pgpool etc.) handling the
connection used by psycopg? Anything interfering with the socket?

-- Daniele

Re: connection lost with concurrent transactions

From:
maplabs@light42.com
Date:
I am not an expert in django, but I can say anecdotally that my
colleague did not have good luck with Celery

==
Brian Hamlin
planetwork.net
OSGeo California Chapter
(415) 717-4462 cell




Re: connection lost with concurrent transactions

From:
Otto Vazquez
Date:
I just installed RabbitMQ on my local machine (Ubuntu 10.10, PostgreSQL 8.4.8 and RabbitMQ 1.8.0; everything else is the same version) and everything worked fine.
So I tried to do the same in the dev environment (CentOS 5.0 final, RabbitMQ 1.7.2 from the EPEL repo). We have 6 machines: 2 cds (where tasks are executed), 2 cms and 2 db (master/slave, so only the master is accessible).

I have tried different RabbitMQ configurations: just on the db master host, on both cds hosts, on all hosts... no luck. I always get the same error:
[2011-05-16 14:20:12,084: WARNING/PoolWorker-1] /usr/lib/python2.6/site-packages/celery-2.2.4-py2.6.egg/celery/worker/job.py:114: UserWarning: Exception outside body: <class 'psycopg2.InterfaceError'>: connection already closed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/celery-2.2.4-py2.6.egg/celery/worker/job.py", line 108, in execute_safe
    return self.execute(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/celery-2.2.4-py2.6.egg/celery/worker/job.py", line 129, in execute
    self.loader.on_process_cleanup()
  File "/usr/lib/python2.6/site-packages/django_celery-2.2.4-py2.6.egg/djcelery/loaders.py", line 67, in on_process_cleanup
    self.close_database()
  File "/usr/lib/python2.6/site-packages/django_celery-2.2.4-py2.6.egg/djcelery/loaders.py", line 47, in close_database
    return django.db.close_connection()
  File "/usr/lib/python2.6/site-packages/django/db/__init__.py", line 85, in close_connection
    conn.close()
  File "/usr/lib/python2.6/site-packages/django/db/backends/__init__.py", line 244, in close
    self.connection.close()
InterfaceError: connection already closed
None

So now I'm not sure whether this is a matter of architecture, a version bug, or the connector not working properly.
BTW, we are not using any db middleware (pgpool or pgbouncer).

Any hint before moving to MySQL?

Otto.

On Fri, May 13, 2011 at 6:49 PM, <maplabs@light42.com> wrote:
I am not an expert in django, but I can say anecdotally that my colleague did not have good luck with Celery







Re: connection lost with concurrent transactions

From:
Daniele Varrazzo
Date:
On Mon, May 16, 2011 at 2:39 PM, Otto Vazquez <otto.vazquez@gmail.com> wrote:
> I just installed RabbitMQ in my local machine (ubuntu 10.10, postgres 8.4.8
> and rabbitmq 1.8.0, other stuff is the same version) and everything worked
> fine.
> So I tried to do the same with the dev environment (centos 5.0 final,
> rabbitmq 1.7.2 from epel repo). We have 6 machines: 2 cds (where tasks are
> executed), 2 cms and 2 db (master/slave, so only master accessible).
> I have tried different RabbitMQ configuration: just in db master host, in
> both cds hosts, in all hosts... no way. Always getting same error:
> [2011-05-16 14:20:12,084: WARNING/PoolWorker-1]
> /usr/lib/python2.6/site-packages/celery-2.2.4-py2.6.egg/celery/worker/job.py:114:
> UserWarning: Exception outside body: <class 'psycopg2.InterfaceError'>:
> connection already closed

This is different from the first error you reported, isn't it? You
were initially reporting a "no connection to server".

The last error seems more manageable: it seems connection.close() is
invoked twice. Can you trace who calls the close method before django?
You may use a connection subclass to discover this:

    class TraceConn(psycopg2.extensions.connection):
        def close(self):
            # replace with the logging strategy you need
            import traceback
            traceback.print_stack()
            psycopg2.extensions.connection.close(self)

and create your connection with

    conn = psycopg2.connect(DSN, connection_factory=TraceConn)

I don't know if django supports any way to pass the connection_factory
parameter. If it doesn't, you may inject it via monkeypatching by
running this very early (e.g. in settings.py):

    connect_orig = psycopg2.connect

    def my_connect(dsn, **kwargs):
        kwargs['connection_factory'] = TraceConn
        return connect_orig(dsn, **kwargs)

    psycopg2.connect = my_connect
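Daniele's tracing trick can be tried end to end with the stdlib sqlite3 driver, whose factory= parameter plays the same role as psycopg2's connection_factory. This is only an illustrative sketch of the pattern, not psycopg2 itself:

```python
import sqlite3
import traceback

class TraceConn(sqlite3.Connection):
    def close(self):
        # Record who is closing the connection; swap for real logging.
        self.last_close_stack = traceback.format_stack()
        sqlite3.Connection.close(self)

conn = sqlite3.connect(":memory:", factory=TraceConn)
conn.close()
# The captured stack shows the call site of close():
print(conn.last_close_stack[-1])
```

With psycopg2 the equivalent is the TraceConn subclass above passed as connection_factory; the traceback printed from inside close() would reveal which layer (djcelery, django.db, or something else) closes the connection first.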


> So now, I'm not sure if this is a matter of architecture, version bug or the
> connector is not working properly.

I don't know what celery does.


> BTW, we are not using any db middleware (pgpool or pgbouncer)

Good to know.


> Any hint before moving to MySQL?

Can you check if MySQLdb's connections support calling close() twice?
Maybe the difference is there (different implementations of the DBAPI
requirement "The connection will be unusable from [close()] forward").
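The double-close check takes only a few lines against any DBAPI driver. Shown here with stdlib sqlite3, where the second close() happens to be a no-op; pointing the same lines at MySQLdb or psycopg2 connections would reveal the difference Daniele suspects:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.close()
try:
    # Does the driver tolerate a second close()?
    conn.close()
    result = "second close() tolerated"
except Exception as e:
    result = "second close() raised: %r" % e
print(result)
```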


-- Daniele