Обсуждение: Multi-threaded user app segfaults when using libpq with separate connections
Multi-threaded user app segfaults when using libpq with separate connections
От
Frank van Vugt
Дата:
Your name : Frank van Vugt Your email address : ftm.van.vugt@foxi.nl System Configuration --------------------- Architecture :Intel Pentium III / Amd Athlon XP+ Operating System :Linux 2.4.20 with glibc 2.3.2 (SuSE v8.2) PostgreSQL version :PostgreSQL-7.3.4 Compiler used :gcc v3.3 Please enter a FULL description of your problem: ------------------------------------------------ A small multi-threaded demo/poc-program using Qt (including its SQL-module which connects to PostgreSQL using libpq) is segfaulting while multiple connections are used. The code creates 10 new threads, each of which basically: - locks a mutex - adds/opens a new database connection - unlocks the mutex - using the newly created/opened connection - starts a transaction - sets transaction isolation level to SERIALIZABLE - performssome select-only queries - commits the transaction - closes the database connection Out of 10 consecutive runs, the code completes three times successfully and segfaults the other seven times. I have recompiled libpq with debugging and without optimization, the backtraces of the coredumps are included below. Important : if the unlocking of the mutex is moved to AFTER the database close, then the code seems to run without any problem each and every time Please describe a way to repeat the problem. Please try to provide a concise reproducible example, if at all possible: ---------------------------------------------------------------------- I'm not sure how to proceed from here. Obviously I'm using Qt for handling PostgreSQL connections as well as multi-threading, so rewriting my 'demo-code' in plain C++ using libpthread and libpq would probably take too much time. Maybe someone is willing to have a go at the Qt-code (just let me know and I'll mail the source), or maybe (hopefully) the information provided here will prove sufficient a clue to pinpoint the problem. The only thing I found in the mailinglist archives which *may* have some bearing to this problem is: http://archives.postgresql.org/pgsql-interfaces/2002-09/msg00042.php I'm not sure it's relevant, but please be advised that the PostgreSQL driver in Qt has the following code included in order to allow compilation: // PostgreSQL header <utils/elog.h> included by <postgres.h> redefines DEBUG. #if defined(DEBUG) # undef DEBUG #endif #include <postgres.h> #include <libpq/libpq-fs.h> // PostgreSQL header <catalog/pg_type.h> redefines errno erroneously. #if defined(errno) # undef errno #endif #define errno qt_psql_errno #include <catalog/pg_type.h> #undef errno Furthermore, I found that the 7.4devdocs talk about the --enable-thread-safety option for ./configure, but I don't think that applies to my platform? => backtraces of coredumps included below Best, Frank. ======================================================== The results of ten sequentially executed runs: -+-+-+- #0 0x40cc1b2b in realloc () from /lib/libc.so.6 (gdb) where #0 0x40cc1b2b in realloc () from /lib/libc.so.6 #1 0x40d8c7ac in pqReadData (conn=0x80c0fc0) at fe-misc.c:483 #2 0x40d8a335 in PQgetResult (conn=0x80c0fc0) at fe-exec.c:1337 #3 0x40d8a511 in PQexec (conn=0x80c0fc0, query=0x80cb638 "set transaction isolation level SERIALIZABLE") at fe-exec.c:1441 #4 0x4063aaa7 in QPSQLResult::reset(QString const&) (this=0x80cb418, query=@0xbf7ffa9c) at sql/drivers/psql/qsql_psql.cpp:473 #5 0x4060dd34 in QSqlQuery::exec(QString const&) (this=0xbf7ffa8c, query=@0xbf7ffa9c) at sql/qsqlquery.cpp:358 #6 0x08050401 in PrintEngine::run() (this=0x80a72e0) at printengine.cpp:85 #7 0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80a7374) at kernel/qthread_unix.cpp:120 #8 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0 (gdb) -+-+-+- #0 0x4061390d in QSqlDatabase::transaction() (this=0x80cd2e8) at sql/qsqldatabase.cpp:760 760 if ( !d->driver->hasFeature( QSqlDriver::Transactions ) ) (gdb) where #0 0x4061390d in QSqlDatabase::transaction() (this=0x80cd2e8) at sql/qsqldatabase.cpp:760 #1 0x080503cb in PrintEngine::run() (this=0x80c0b98) at printengine.cpp:83 #2 0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80c0bdc) at kernel/qthread_unix.cpp:120 #3 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0 (gdb) -+-+-+- Third run without problems. -+-+-+- Fourth run without problems. -+-+-+- #0 0x40caf675 in fflush () from /lib/libc.so.6 (gdb) where #0 0x40caf675 in fflush () from /lib/libc.so.6 #1 0x40d8cc16 in pqSendSome (conn=0x80c10f8) at fe-misc.c:746 #2 0x40d8cc46 in pqFlush (conn=0x80c10f8) at fe-misc.c:764 #3 0x40d896e6 in PQsendQuery (conn=0x80c10f8, query=0x80cb3d0 "set transaction isolation level SERIALIZABLE") at fe-exec.c:799 #4 0x40d8a4f3 in PQexec (conn=0x80c10f8, query=0x80cb3d0 "set transaction isolation level SERIALIZABLE") at fe-exec.c:1429 #5 0x4063aaa7 in QPSQLResult::reset(QString const&) (this=0x80cb0f8, query=@0xbf7ffa9c) at sql/drivers/psql/qsql_psql.cpp:473 #6 0x4060dd34 in QSqlQuery::exec(QString const&) (this=0xbf7ffa8c, query=@0xbf7ffa9c) at sql/qsqlquery.cpp:358 #7 0x08050401 in PrintEngine::run() (this=0x80a72d0) at printengine.cpp:85 #8 0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80a7364) at kernel/qthread_unix.cpp:120 #9 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0 (gdb) -+-+-+- #0 0x4061391e in QSqlDatabase::transaction() (this=0x80cbcb0) at sql/qsqldatabase.cpp:760 760 if ( !d->driver->hasFeature( QSqlDriver::Transactions ) ) (gdb) where #0 0x4061391e in QSqlDatabase::transaction() (this=0x80cbcb0) at sql/qsqldatabase.cpp:760 #1 0x080503cb in PrintEngine::run() (this=0x80bd358) at printengine.cpp:83 #2 0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80c0a74) at kernel/qthread_unix.cpp:120 #3 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0 (gdb) -+-+-+- #0 0x403352a1 in QObject::connect(QObject const*, char const*, QObject const*, char const*) (sender=0x80c0bf8, signal=0x80c0b58 "2destroyed()", receiver=0x80c7660, member=0x80c0c68 "1slotResultDestroyed()") at kernel/qobject.cpp:1737 1737 QMetaObject *smeta = sender->metaObject(); (gdb) where #0 0x403352a1 in QObject::connect(QObject const*, char const*, QObject const*, char const*) (sender=0x80c0bf8, signal=0x80c0b58 "2destroyed()", receiver=0x80c7660, member=0x80c0c68 "1slotResultDestroyed()") at kernel/qobject.cpp:1737 #1 0x4060d07d in QSqlResultShared (this=0x80c7660, result=0x80cb080) at sql/qsqlquery.cpp:57 #2 0x4060d42f in QSqlQuery (this=0xbf7ff94c, r=0x80cb080) at sql/qsqlquery.cpp:213 #3 0x4063c03b in QPSQLDriver::createQuery() const (this=0x80c0bf8) at sql/drivers/psql/qsql_psql.cpp:687 #4 0x4060d874 in QSqlQuery::init(QString const&, QSqlDatabase*) (this=0xbf7ffa8c, query=@0x40a26a88, db=0x80c0ae8) at sql/qsqlquery.cpp:272 #5 0x4060d762 in QSqlQuery (this=0xbf7ffa8c, db=0x80c0ae8) at sql/qsqlquery.cpp:259 #6 0x080503dd in PrintEngine::run() (this=0x80a72e0) at printengine.cpp:84 #7 0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80a7374) at kernel/qthread_unix.cpp:120 #8 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0 (gdb) -+-+-+- #0 0x4061396d in QSqlDatabase::commit() (this=0x80c0ae8) at sql/qsqldatabase.cpp:775 775 if ( !d->driver->hasFeature( QSqlDriver::Transactions ) ) (gdb) where #0 0x4061396d in QSqlDatabase::commit() (this=0x80c0ae8) at sql/qsqldatabase.cpp:775 #1 0x08050b9d in PrintEngine::run() (this=0x80a72e0) at printengine.cpp:118 #2 0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80a7374) at kernel/qthread_unix.cpp:120 #3 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0 (gdb) -+-+-+- Nineth run without problems -+-+-+- #0 0x40cc6a01 in strlen () from /lib/libc.so.6 (gdb) where #0 0x40cc6a01 in strlen () from /lib/libc.so.6 #1 0x4071bf33 in internalLatin1ToUnicode (str=0xbf7ff89c "\b\021\f\b .¡@Äø\177¿êÀn@ .¡@üø\177¿Dù\177¿9¬c@üø\177¿,ù\177¿\002", len=0x83ffbbd8, maxlen=3212834996) at tools/qstring.cpp:1164 #2 0x4071cb0b in QString (this=0xbf7ff89c, str=0x83ffbbd8 <Address 0x83ffbbd8 out of bounds>) at tools/qstring.cpp:1473 #3 0x40638d4f in qMakeError (err=@0xbf7ff92c, type=2, p=0x80ca2d0) at sql/drivers/psql/qsql_psql.cpp:118 #4 0x4063ac39 in QPSQLResult::reset(QString const&) (this=0x80ca5d8, query=@0xbf7ffa9c) at sql/drivers/psql/qsql_psql.cpp:489 #5 0x4060dd34 in QSqlQuery::exec(QString const&) (this=0xbf7ffa8c, query=@0xbf7ffa9c) at sql/qsqlquery.cpp:358 #6 0x08050401 in PrintEngine::run() (this=0x80a72e0) at printengine.cpp:85 #7 0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80a7374) at kernel/qthread_unix.cpp:120 #8 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0 (gdb) -+-+-+-
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: > // PostgreSQL header <catalog/pg_type.h> redefines errno erroneously. > #if defined(errno) > # undef errno > #endif > #define errno qt_psql_errno > #include <catalog/pg_type.h> > #undef errno It seems highly unlikely that the above is a good idea. errno is a macro on all threaded platforms I'm aware of, but the above coding forgets the definition. The behavior complained of in the comment is gone anyway in PG 7.3. Not sure that this explains your problem though. There are several places in libpq that invoke libc routines that are not thread-safe (on most platforms anyway). We have attempted to rectify this for PG 7.4. You may care to try your test with CVS tip and see if it works better. regards, tom lane
Re: Multi-threaded user app segfaults when using libpq with separate connections
От
Frank van Vugt
Дата:
Hi Tom, > It seems highly unlikely that the above is a good idea. <cut> > The behavior complained of in the comment > is gone anyway in PG 7.3. I've conveyed the message, thanks. > You may care to try your test with CVS tip and see if it works better. I've retried this with a cvs-version of this afternoon (local time). The build stranded in plpgsql (I seem to need a more recent bison), but at that point libpq was already finished. This still resulted in more or less the same kind of errors. I then retried with a cvs-version of 21:30 (utc), configured with enable-thread-safety and compiled with debug info / without optimization, along with a fresh compilation of Qt. Results : same diff (see errors below). What more can I do to help out / find out what the problem is ? Best, Frank van Vugt =+=+=+=+=+=+=+=+=+=+=+=+=+= #0 0x4063ab77 in QPSQLResult::reset(QString const&) (this=0x80c10f0, query=@0xbf7ffa9c) at sql/drivers/psql/qsql_psql.cpp:477 477 int status = PQresultStatus( d->result ); (gdb) where #0 0x4063ab77 in QPSQLResult::reset(QString const&) (this=0x80c10f0, query=@0xbf7ffa9c) at sql/drivers/psql/qsql_psql.cpp:477 #1 0x4060dd44 in QSqlQuery::exec(QString const&) (this=0xbf7ffa8c, query=@0xbf7ffa9c) at sql/qsqlquery.cpp:358 #2 0x08050823 in PrintEngine::run() (this=0x809e2b8) at printengine.cpp:97 #3 0x402b4bf6 in QThreadInstance::start(void*) (_arg=0x809e31c) at kernel/qthread_unix.cpp:120 #4 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0 (gdb) #0 0x40d8c85a in pqGetc (result=0xbf3ff81b "@8»Ù@ü\031\216@Dø?¿\035¬Ø@\210", conn=0x80d0088) at fe-misc.c:85 85 * write 1 char to the connection (gdb) where #0 0x40d8c85a in pqGetc (result=0xbf3ff81b "@8»Ù@ü\031\216@Dø?¿\035¬Ø@\210", conn=0x80d0088) at fe-misc.c:85 #1 0x40d90afe in pqParseInput2 (conn=0x80d0088) at fe-protocol2.c:384 #2 0x40d8ac1d in parseInput (conn=0x80d0088) at fe-exec.c:918 #3 0x40d8ac8f in PQgetResult (conn=0x80d0088) at fe-exec.c:955 #4 0x40d8b0ac in PQexecFinish (conn=0x80d0088) at fe-exec.c:1159 #5 0x40d8aec6 in PQexec (conn=0x80d0088, query=0x80d4270 "set transaction isolation level SERIALIZABLE") at fe-exec.c:1047 #6 0x4063aab7 in QPSQLResult::reset(QString const&) (this=0x80d3d20, query=@0xbf3ffa9c) at sql/drivers/psql/qsql_psql.cpp:473 #7 0x4060dd44 in QSqlQuery::exec(QString const&) (this=0xbf3ffa8c, query=@0xbf3ffa9c) at sql/qsqlquery.cpp:358 #8 0x08050411 in PrintEngine::run() (this=0x80a1860) at printengine.cpp:85 #9 0x402b4bf6 in QThreadInstance::start(void*) (_arg=0x80bd4c4) at kernel/qthread_unix.cpp:120 #10 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0 Current language: auto; currently c (gdb) #0 0x4061391d in QSqlDatabase::transaction() (this=0x80cc078) at sql/qsqldatabase.cpp:760 760 if ( !d->driver->hasFeature( QSqlDriver::Transactions ) ) (gdb) where #0 0x4061391d in QSqlDatabase::transaction() (this=0x80cc078) at sql/qsqldatabase.cpp:760 #1 0x080503db in PrintEngine::run() (this=0x80a70b8) at printengine.cpp:83 #2 0x402b4bf6 in QThreadInstance::start(void*) (_arg=0x80c0c3c) at kernel/qthread_unix.cpp:120 #3 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0 (gdb) #0 0x40d89ea0 in PQclear (res=0x417974dd) at fe-exec.c:349 349 (gdb) where #0 0x40d89ea0 in PQclear (res=0x417974dd) at fe-exec.c:349 #1 0x40d89f12 in pqClearAsyncResult (conn=0x80cba10) at fe-exec.c:369 #2 0x40d89f7a in pqSaveErrorResult (conn=0x80cba10) at fe-exec.c:397 #3 0x40d8ad11 in PQgetResult (conn=0x80cba10) at fe-exec.c:984 #4 0x40d8b0ac in PQexecFinish (conn=0x80cba10) at fe-exec.c:1159 #5 0x40d8aec6 in PQexec (conn=0x80cba10, query=0x80d8178 "set transaction isolation level SERIALIZABLE") at fe-exec.c:1047 #6 0x4063aab7 in QPSQLResult::reset(QString const&) (this=0x80d7f98, query=@0xbf3ffa9c) at sql/drivers/psql/qsql_psql.cpp:473 #7 0x4060dd44 in QSqlQuery::exec(QString const&) (this=0xbf3ffa8c, query=@0xbf3ffa9c) at sql/qsqlquery.cpp:358 #8 0x08050411 in PrintEngine::run() (this=0x80be3b0) at printengine.cpp:85 #9 0x402b4bf6 in QThreadInstance::start(void*) (_arg=0x80c0c44) at kernel/qthread_unix.cpp:120 #10 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0 Current language: auto; currently c (gdb)
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: >> You may care to try your test with CVS tip and see if it works better. > I've retried this with a cvs-version of this afternoon (local > time). The build stranded in plpgsql (I seem to need a more recent > bison), but at that point libpq was already finished. This still > resulted in more or less the same kind of errors. Are you sure that each thread is accessing only its own PGconn object? Other than that, not sure what to suggest. I get the impression that something is clobbering memory with abandon. Do you have any Purify-like tools to look for memory stomp problems? regards, tom lane
Re: Multi-threaded user app segfaults when using libpq with separate connections
От
Frank van Vugt
Дата:
> I get the impression that > something is clobbering memory with abandon. Do you have any > Purify-like tools to look for memory stomp problems? In addition to my post of this morning containing the regular Valgrind output, I'm including here the relevant parts of a valgrind run using the 'helgrind' skin, which detects possible data race conditions. I hope the info is of some use, best, Frank van Vugt =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+= ==14670== ==14670== Thread 2: ==14670== Possible data race writing variable at 0x43C561A8 ==14670== at 0x40EB48D7: enlargePQExpBuffer (pqexpbuffer.c:152) ==14670== by 0x40EB4994: printfPQExpBuffer (pqexpbuffer.c:203) ==14670== by 0x40EB168B: pqReadData (fe-misc.c:449) ==14670== by 0x40EAF334: PQgetResult (fe-exec.c:1337) ==14670== Address 0x43C561A8 is 456 bytes inside a block of size 16384 alloc'd by thread 3 ==14670== at 0x4001CA2C: malloc (vg_replace_malloc.c:153) ==14670== by 0x40EABF44: makeEmptyPGconn (fe-connect.c:1881) ==14670== by 0x40EAA319: PQconnectStart (fe-connect.c:273) ==14670== by 0x40EAA2D5: PQconnectdb (fe-connect.c:235) ==14670== Previous state: shared RO, no locks ==14670== ==14670== Thread 2: ==14670== Possible data race writing variable at 0x43C561B0 ==14670== at 0x40EB48DF: enlargePQExpBuffer (pqexpbuffer.c:153) ==14670== by 0x40EB4994: printfPQExpBuffer (pqexpbuffer.c:203) ==14670== by 0x40EB168B: pqReadData (fe-misc.c:449) ==14670== by 0x40EAF334: PQgetResult (fe-exec.c:1337) ==14670== Address 0x43C561B0 is 464 bytes inside a block of size 16384 alloc'd by thread 3 ==14670== at 0x4001CA2C: malloc (vg_replace_malloc.c:153) ==14670== by 0x40EABF44: makeEmptyPGconn (fe-connect.c:1881) ==14670== by 0x40EAA319: PQconnectStart (fe-connect.c:273) ==14670== by 0x40EAA2D5: PQconnectdb (fe-connect.c:235) ==14670== Previous state: shared RO, no locks ==14670== ==14670== Thread 2: ==14670== Possible data race reading variable at 0x40D45614 ==14670== at 0x4000700D: _dl_lookup_versioned_symbol_internal (in /lib/ld-2.3.2.so) ==14670== by 0x4000AE8E: fixup (in /lib/ld-2.3.2.so) ==14670== by 0x4000B05F: _dl_runtime_resolve (in /lib/ld-2.3.2.so) ==14670== by 0x40EB168B: pqReadData (fe-misc.c:449) ==14670== Address 0x40D45614 is not stack'd, malloc'd or free'd ==14670== Previous state: shared RW, locked by:0x432F7440 ==14670== ==14670== Thread 2: ==14670== Possible data race writing variable at 0x43C561AC ==14670== at 0x40EB497F: printfPQExpBuffer (pqexpbuffer.c:198) ==14670== by 0x40EB168B: pqReadData (fe-misc.c:449) ==14670== by 0x40EAF334: PQgetResult (fe-exec.c:1337) ==14670== by 0x40EAF510: PQexec (fe-exec.c:1441) ==14670== Address 0x43C561AC is 460 bytes inside a block of size 16384 alloc'd by thread 3 ==14670== at 0x4001CA2C: malloc (vg_replace_malloc.c:153) ==14670== by 0x40EABF44: makeEmptyPGconn (fe-connect.c:1881) ==14670== by 0x40EAA319: PQconnectStart (fe-connect.c:273) ==14670== by 0x40EAA2D5: PQconnectdb (fe-connect.c:235) ==14670== Previous state: shared RO, no locks ==14670== ==14670== Thread 2: ==14670== Possible data race writing variable at 0x43C56198 ==14670== at 0x40EAE3CC: saveErrorResult (fe-exec.c:632) ==14670== by 0x40EAF348: PQgetResult (fe-exec.c:1344) ==14670== by 0x40EAF510: PQexec (fe-exec.c:1441) ==14670== by 0x407870D2: QPSQLDriver::beginTransaction() (sql/drivers/psql/qsql_psql.cpp:698) ==14670== Address 0x43C56198 is 440 bytes inside a block of size 16384 alloc'd by thread 3 ==14670== at 0x4001CA2C: malloc (vg_replace_malloc.c:153) ==14670== by 0x40EABF44: makeEmptyPGconn (fe-connect.c:1881) ==14670== by 0x40EAA319: PQconnectStart (fe-connect.c:273) ==14670== by 0x40EAA2D5: PQconnectdb (fe-connect.c:235) ==14670== Previous state: shared RO, no locks accessed distinguished 2ndary map! 0x350034 read distinguished 2ndary map! 0x350034 Segmentatie fout (core dumped)
SOLVED: Multi-threaded Qt app segfaults when using libpq with separate connections
От
Frank van Vugt
Дата:
Hi, I *finally* nailed this thing, turns out NOT to be a PG problem, more like PEBKC. Tom Lane wrote: > I get the impression that something is clobbering memory with abandon. He was write (duh..). What wasn't clear (at least to me) from the Qt-docs is that a name given to a supplementary database connection must be unique, otherwise the new one will replace the old one. Expecting to be able to handle the connections by the pointers an addDatabase() call returned, I had been creating new connections in separate threads whose names weren't always unique. Since 'connection' here implicates a PGconn structure, I had been 'switching' PGconn structures while in use. It seems one can even get away with that for a limited number of threads, but at a certain point newly created result structures (PGresult, PQntubles, PQnfields, etc.) will be allocated space on top of the older structures belonging the replaced connections, which is obviously the point where things get very messy.... I've asked the Trolls (Qt) to emphasize this potential problem in their docs. Thanks to Tom Lane (PostgreSQL) and Harald Fernengel (TrollTech) for their pointers ;) Best, Frank van Vugt