Обсуждение: Multi-threaded user app segfaults when using libpq with separate connections

Поиск
Список
Период
Сортировка

Multi-threaded user app segfaults when using libpq with separate connections

От
Frank van Vugt
Дата:
Your name        :        Frank van Vugt
Your email address    :    ftm.van.vugt@foxi.nl

System Configuration
--------------------- Architecture  :Intel Pentium III / Amd Athlon XP+
 Operating System :Linux 2.4.20 with glibc 2.3.2 (SuSE v8.2)
 PostgreSQL version :PostgreSQL-7.3.4
 Compiler used :gcc v3.3


Please enter a FULL description of your problem:
------------------------------------------------

A small multi-threaded demo/poc-program using Qt (including its SQL-module
which connects to PostgreSQL using libpq) is segfaulting while multiple
connections are used.

The code creates 10 new threads, each of which basically:
- locks a mutex
- adds/opens a new database connection
- unlocks the mutex
- using the newly created/opened connection - starts a transaction - sets transaction isolation level to SERIALIZABLE -
performssome select-only queries - commits the transaction 
- closes the database connection


Out of 10 consecutive runs, the code completes three times successfully and
segfaults the other seven times. I have recompiled libpq with debugging and
without optimization, the backtraces of the coredumps are included below.


Important : if the unlocking of the mutex is moved to AFTER the database
close, then the code seems to run without any problem each and every time



Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------

I'm not sure how to proceed from here. Obviously I'm using Qt for handling
PostgreSQL connections as well as multi-threading, so rewriting my
'demo-code' in plain C++ using libpthread and libpq would probably take too
much time. Maybe someone is willing to have a go at the Qt-code (just let me
know and I'll mail the source), or maybe (hopefully) the information provided
here will prove sufficient a clue to pinpoint the problem.

The only thing I found in the mailinglist archives which *may* have some
bearing to this problem is:
http://archives.postgresql.org/pgsql-interfaces/2002-09/msg00042.php

I'm not sure it's relevant, but please be advised that the PostgreSQL driver
in Qt has the following code included in order to allow compilation:

// PostgreSQL header <utils/elog.h> included by <postgres.h> redefines DEBUG.
#if defined(DEBUG)
# undef DEBUG
#endif
#include <postgres.h>
#include <libpq/libpq-fs.h>
// PostgreSQL header <catalog/pg_type.h> redefines errno erroneously.
#if defined(errno)
# undef errno
#endif
#define errno qt_psql_errno
#include <catalog/pg_type.h>
#undef errno


Furthermore, I found that the 7.4devdocs talk about the --enable-thread-safety
option for ./configure, but I don't think that applies to my platform?




=> backtraces of coredumps included below



Best,







Frank.


========================================================

The results of ten sequentially executed runs:

-+-+-+-

#0  0x40cc1b2b in realloc () from /lib/libc.so.6
(gdb) where
#0  0x40cc1b2b in realloc () from /lib/libc.so.6
#1  0x40d8c7ac in pqReadData (conn=0x80c0fc0) at fe-misc.c:483
#2  0x40d8a335 in PQgetResult (conn=0x80c0fc0) at fe-exec.c:1337
#3  0x40d8a511 in PQexec (conn=0x80c0fc0, query=0x80cb638 "set transaction
isolation level SERIALIZABLE") at fe-exec.c:1441
#4  0x4063aaa7 in QPSQLResult::reset(QString const&) (this=0x80cb418,
query=@0xbf7ffa9c) at sql/drivers/psql/qsql_psql.cpp:473
#5  0x4060dd34 in QSqlQuery::exec(QString const&) (this=0xbf7ffa8c,
query=@0xbf7ffa9c) at sql/qsqlquery.cpp:358
#6  0x08050401 in PrintEngine::run() (this=0x80a72e0) at printengine.cpp:85
#7  0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80a7374) at
kernel/qthread_unix.cpp:120
#8  0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0
(gdb)

-+-+-+-

#0  0x4061390d in QSqlDatabase::transaction() (this=0x80cd2e8) at
sql/qsqldatabase.cpp:760
760         if ( !d->driver->hasFeature( QSqlDriver::Transactions ) )
(gdb) where
#0  0x4061390d in QSqlDatabase::transaction() (this=0x80cd2e8) at
sql/qsqldatabase.cpp:760
#1  0x080503cb in PrintEngine::run() (this=0x80c0b98) at printengine.cpp:83
#2  0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80c0bdc) at
kernel/qthread_unix.cpp:120
#3  0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0
(gdb)

-+-+-+-

Third run without problems.

-+-+-+-

Fourth run without problems.

-+-+-+-

#0  0x40caf675 in fflush () from /lib/libc.so.6
(gdb) where
#0  0x40caf675 in fflush () from /lib/libc.so.6
#1  0x40d8cc16 in pqSendSome (conn=0x80c10f8) at fe-misc.c:746
#2  0x40d8cc46 in pqFlush (conn=0x80c10f8) at fe-misc.c:764
#3  0x40d896e6 in PQsendQuery (conn=0x80c10f8, query=0x80cb3d0 "set
transaction isolation level SERIALIZABLE") at fe-exec.c:799
#4  0x40d8a4f3 in PQexec (conn=0x80c10f8, query=0x80cb3d0 "set transaction
isolation level SERIALIZABLE") at fe-exec.c:1429
#5  0x4063aaa7 in QPSQLResult::reset(QString const&) (this=0x80cb0f8,
query=@0xbf7ffa9c) at sql/drivers/psql/qsql_psql.cpp:473
#6  0x4060dd34 in QSqlQuery::exec(QString const&) (this=0xbf7ffa8c,
query=@0xbf7ffa9c) at sql/qsqlquery.cpp:358
#7  0x08050401 in PrintEngine::run() (this=0x80a72d0) at printengine.cpp:85
#8  0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80a7364) at
kernel/qthread_unix.cpp:120
#9  0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0
(gdb)

-+-+-+-

#0  0x4061391e in QSqlDatabase::transaction() (this=0x80cbcb0) at
sql/qsqldatabase.cpp:760
760         if ( !d->driver->hasFeature( QSqlDriver::Transactions ) )
(gdb) where
#0  0x4061391e in QSqlDatabase::transaction() (this=0x80cbcb0) at
sql/qsqldatabase.cpp:760
#1  0x080503cb in PrintEngine::run() (this=0x80bd358) at printengine.cpp:83
#2  0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80c0a74) at
kernel/qthread_unix.cpp:120
#3  0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0
(gdb)

-+-+-+-

#0  0x403352a1 in QObject::connect(QObject const*, char const*, QObject
const*, char const*) (sender=0x80c0bf8, signal=0x80c0b58 "2destroyed()",
receiver=0x80c7660, member=0x80c0c68 "1slotResultDestroyed()")   at kernel/qobject.cpp:1737
1737        QMetaObject *smeta = sender->metaObject();
(gdb) where
#0  0x403352a1 in QObject::connect(QObject const*, char const*, QObject
const*, char const*) (sender=0x80c0bf8, signal=0x80c0b58 "2destroyed()",
receiver=0x80c7660, member=0x80c0c68 "1slotResultDestroyed()")   at kernel/qobject.cpp:1737
#1  0x4060d07d in QSqlResultShared (this=0x80c7660, result=0x80cb080) at
sql/qsqlquery.cpp:57
#2  0x4060d42f in QSqlQuery (this=0xbf7ff94c, r=0x80cb080) at
sql/qsqlquery.cpp:213
#3  0x4063c03b in QPSQLDriver::createQuery() const (this=0x80c0bf8) at
sql/drivers/psql/qsql_psql.cpp:687
#4  0x4060d874 in QSqlQuery::init(QString const&, QSqlDatabase*)
(this=0xbf7ffa8c, query=@0x40a26a88, db=0x80c0ae8) at sql/qsqlquery.cpp:272
#5  0x4060d762 in QSqlQuery (this=0xbf7ffa8c, db=0x80c0ae8) at
sql/qsqlquery.cpp:259
#6  0x080503dd in PrintEngine::run() (this=0x80a72e0) at printengine.cpp:84
#7  0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80a7374) at
kernel/qthread_unix.cpp:120
#8  0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0
(gdb)

-+-+-+-

#0  0x4061396d in QSqlDatabase::commit() (this=0x80c0ae8) at
sql/qsqldatabase.cpp:775
775         if ( !d->driver->hasFeature( QSqlDriver::Transactions ) )
(gdb) where
#0  0x4061396d in QSqlDatabase::commit() (this=0x80c0ae8) at
sql/qsqldatabase.cpp:775
#1  0x08050b9d in PrintEngine::run() (this=0x80a72e0) at printengine.cpp:118
#2  0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80a7374) at
kernel/qthread_unix.cpp:120
#3  0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0
(gdb)

-+-+-+-

Nineth run without problems

-+-+-+-

#0  0x40cc6a01 in strlen () from /lib/libc.so.6
(gdb) where
#0  0x40cc6a01 in strlen () from /lib/libc.so.6
#1  0x4071bf33 in internalLatin1ToUnicode (str=0xbf7ff89c "\b\021\f\b
.¡@Äø\177¿êÀn@ .¡@üø\177¿Dù\177¿9¬c@üø\177¿,ù\177¿\002", len=0x83ffbbd8,
maxlen=3212834996) at tools/qstring.cpp:1164
#2  0x4071cb0b in QString (this=0xbf7ff89c, str=0x83ffbbd8 <Address 0x83ffbbd8
out of bounds>) at tools/qstring.cpp:1473
#3  0x40638d4f in qMakeError (err=@0xbf7ff92c, type=2, p=0x80ca2d0) at
sql/drivers/psql/qsql_psql.cpp:118
#4  0x4063ac39 in QPSQLResult::reset(QString const&) (this=0x80ca5d8,
query=@0xbf7ffa9c) at sql/drivers/psql/qsql_psql.cpp:489
#5  0x4060dd34 in QSqlQuery::exec(QString const&) (this=0xbf7ffa8c,
query=@0xbf7ffa9c) at sql/qsqlquery.cpp:358
#6  0x08050401 in PrintEngine::run() (this=0x80a72e0) at printengine.cpp:85
#7  0x402b4be6 in QThreadInstance::start(void*) (_arg=0x80a7374) at
kernel/qthread_unix.cpp:120
#8  0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0
(gdb)

-+-+-+-




Re: Multi-threaded user app segfaults when using libpq with separate connections

От
Tom Lane
Дата:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
> // PostgreSQL header <catalog/pg_type.h> redefines errno erroneously.
> #if defined(errno)
> # undef errno
> #endif
> #define errno qt_psql_errno
> #include <catalog/pg_type.h>
> #undef errno

It seems highly unlikely that the above is a good idea.  errno is a
macro on all threaded platforms I'm aware of, but the above coding
forgets the definition.  The behavior complained of in the comment
is gone anyway in PG 7.3.

Not sure that this explains your problem though.

There are several places in libpq that invoke libc routines that are
not thread-safe (on most platforms anyway).  We have attempted to
rectify this for PG 7.4.  You may care to try your test with CVS tip and
see if it works better.
        regards, tom lane


Re: Multi-threaded user app segfaults when using libpq with separate connections

От
Frank van Vugt
Дата:
Hi Tom,

> It seems highly unlikely that the above is a good idea.  <cut>
> The behavior complained of in the comment
> is gone anyway in PG 7.3.

I've conveyed the message, thanks.

> You may care to try your test with CVS tip and see if it works better.

I've retried this with a cvs-version of this afternoon (local time). The build
stranded in plpgsql (I seem to need a more recent bison), but at that point
libpq was already finished. This still resulted in more or less the same kind
of errors. I then retried with a cvs-version of 21:30 (utc), configured with
enable-thread-safety and compiled with debug info / without optimization,
along with a fresh compilation of Qt.

Results : same diff (see errors below).


What more can I do to help out / find out what the problem is ?




Best,








Frank van Vugt

=+=+=+=+=+=+=+=+=+=+=+=+=+=

#0  0x4063ab77 in QPSQLResult::reset(QString const&) (this=0x80c10f0,
query=@0xbf7ffa9c)   at sql/drivers/psql/qsql_psql.cpp:477
477         int status =  PQresultStatus( d->result );
(gdb) where
#0  0x4063ab77 in QPSQLResult::reset(QString const&) (this=0x80c10f0,
query=@0xbf7ffa9c)   at sql/drivers/psql/qsql_psql.cpp:477
#1  0x4060dd44 in QSqlQuery::exec(QString const&) (this=0xbf7ffa8c,
query=@0xbf7ffa9c) at sql/qsqlquery.cpp:358
#2  0x08050823 in PrintEngine::run() (this=0x809e2b8) at printengine.cpp:97
#3  0x402b4bf6 in QThreadInstance::start(void*) (_arg=0x809e31c) at
kernel/qthread_unix.cpp:120
#4  0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0
(gdb)


#0  0x40d8c85a in pqGetc (result=0xbf3ff81b "@8»Ù@ü\031\216@Dø?¿\035¬Ø@\210",
conn=0x80d0088) at fe-misc.c:85
85       * write 1 char to the connection
(gdb) where
#0  0x40d8c85a in pqGetc (result=0xbf3ff81b "@8»Ù@ü\031\216@Dø?¿\035¬Ø@\210",
conn=0x80d0088) at fe-misc.c:85
#1  0x40d90afe in pqParseInput2 (conn=0x80d0088) at fe-protocol2.c:384
#2  0x40d8ac1d in parseInput (conn=0x80d0088) at fe-exec.c:918
#3  0x40d8ac8f in PQgetResult (conn=0x80d0088) at fe-exec.c:955
#4  0x40d8b0ac in PQexecFinish (conn=0x80d0088) at fe-exec.c:1159
#5  0x40d8aec6 in PQexec (conn=0x80d0088, query=0x80d4270 "set transaction
isolation level SERIALIZABLE") at fe-exec.c:1047
#6  0x4063aab7 in QPSQLResult::reset(QString const&) (this=0x80d3d20,
query=@0xbf3ffa9c)   at sql/drivers/psql/qsql_psql.cpp:473
#7  0x4060dd44 in QSqlQuery::exec(QString const&) (this=0xbf3ffa8c,
query=@0xbf3ffa9c) at sql/qsqlquery.cpp:358
#8  0x08050411 in PrintEngine::run() (this=0x80a1860) at printengine.cpp:85
#9  0x402b4bf6 in QThreadInstance::start(void*) (_arg=0x80bd4c4) at
kernel/qthread_unix.cpp:120
#10 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0
Current language:  auto; currently c
(gdb)


#0  0x4061391d in QSqlDatabase::transaction() (this=0x80cc078) at
sql/qsqldatabase.cpp:760
760         if ( !d->driver->hasFeature( QSqlDriver::Transactions ) )
(gdb) where
#0  0x4061391d in QSqlDatabase::transaction() (this=0x80cc078) at
sql/qsqldatabase.cpp:760
#1  0x080503db in PrintEngine::run() (this=0x80a70b8) at printengine.cpp:83
#2  0x402b4bf6 in QThreadInstance::start(void*) (_arg=0x80c0c3c) at
kernel/qthread_unix.cpp:120
#3  0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0
(gdb)


#0  0x40d89ea0 in PQclear (res=0x417974dd) at fe-exec.c:349
349
(gdb) where
#0  0x40d89ea0 in PQclear (res=0x417974dd) at fe-exec.c:349
#1  0x40d89f12 in pqClearAsyncResult (conn=0x80cba10) at fe-exec.c:369
#2  0x40d89f7a in pqSaveErrorResult (conn=0x80cba10) at fe-exec.c:397
#3  0x40d8ad11 in PQgetResult (conn=0x80cba10) at fe-exec.c:984
#4  0x40d8b0ac in PQexecFinish (conn=0x80cba10) at fe-exec.c:1159
#5  0x40d8aec6 in PQexec (conn=0x80cba10, query=0x80d8178 "set transaction
isolation level SERIALIZABLE") at fe-exec.c:1047
#6  0x4063aab7 in QPSQLResult::reset(QString const&) (this=0x80d7f98,
query=@0xbf3ffa9c)   at sql/drivers/psql/qsql_psql.cpp:473
#7  0x4060dd44 in QSqlQuery::exec(QString const&) (this=0xbf3ffa8c,
query=@0xbf3ffa9c) at sql/qsqlquery.cpp:358
#8  0x08050411 in PrintEngine::run() (this=0x80be3b0) at printengine.cpp:85
#9  0x402b4bf6 in QThreadInstance::start(void*) (_arg=0x80c0c44) at
kernel/qthread_unix.cpp:120
#10 0x40b1dc60 in pthread_start_thread () from /lib/libpthread.so.0
Current language:  auto; currently c
(gdb)



Re: Multi-threaded user app segfaults when using libpq with separate connections

От
Tom Lane
Дата:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
>> You may care to try your test with CVS tip and see if it works better.

> I've retried this with a cvs-version of this afternoon (local
> time). The build stranded in plpgsql (I seem to need a more recent
> bison), but at that point libpq was already finished. This still
> resulted in more or less the same kind of errors.

Are you sure that each thread is accessing only its own PGconn object?

Other than that, not sure what to suggest.  I get the impression that
something is clobbering memory with abandon.  Do you have any
Purify-like tools to look for memory stomp problems?
        regards, tom lane


Re: Multi-threaded user app segfaults when using libpq with separate connections

От
Frank van Vugt
Дата:
> I get the impression that
> something is clobbering memory with abandon.  Do you have any
> Purify-like tools to look for memory stomp problems?

In addition to my post of this morning containing the regular Valgrind output, 
I'm including here the relevant parts of a valgrind run using the 'helgrind' 
skin, which detects possible data race conditions.


I hope the info is of some use, 

best,





Frank van Vugt

=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=

==14670== 
==14670== Thread 2:
==14670== Possible data race writing variable at 0x43C561A8 
==14670==    at 0x40EB48D7: enlargePQExpBuffer (pqexpbuffer.c:152)
==14670==    by 0x40EB4994: printfPQExpBuffer (pqexpbuffer.c:203)
==14670==    by 0x40EB168B: pqReadData (fe-misc.c:449)
==14670==    by 0x40EAF334: PQgetResult (fe-exec.c:1337)
==14670==   Address 0x43C561A8 is 456 bytes inside a block of size 16384 
alloc'd by thread 3
==14670==    at 0x4001CA2C: malloc (vg_replace_malloc.c:153)
==14670==    by 0x40EABF44: makeEmptyPGconn (fe-connect.c:1881)
==14670==    by 0x40EAA319: PQconnectStart (fe-connect.c:273)
==14670==    by 0x40EAA2D5: PQconnectdb (fe-connect.c:235)
==14670==   Previous state: shared RO, no locks
==14670== 
==14670== Thread 2:
==14670== Possible data race writing variable at 0x43C561B0 
==14670==    at 0x40EB48DF: enlargePQExpBuffer (pqexpbuffer.c:153)
==14670==    by 0x40EB4994: printfPQExpBuffer (pqexpbuffer.c:203)
==14670==    by 0x40EB168B: pqReadData (fe-misc.c:449)
==14670==    by 0x40EAF334: PQgetResult (fe-exec.c:1337)
==14670==   Address 0x43C561B0 is 464 bytes inside a block of size 16384 
alloc'd by thread 3
==14670==    at 0x4001CA2C: malloc (vg_replace_malloc.c:153)
==14670==    by 0x40EABF44: makeEmptyPGconn (fe-connect.c:1881)
==14670==    by 0x40EAA319: PQconnectStart (fe-connect.c:273)
==14670==    by 0x40EAA2D5: PQconnectdb (fe-connect.c:235)
==14670==   Previous state: shared RO, no locks
==14670== 
==14670== Thread 2:
==14670== Possible data race reading variable at 0x40D45614 
==14670==    at 0x4000700D: _dl_lookup_versioned_symbol_internal (in 
/lib/ld-2.3.2.so)
==14670==    by 0x4000AE8E: fixup (in /lib/ld-2.3.2.so)
==14670==    by 0x4000B05F: _dl_runtime_resolve (in /lib/ld-2.3.2.so)
==14670==    by 0x40EB168B: pqReadData (fe-misc.c:449)
==14670==   Address 0x40D45614 is not stack'd, malloc'd or free'd
==14670==   Previous state: shared RW, locked by:0x432F7440
==14670== 
==14670== Thread 2:
==14670== Possible data race writing variable at 0x43C561AC 
==14670==    at 0x40EB497F: printfPQExpBuffer (pqexpbuffer.c:198)
==14670==    by 0x40EB168B: pqReadData (fe-misc.c:449)
==14670==    by 0x40EAF334: PQgetResult (fe-exec.c:1337)
==14670==    by 0x40EAF510: PQexec (fe-exec.c:1441)
==14670==   Address 0x43C561AC is 460 bytes inside a block of size 16384 
alloc'd by thread 3
==14670==    at 0x4001CA2C: malloc (vg_replace_malloc.c:153)
==14670==    by 0x40EABF44: makeEmptyPGconn (fe-connect.c:1881)
==14670==    by 0x40EAA319: PQconnectStart (fe-connect.c:273)
==14670==    by 0x40EAA2D5: PQconnectdb (fe-connect.c:235)
==14670==   Previous state: shared RO, no locks
==14670== 
==14670== Thread 2:
==14670== Possible data race writing variable at 0x43C56198 
==14670==    at 0x40EAE3CC: saveErrorResult (fe-exec.c:632)
==14670==    by 0x40EAF348: PQgetResult (fe-exec.c:1344)
==14670==    by 0x40EAF510: PQexec (fe-exec.c:1441)
==14670==    by 0x407870D2: QPSQLDriver::beginTransaction() 
(sql/drivers/psql/qsql_psql.cpp:698)
==14670==   Address 0x43C56198 is 440 bytes inside a block of size 16384 
alloc'd by thread 3
==14670==    at 0x4001CA2C: malloc (vg_replace_malloc.c:153)
==14670==    by 0x40EABF44: makeEmptyPGconn (fe-connect.c:1881)
==14670==    by 0x40EAA319: PQconnectStart (fe-connect.c:273)
==14670==    by 0x40EAA2D5: PQconnectdb (fe-connect.c:235)
==14670==   Previous state: shared RO, no locks
accessed distinguished 2ndary map! 0x350034
read distinguished 2ndary map! 0x350034
Segmentatie fout (core dumped)



SOLVED: Multi-threaded Qt app segfaults when using libpq with separate connections

От
Frank van Vugt
Дата:
Hi,

I *finally* nailed this thing, turns out NOT to be a PG problem, more like 
PEBKC.

Tom Lane wrote:
> I get the impression that something is clobbering memory with abandon.

He was write (duh..).


What wasn't clear (at least to me) from the Qt-docs is that a name given to a 
supplementary database connection must be unique, otherwise the new one will 
replace the old one. Expecting to be able to handle the connections by the 
pointers an addDatabase() call returned, I had been creating new connections 
in separate threads whose names weren't always unique.

Since 'connection' here implicates a PGconn structure, I had been 'switching' 
PGconn structures while in use. It seems one can even get away with that for 
a limited number of threads, but at a certain point newly created result 
structures (PGresult, PQntubles, PQnfields, etc.) will be allocated space on 
top of the older structures belonging the replaced connections, which is 
obviously the point where things get very messy....

I've asked the Trolls (Qt) to emphasize this potential problem in their docs.


Thanks to Tom Lane (PostgreSQL) and Harald Fernengel (TrollTech) for their 
pointers ;)





Best,




Frank van Vugt