Thread: Lyris looking to help fix PostgresSQL crashing problems


Lyris looking to help fix PostgresSQL crashing problems

From
John Buckman
Date:
Hello -- I'm the lead programmer of Lyris ListManager, an email list server that runs on PostgreSQL, Oracle, and MS/SQL.

About 20% of our client base of 4000 runs on PostgreSQL -- it's very popular with our clients -- much more than Oracle
is (about 3%).

Unfortunately we have about a dozen clients who have stability problems with PostgreSQL. This week a major television
network cancelled their order with us due to their PostgreSQL stability issues, which is what prompted me to write this
email and get involved with the PostgreSQL community.

It seems that with larger database sizes (500,000 rows and larger) and high stress, the server daemon has a tendency to
core. We've also had cases where a single connection doing a million inserts into a table will cause the daemon to core.
We've seen problems with both 7.1 and 7.2.x, with built-on-the-machine and with RPMs.  We've also had big stability
problems with Solaris 8/Sparc, and don't ship on that platform because of that.

What I'd like to do is help solve these problems in the core distribution, so that PostgreSQL can indeed handle
the large databases and high transaction loads that Microsoft SQL can.

My company has hired open source people before to help fix bugs or add features to open source projects, most notably
from the Tcl community, as we use Tcl quite a bit (we have two programmers from the Tcl Core team working here).  This
works out well for the Tcl community, as we fund the development of the project, as well as pay someone to work on
something they want to work on anyhow.

So... what I'm looking for are recommendations on a PostgreSQL guru who could help nail the stability/load issues, and
make sure that the fixes make their way back into the PostgreSQL core.  What I'd prefer is to get a regular contributor
to this list, so that this person could investigate our problems, and then get the community's help in solving them.

Thanks!

-john


Re: Lyris looking to help fix PostgresSQL crashing problems

From
Tom Lane
Date:
John Buckman <john@lyris.com> writes:
> It seems that with larger database sizes (500,000 rows and larger) and
> high stress, the server daemon has a tendency to core.

We'd love to see some stack traces ...
        regards, tom lane


Re: Lyris looking to help fix PostgresSQL crashing problems

From
John Buckman
Date:
> John Buckman <john@lyris.com> writes:
> > It seems that with larger database sizes (500,000 rows and larger) and
> > high stress, the server daemon has a tendency to core.

> We'd love to see some stack traces ...

Yeah, I just didn't know what form this list prefers to work on things, which is why I'd prefer to hire a regular
participant of this list.  If gcc 'where' stack traces are what you want, we can do that.

I suspect that the problems may be platform-or-build related, because we've often had trouble replicating customer
problems on our own systems. For example, we had many reports of problems with 7.2.x, and saw it crash often on a
customer's redhat machine that we had ssh access to, but couldn't make it crash in our own lab. :(  That's why we need
help. If we could make a simple C test case that crashed pgsql, I'm sure you guys could fix the problem in a jiffy.

-john


Re: Lyris looking to help fix PostgresSQL crashing problems

From
Bruce Momjian
Date:
John Buckman wrote:
> > John Buckman <john@lyris.com> writes:
> > > It seems that with larger database sizes (500,000 rows and larger) and
> > > high stress, the server daemon has a tendency to core.
> 
> > We'd love to see some stack traces ...
> 
> Yeah, I just didn't know what form this list prefers to work on
> things, which is why I'd prefer to hire a regular participant
> of this list.  If gcc 'where' stack traces are what you want,
> we can do that.

Yep, in most cases, the crash creates a core file in the database
directory.  A backtrace of that core file is usually a good start.  You
should make sure there are debugging symbols in the binary (gcc -g).
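
For illustration only, here is a minimal Python sketch of pulling a backtrace out of a core file non-interactively. It assumes gdb is installed; the helper names and both paths are hypothetical and not taken from this thread, so substitute the real postgres binary and the core file found in the database directory.

```python
import subprocess
from pathlib import Path


def gdb_backtrace_command(binary, core):
    """Build a non-interactive gdb command that prints a backtrace of a core file."""
    # -batch makes gdb exit after running its commands; -ex "bt" runs the backtrace command.
    return ["gdb", "-batch", "-ex", "bt", str(binary), str(core)]


def collect_backtrace(binary, core):
    """Run gdb on the core file and return everything it printed (requires gdb)."""
    if not Path(core).is_file():
        raise FileNotFoundError(f"no core file at {core}")
    result = subprocess.run(
        gdb_backtrace_command(binary, core),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Hypothetical paths, shown only to print the command that would be run.
    cmd = gdb_backtrace_command("/usr/local/pgsql/bin/postgres",
                                "/usr/local/pgsql/data/base/core")
    print(" ".join(cmd))
```

The backtrace is far more useful when the binary was built with debugging symbols; without them gdb can usually show only raw addresses or bare function names.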

The server log files also often contain valuable information.

> I suspect that the problems may be platform-or-build related,
> because we've often had trouble replicating customer problems
> on our own systems. For example, we had many reports of problems
> with 7.2.x, and saw it crash often on a customer's redhat machine
> that we had ssh access to, but couldn't make it crash in our
> own lab. :(  That's why we need help.  If we could make a simple
> C test case that crashed pgsql, I'm sure you guys could fix the
> problem in a jiffy.

Yes, that does make it harder, but a backtrace usually gets us started. 
It may also be tickling some OS bug or a hardware failure, or a simple
exhaustion of some resource.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Lyris looking to help fix PostgresSQL crashing problems

From
John Buckman
Date:
> John Buckman <john@lyris.com> writes:
> > It seems that with larger database sizes (500,000 rows and larger) and
> > high stress, the server daemon has a tendency to core.

> We'd love to see some stack traces ...

Yeah, I just didn't know what form this list prefers in terms of info to be able to work on things, which is why I'd
prefer to hire a regular participant of this list.  If gcc 'where' stack traces from core files are what you want, we
can do that.

I suspect that the problems may be platform-or-build related, because we've often had trouble replicating customer
problems on our own systems. For example, we had many reports of problems with 7.2.x, and saw it crash often on a
customer's redhat machine that we had ssh access to, but couldn't make it crash in our own lab. :(  That's why we need
help. If we could make a simple C test case that crashed pgsql, I'm sure you guys could fix the problem in a jiffy, but
localizing and recreating a problem is always 80% of it.

-john