On 12/29/10 6:28 AM, Julian v. Bock wrote:
> I have the problem that on our servers it happens regularly under a
> certain workload (several times per minute) that all backend processes
> get a SIGUSR1 and spend several seconds in ProcessCatchupEvent(). At
> 100-200 connections (most of them idle) this causes the system load to
> skyrocket. I am not really familiar with the code but my wild guess is
> that the processes spend most of their time waiting for spinlocks.
>
> We have reduced the number of connections as much as possible for now
> but it still makes up for roughly 50% of the total CPU time. Has
> anyone experienced a similar problem?
>
> I can reproduce the issue on a test system with production data but it
> is not so easy to pinpoint what exactly causes the problem. The queries
> are basically tsearch2 full text searches over moderately big tables
> (~35GB). The queries are performed by functions which aggregate data
> from partitions in temporary tables, cache some data, and perform
> calculations before returning it to the user.
>
> The PostgreSQL version is 8.3.12, the test server has 8 amd64 cores
> and 16GB of ram. I experimented with shared_buffers between 1GB and
> 4GB but it doesn't make much of a difference. Disk IO doesn't seem to
> be an issue here.
This sounds like the exact same problem I had on Postgres 8.3 and 8.4:
http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php
Updating to Postgres version 9 fixed it. Here is what appeared to be the
best analysis of what was happening, but we never confirmed it.
http://archives.postgresql.org/pgsql-performance/2010-06/msg00464.php
Craig