Обсуждение: [BUGS] BUG #14588: Postgres process blocked on semop

Поиск
Список
Период
Сортировка

[BUGS] BUG #14588: Postgres process blocked on semop

От
diabloneo@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      14588
Logged by:          Chenhong Liu
Email address:      diabloneo@gmail.com
PostgreSQL version: 9.2.16
Operating system:   CentOS 7.2 1511
Description:

We run a distributed application which storing data in pgsql. This
application is designed to do some route works during 00:00 to 6:00 every
day. I found that, postgres processes are all blocked on semop function,
which means they are waiting for sysv semaphores. I'm not sure if this is a
bug or we misconfigured pgsql and OS. 

Techinque information:

30 hosts running the application.
3 hosts running the postgresql servers, one master and two host-stanby
servers.

pgsql config:
  max_connections = 1000
  shared_buffers = 512MB

OS kernel.sem
250 32000 32 128

During the daytime, there are about 300 postgres processes running, work as
expected. On 00:00 each day, there will be about 200 more postgresql
processes created, now totally about 500. And from that time, thoese working
processes are all blocked on semop function. I check this using pstack
command, and also check the output of ipcs -s [-i]. After this has happened,
all new connection will stay in state authentication.



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14588: Postgres process blocked on semop

От
Neo Liu
Дата:
I add some log I collected for this problem. My them be helpful.

The attachements are two logs and one picture.

sds_20170311_1124.bug.log
It's log I collected every 5 minutes, it contains postgres processes list, ipcs -s output and ipcs -s -u output. I delete most of the log, just keep necessary pieces. 

ipcs.log 
It's the output of every semaphores set used by postgresql displayed with ipcs -s -i command, you can find out many process were waited for a semapthore. The number of lines which ncount == 1 is 469

The pstack screenshot
It shows two process's stack, one is pid 186397 who were in authentication, and the other is 188832 who where in INSERT.


Forgot to mention, I can only recovery from this situation by restart postgresql service. If I killed any process blocked with kill -9  command, the service will restart.



On Sun, Mar 12, 2017 at 11:57 AM <diabloneo@gmail.com> wrote:
The following bug has been logged on the website:

Bug reference:      14588
Logged by:          Chenhong Liu
Email address:      diabloneo@gmail.com
PostgreSQL version: 9.2.16
Operating system:   CentOS 7.2 1511
Description:

We run a distributed application which storing data in pgsql. This
application is designed to do some route works during 00:00 to 6:00 every
day. I found that, postgres processes are all blocked on semop function,
which means they are waiting for sysv semaphores. I'm not sure if this is a
bug or we misconfigured pgsql and OS.

Techinque information:

30 hosts running the application.
3 hosts running the postgresql servers, one master and two host-stanby
servers.

pgsql config:
  max_connections = 1000
  shared_buffers = 512MB

OS kernel.sem
250 32000 32 128

During the daytime, there are about 300 postgres processes running, work as
expected. On 00:00 each day, there will be about 200 more postgresql
processes created, now totally about 500. And from that time, thoese working
processes are all blocked on semop function. I check this using pstack
command, and also check the output of ipcs -s [-i]. After this has happened,
all new connection will stay in state authentication.



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Вложения

Re: [BUGS] BUG #14588: Postgres process blocked on semop

От
Tom Lane
Дата:
diabloneo@gmail.com writes:
> During the daytime, there are about 300 postgres processes running, work as
> expected. On 00:00 each day, there will be about 200 more postgresql
> processes created, now totally about 500. And from that time, thoese working
> processes are all blocked on semop function. I check this using pstack
> command, and also check the output of ipcs -s [-i]. After this has happened,
> all new connection will stay in state authentication.

And your authentication setup is ...?

            regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14588: Postgres process blocked on semop

От
Neo Liu
Дата:
Content of my pg_hba.conf

local   all     all           peer
host all all 0.0.0.0/0 md5
host replication demon_replicator 0.0.0.0/0 md5


diabloneo

On Sun, Mar 12, 2017 at 12:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
diabloneo@gmail.com writes:
> During the daytime, there are about 300 postgres processes running, work as
> expected. On 00:00 each day, there will be about 200 more postgresql
> processes created, now totally about 500. And from that time, thoese working
> processes are all blocked on semop function. I check this using pstack
> command, and also check the output of ipcs -s [-i]. After this has happened,
> all new connection will stay in state authentication.

And your authentication setup is ...?

                        regards, tom lane

Re: [BUGS] BUG #14588: Postgres process blocked on semop

От
Tom Lane
Дата:
Neo Liu <diabloneo@gmail.com> writes:
> Content of my pg_hba.conf

> local   all     all           peer
> host all all 0.0.0.0/0 md5
> host replication demon_replicator 0.0.0.0/0 md5

Which of those would the stuck processes be using?

Also, can you attach to a few of the stuck processes and get stack traces?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

            regards, tom lane


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14588: Postgres process blocked on semop

От
Andres Freund
Дата:
Hi,

On 2017-03-12 04:30:52 +0000, Neo Liu wrote:
> *The pstack screenshot*
> It shows two process's stack, one is pid 186397 who were in authentication,
> and the other is 188832 who where in INSERT.

This suggest you're having quite massive contention around
ProcArrayLock.  You should consider updating to 9.6. Several releases
since 9.2 considerably improved scalability around this (especially
9.6).

Regards,

Andres


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14588: Postgres process blocked on semop

От
Neo Liu
Дата:


On Mon, Mar 13, 2017 at 5:03 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2017-03-12 04:30:52 +0000, Neo Liu wrote:
> *The pstack screenshot*
> It shows two process's stack, one is pid 186397 who were in authentication,
> and the other is 188832 who where in INSERT.

This suggest you're having quite massive contention around
ProcArrayLock.  You should consider updating to 9.6. Several releases
since 9.2 considerably improved scalability around this (especially
9.6). 

Regards,

Andres

Thanks, Andres

I think upgrading to newest version is a good way, but I can't perform it on a production system before we doing sufficient testing.

Currently, I want to know if this is a bug of pgsql 9.2.16, and how can I avoid this situation.

Thanks, diabloneo

Re: [BUGS] BUG #14588: Postgres process blocked on semop

От
Neo Liu
Дата:


On Mon, Mar 13, 2017 at 12:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:


Which of those would the stuck processes be using?

Also, can you attach to a few of the stuck processes and get stack traces?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

                        regards, tom lane

Hi, Tome

I can't get stack traces for you now. We already add a cron job to restart the pgsql server every morning to solve the problem temporarily. It's a production system in client's environment, I can't login into the system now.

I hope the pstack output in early message can help you.

Thanks, diabloneo