Thread: BUG #5465: dblink TCP connection hangs blocking translation from being terminated

BUG #5465: dblink TCP connection hangs blocking translation from being terminated

From
"Valentine Gogichashvili"
Date:
The following bug has been logged online:

Bug reference:      5465
Logged by:          Valentine Gogichashvili
Email address:      valgog@gmail.com
PostgreSQL version: 8.2.1
Operating system:   Red Hat 3.4.6-3 (kernel 2.6.9-42.0.3.ELsmp)
Description:        dblink TCP connection hangs blocking translation from
being terminated
Details:

Hi all,

we have an issue on our production server. A stored procedure that uses
dblink to get some data from a remote database hangs, does not respond to
kill signals, and holds several locks on some tables as well as an advisory
lock. I need this transaction to complete so that the database can be
operated normally again.

At exactly the time the procedure was accessing the remote database, the
machine hosting that database panicked and rebooted. But the ESTABLISHED
connection is still hanging on the production database machine:

$ netstat | grep remote_db_host
tcp 0 0 production_db_host:60248 remote_db_host:postgres ESTABLISHED

$ lsof | grep remote_db_host
postgres 1365 postgres 199u IPv4 23003779784 TCP
production_db_host:60248->remote_db_host:postgres (ESTABLISHED)

In the database session list one can see the hanging transaction:

production_db=# select procpid, now() - query_start as running, waiting,
substr(current_query,1,120) as current_query from pg_stat_activity where
current_query not like '%----STATQ-----%' and current_query != '<IDLE>'
order by query_start desc;
 procpid |        running         | waiting |          current_query
---------+------------------------+---------+---------------------------------
    1365 | 2 days 00:17:57.992004 | f       | SELECT * FROM get_remote_data()

It seems that dblink is waiting for the connection to be closed or reset,
and that this also makes the whole transaction hang without processing kill
signals.

Does the dblink TCP connection have any timeout?

How would it be possible to shut down the DB if this session process is
not responding to normal kill signals? Will it prevent the database from
shutting down normally? My previous experience with issuing immediate stops
or killing with -9 has been quite catastrophic: I could not start the DB
afterwards. What would you suggest in this case?

With best regards,

-- Valentine Gogichashvili

Re: BUG #5465: dblink TCP connection hangs blocking translation from being terminated

From
Magnus Hagander
Date:
On Wed, May 19, 2010 at 5:10 AM, Valentine Gogichashvili
<valgog@gmail.com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5465
> Logged by:          Valentine Gogichashvili
> Email address:      valgog@gmail.com
> PostgreSQL version: 8.2.1
> Operating system:   Red Hat 3.4.6-3 (kernel 2.6.9-42.0.3.ELsmp)
> Description:        dblink TCP connection hangs blocking translation from
> being terminated
> Details:
>
> Hi all,
>
> we have an issue on our productive server. A stored procedure, that uses
> dblink to get some data from the remote database hangs not responding to
> kill signal and holds several locks on some tables as well as an advisory
> lock. So I have this transaction to be completed in order to have a
> possibility to operate the database normally.

I believe this is a known issue in dblink, where it's not possible to
cancel it when it's waiting in the TCP layer in the kernel.
Unfortunately, there is no fix ATM - there was some work towards it
for 9.0 at one point, but I think this is actually the first real
bug-report on the issue...


> It seems like the dblink is waiting for the connection to be closed or
> reseted and also makes the hole transaction hang not processing kill
> signals.
>
> Does the dblink TCP connection have any timeout?

It does not. But it would detect a connection that goes away, so TCP
keepalives should be able to deal with this problem. Once the kernel
notices the other end is gone, dblink should notice it and roll back.
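
[Editor's illustration, not part of the original message.] To make the keepalive mechanism concrete, here is a minimal Python sketch that enables TCP keepalive probes on a client socket. It is not dblink's or libpq's code. The function name `enable_keepalive` and the timing values are assumptions chosen for illustration, and the per-socket tuning options are applied only where the platform exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the kernel to probe an idle connection (illustrative values).

    With these settings the kernel would start probing after `idle`
    seconds of silence, probe every `interval` seconds, and report the
    connection as dead after `count` unanswered probes.
    """
    # Portable switch: without it the kernel never probes an idle peer.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning; these option names are not available everywhere,
    # so each one is applied only if the socket module defines it.
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:",
          s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Once the probes go unanswered, a blocked read on such a socket returns an error instead of waiting indefinitely, which is the behaviour described above. Whether a particular libpq build sets this option on its own sockets depends on the version; the `keepalives*` connection parameters were added to libpq in PostgreSQL 9.0, so they would not be available on the 8.2.1 server in this report.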


> How would it be possible to shutdown the DB in case this session process is
> not responding to normal kill signals? Will it hinder the database from
> shutting down normally? My previous experience with issuing immediate stops
> or killing with -9 had been quite catastrophic and I could not start the DB
> afterwards. What would you suggest in this case?

kill -9 on a client will make the postmaster restart the whole
process, so yes, it's a very heavy operation.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: BUG #5465: dblink TCP connection hangs blocking translation from being terminated

From
Joseph Conway
Date:
Magnus Hagander wrote:
> On Wed, May 19, 2010 at 5:10 AM, Valentine Gogichashvili
> <valgog@gmail.com> wrote:
>> The following bug has been logged online:
>>
>> Bug reference:      5465
>> Logged by:          Valentine Gogichashvili
>> Email address:      valgog@gmail.com
>> PostgreSQL version: 8.2.1
>> Operating system:   Red Hat 3.4.6-3 (kernel 2.6.9-42.0.3.ELsmp)
>> Description:        dblink TCP connection hangs blocking translation from
>> being terminated
>> Details:
>>
>> Hi all,
>>
>> we have an issue on our productive server. A stored procedure, that uses
>> dblink to get some data from the remote database hangs not responding to
>> kill signal and holds several locks on some tables as well as an advisory
>> lock. So I have this transaction to be completed in order to have a
>> possibility to operate the database normally.
>
> I believe this is a known issue in dblink, where it's not possible to
> cancel it when it's waiting in the TCP layer in the kernel.
> Unfortunately, there is no fix ATM - there was some work towards it
> for 9.0 at one point, but I think this is actually the first real
> bug-report on the issue...

I thought the known issue was only on Windows though...
Note that this is not dblink specific but rather libpq.

>> How would it be possible to shutdown the DB in case this session process is
>> not responding to normal kill signals? Will it hinder the database from
>> shutting down normally? My previous experience with issuing immediate stops
>> or killing with -9 had been quite catastrophic and I could not start the DB
>> afterwards. What would you suggest in this case?
>
> kill -9 on a client will make the postmaster restart the whole
> process, so yes, it's a very heavy operation.

Can you grab the process with gdb and call elog() manually?

Joe
Oh, found a typo in the subject: Transaction, not Translation.
On May 19, 8:41 pm, m...@joeconway.com (Joseph Conway) wrote:
> Can you grab the process with gdb and call elog() manually?

Unfortunately I could not install gdb on that machine :-( some
dependencies are not installable and I cannot upgrade that production
machine...

-- Valentine