Re: Fwd: libpq: indefinite block on poll during network problems
From | Dmitry Samonenko |
---|---|
Subject | Re: Fwd: libpq: indefinite block on poll during network problems |
Date | |
Msg-id | CAFKp+3cbU3s-V-HEUvg-n+Qx4G4kCD6=n8jxuvK1ORV6K_uayQ@mail.gmail.com |
In reply to | Re: Fwd: libpq: indefinite block on poll during network problems (Adrian Klaver <adrian.klaver@aklaver.com>) |
Responses | Re: Fwd: libpq: indefinite block on poll during network problems |
List | pgsql-general |
Guys, first of all: thank you for your help and cooperation. I have received several mails suggesting tweaks to tcp_keepalive and the use of PostgreSQL server features (cancel, statement timeout), but as I said, they won't help.
I have reproduced the problem scenario. Logs are attached. Let me walk you through them.
[root@krr2srv1wsn1 dtp_generator]# sysctl -a | grep keepalive
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 10
This means that the first TCP Keep-Alive probe is sent after 10 seconds of idle connection. If 3 probes, sent at 5-second intervals, go unanswered, the connection should be considered dead.
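The arithmetic behind these settings can be sketched as follows. This is only an illustration in Python, not something taken from the attached logs. The function name `keepalive_detection_seconds` is hypothetical, and the result is an approximation for an idle connection that ignores round-trip delays.

```python
def keepalive_detection_seconds(keepalive_time, keepalive_intvl, keepalive_probes):
    """Approximate time for the kernel to declare an IDLE connection dead.

    The first probe goes out after `keepalive_time` seconds of idleness.
    Each unanswered probe is followed by a wait of `keepalive_intvl` seconds,
    and the connection is dropped once `keepalive_probes` probes in a row
    have gone unanswered.
    """
    return keepalive_time + keepalive_intvl * keepalive_probes


# Values from the sysctl output above: 10 + 5 * 3 seconds.
print(keepalive_detection_seconds(10, 5, 3))
```

With the values shown, a dead peer on an idle connection should be noticed after roughly 25 seconds.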
== Part 1. TCP Keep Alive ==
At 11:25:35.847138 the connection to the server is made and the first query is sent. The response arrives quickly, at 11:25:35.858582. No other queries were made for the next minute, in order to catch the keep-alive packets. Wireshark 1.8.2 marks frames 13 - 36 as Keep-Alive, so we can see that it's configured correctly and definitely works.
== Part 2. The Problem ==
At 11:26:40.933017 query generation is started on the client side. The client is configured to perform 1 request per second. After some arbitrary time, the following command is executed on the server node:
[root@cluster1]# date && iptables -A OUTPUT -p tcp --sport 5432 -j DROP && iptables -A INPUT -p tcp --dport 5432 -j DROP
11:26:47 is printed to the console. As you can see in the client trace file, this time corresponds to frame 55, where the last query is made. strace shows the send and poll syscalls. And... that's it. The client is blocked on poll.
== Part 3. The aftermath ==
The client was blocked for ~2 minutes. I killed the application with SIGTERM, which you can see in strace. At that time the application was still waiting on libpq's poll. The pcap file shows no trace of keep-alive packets after the server was isolated with the iptables rules. As I said earlier: TCP Keep-Alive is performed on idle connections only. When TCP retransmission kicks in, TCP Keep-Alive is not performed.
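To give a sense of how long the retransmission phase can last, here is a rough estimate in Python. It is a sketch under assumptions that do not come from the attached logs: the Linux default of net.ipv4.tcp_retries2 = 15, a minimum retransmission timeout of 200 ms and a maximum of 120 s. The real timeout is derived from the measured round-trip time, so actual numbers will differ. The function name `retransmission_budget_ms` is hypothetical.

```python
def retransmission_budget_ms(retries=15, rto_min_ms=200, rto_max_ms=120_000):
    """Hypothetical total time before the kernel gives up on unacknowledged data.

    Assumes the retransmission timeout starts at rto_min_ms and doubles after
    every attempt, capped at rto_max_ms. The original transmission plus
    `retries` retransmissions give retries + 1 waiting intervals.
    """
    total = 0
    rto = rto_min_ms
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max_ms)
    return total


# Assumed Linux defaults, result converted to seconds.
print(retransmission_budget_ms() / 1000)
```

Under these assumptions the kernel keeps retransmitting for about 15 minutes before reporting an error, which would be consistent with a client that is still blocked after 2 minutes.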
Let me repeat myself again: the problem is NOT with the server. The problem is with libpq's PQgetResult, which ultimately leads to a very optimistic poll call that waits indefinitely.
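One possible client-side workaround, offered only as a sketch, is to do the waiting in the application with a finite timeout, for example through libpq's asynchronous interface (PQsendQuery, PQsocket, PQconsumeInput, PQisBusy), instead of relying on the blocking call. The Python snippet below illustrates just the waiting step on a plain local socket pair. It does not use libpq, and the helper name `wait_readable` is hypothetical.

```python
import select
import socket


def wait_readable(sock, timeout_s):
    """Return True if `sock` reports an event within timeout_s seconds.

    An event here means readable data, a hang-up or an error, i.e. a
    following read would not block. False means the deadline passed with
    nothing from the peer, which the caller can treat as a dead connection.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    # poll() takes its timeout in milliseconds.
    return bool(poller.poll(int(timeout_s * 1000)))


# Demonstration on a local socket pair standing in for the server connection.
client, server = socket.socketpair()
print(wait_readable(client, 0.2))  # peer is silent
server.sendall(b"x")
print(wait_readable(client, 0.2))  # peer has answered
client.close()
server.close()
```

The design choice is simply to let the application, not the kernel's retransmission logic, decide how long a silent server is acceptable.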
Thank you.
With regards, Dmitry Samonenko.