Discussion: connection with the high-availability software 'repmgr' breaks when idle
Hi Everyone!
I tested the following postgresql.conf settings on pg15 and pg16, and they don't work on either:
tcp_keepalives_idle = 20 # TCP_KEEPIDLE, in seconds;
# 0 selects the system default
tcp_keepalives_interval = 10 # TCP_KEEPINTVL, in seconds;
# 0 selects the system default
tcp_keepalives_count = 3 # TCP_KEEPCNT;
It basically exchanges a ping-pong heartbeat with the client so that a router between them does not consider the TCP connection idle and close it automatically.
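To make the effect of these three settings concrete, here is a minimal Python sketch, assuming a Linux host (where the `socket` module exposes `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT`). It applies the same timers to an ordinary client socket and estimates how long a silent peer takes to be declared dead. The helper names are hypothetical; PostgreSQL applies these options to its own server-side sockets, not through code like this.

```python
import socket

# Timers taken from the postgresql.conf settings above.
KEEPALIVE_IDLE = 20      # tcp_keepalives_idle     -> TCP_KEEPIDLE (seconds)
KEEPALIVE_INTERVAL = 10  # tcp_keepalives_interval -> TCP_KEEPINTVL (seconds)
KEEPALIVE_COUNT = 3      # tcp_keepalives_count    -> TCP_KEEPCNT (probes)


def apply_keepalive(sock: socket.socket, idle: int, interval: int, count: int) -> None:
    """Enable TCP keepalive on `sock` with the given timers (Linux-specific options)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


def detection_window(idle: int, interval: int, count: int) -> int:
    """Approximate seconds until a silent peer is declared dead: `idle` seconds
    of inactivity before the first probe, then `count` unanswered probes sent
    `interval` seconds apart."""
    return idle + interval * count


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        apply_keepalive(s, KEEPALIVE_IDLE, KEEPALIVE_INTERVAL, KEEPALIVE_COUNT)
        idle = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE)
        print(f"TCP_KEEPIDLE on socket: {idle} s")
    window = detection_window(KEEPALIVE_IDLE, KEEPALIVE_INTERVAL, KEEPALIVE_COUNT)
    print(f"silent peer dropped after about {window} s")
```

With the values above the window works out to 20 + 10 * 3 = 50 seconds.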
Unfortunately, I still see connections being lost between repmgr and the local PostgreSQL (primary) server:
2023-11-01 06:11:41+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-11-01 06:11:57.750148+08: repmgrd_local_reconnect on node2, reconnected to local node after 16 seconds - happened
2023-11-01 06:11:30+0800: repmgrd_upstream_disconnect on node1, unable to connect to upstream node "yzx2" (ID: 2) - happened
2023-11-01 06:12:00.688529+08: repmgrd_upstream_reconnect on node1, reconnected to upstream node after 30 seconds - happened
2023-11-01 08:05:29+0800: repmgrd_upstream_disconnect on node1, unable to connect to upstream node "yzx2" (ID: 2) - happened
2023-11-01 08:06:00.559327+08: repmgrd_upstream_reconnect on node1, reconnected to upstream node after 30 seconds - happened
2023-11-01 11:22:54+0800: repmgrd_upstream_disconnect on node1, unable to connect to upstream node "yzx2" (ID: 2) - happened
2023-11-01 11:22:56.708542+08: repmgrd_upstream_reconnect on node1, reconnected to upstream node after 2 seconds - happened
2023-11-01 12:30:54+0800: repmgrd_upstream_disconnect on node1, unable to connect to upstream node "yzx2" (ID: 2) - happened
2023-11-01 12:31:04.648273+08: repmgrd_upstream_reconnect on node1, reconnected to upstream node after 10 seconds - happened
These events left no record in the PostgreSQL log.
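To line up the disconnect/reconnect pairs listed above, here is a small Python sketch that parses repmgrd event lines in the format shown and computes each outage from the two timestamps. The regular expression and the pairing rule (each `*_disconnect` matched with the next `*_reconnect` for the same node and kind) are assumptions based only on the lines in this message, not on a documented repmgr log format. The timestamps mix `+0800` and `+08` offsets, which the sketch normalizes before parsing; the script and file names are hypothetical.

```python
import re
import sys
from datetime import datetime

# Matches lines like "2023-11-01 06:11:41+0800: repmgrd_local_disconnect on node2, ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d{1,6})?)"
    r"(?P<tz>[+-]\d{2}(?:\d{2})?): "
    r"repmgrd_(?P<kind>local|upstream)_(?P<action>disconnect|reconnect) "
    r"on (?P<node>[\w.-]+)"
)


def parse_ts(ts: str, tz: str) -> datetime:
    """Parse a timestamp whose UTC offset may be written as +0800 or +08."""
    if len(tz) == 3:  # "+08" -> "+0800" so that %z accepts it
        tz += "00"
    fmt = "%Y-%m-%d %H:%M:%S.%f%z" if "." in ts else "%Y-%m-%d %H:%M:%S%z"
    return datetime.strptime(ts + tz, fmt)


def outages(text: str) -> list[tuple[str, str, float]]:
    """Return (node, kind, seconds) for each disconnect followed by a reconnect."""
    pending: dict[tuple[str, str], datetime] = {}
    result: list[tuple[str, str, float]] = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a repmgrd connect/disconnect event
        key = (m["node"], m["kind"])
        when = parse_ts(m["ts"], m["tz"])
        if m["action"] == "disconnect":
            pending[key] = when
        elif key in pending:
            started = pending.pop(key)
            result.append((m["node"], m["kind"], (when - started).total_seconds()))
    return result


if __name__ == "__main__":
    # Hypothetical usage: python repmgr_outages.py repmgr_events.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for node, kind, secs in outages(fh.read()):
                print(f"{node} {kind} outage: {secs:.1f} s")
```

Applied to the ten events above, it gives outages of roughly 16.8 s (node2, local) and 30.7, 31.6, 2.7 and 10.6 s (node1, upstream).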
It makes no difference which connection check type I set in repmgr.conf:
connection_check_type=ping|connection|query
I found that after a long time, the connection between repmgr and PostgreSQL is shown as 'idle' in htop or ps -ef.
So, does anyone have an idea how this happens? Thanks in advance.
Zhaoxun
I have commented out the tcp_keepalives_* options, but it still keeps dropping 'idle' connections and reconnecting now and then:
#tcp_keepalives_idle = 20 # TCP_KEEPIDLE, in seconds;
# 0 selects the system default
#tcp_keepalives_interval = 10 # TCP_KEEPINTVL, in seconds;
# 0 selects the system default
#tcp_keepalives_count = 3 # TCP_KEEPCNT;
# 0 selects the system default
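With these settings commented out, each one falls back to its default of 0, which selects the operating-system default. A minimal sketch, assuming Linux, that reads those system defaults from /proc/sys/net/ipv4 and estimates the resulting detection window; if a file cannot be read it falls back to the stock kernel values (7200 s idle, 75 s interval, 9 probes). The helper names are hypothetical.

```python
from pathlib import Path

# Linux exposes the system-wide keepalive defaults here.
PROC_DIR = Path("/proc/sys/net/ipv4")

# Stock kernel defaults, used only if a /proc file cannot be read.
FALLBACK = {
    "tcp_keepalive_time": 7200,  # seconds idle before the first probe
    "tcp_keepalive_intvl": 75,   # seconds between probes
    "tcp_keepalive_probes": 9,   # unanswered probes before the connection is dropped
}


def system_keepalive_settings(proc_dir: Path = PROC_DIR) -> dict[str, int]:
    """Read the kernel keepalive defaults, falling back to FALLBACK per value."""
    settings: dict[str, int] = {}
    for name, default in FALLBACK.items():
        try:
            settings[name] = int((proc_dir / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = default
    return settings


def keepalive_detection_window(settings: dict[str, int]) -> int:
    """Approximate seconds before a silent peer is declared dead."""
    return (
        settings["tcp_keepalive_time"]
        + settings["tcp_keepalive_intvl"] * settings["tcp_keepalive_probes"]
    )


if __name__ == "__main__":
    current = system_keepalive_settings()
    print(current)
    print(f"detection window: {keepalive_detection_window(current)} s")
```

With the stock kernel values this comes to 7200 + 75 * 9 = 7875 seconds, a little over two hours.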