Undetected deadlock with pg_dump on hot_standby server
От | Oliver Seemann |
---|---|
Тема | Undetected deadlock with pg_dump on hot_standby server |
Дата | |
Msg-id | CANCipfoOA=P0zoykZY-XCLVvpKONm+aNVCOmSY96SE6SyhwecA@mail.gmail.com обсуждение исходный текст |
Список | pgsql-bugs |
Hi! We observe the following unexpected behaviour: We run pg_dump on the hot_standby slave every two hours and once in 2-3 weeks it just hangs forever and the slave stops replaying logs. Version: 9.3.11 OS: Debian GNU/Linux 7.9 (wheezy) GUC hot_standby_feedback is off. (See also attached pg_stat_activity_and_pg_locks.txt) Looking at pg_stat_activity on the slave we see pg_dump trying to acquire a lock, with state=active, waiting=t. Looking at pg_locks we see an entry for pg_dump with granted=f on relation 31353 and lots of other entries for the pg_dump pid (570) on other relations with granted=t. An AccessExclusiveLock on that relation 31353 is held by another pid (2033) with virtualtransaction 1/0. Pid 2033 is the postmaster process. strace says it's waiting in a loop: select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout) GDB Backtrace: #0 0x00007f99d355edd3 in select () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f99d57a6c4e in pg_usleep () #2 0x00007f99d56836a0 in ?? () #3 0x00007f99d5683b43 in StandbyAcquireAccessExclusiveLock () #4 0x00007f99d5683f6c in standby_redo () #5 0x00007f99d54ed01d in StartupXLOG () #6 0x00007f99d565454f in StartupProcessMain () #7 0x00007f99d54f8d0d in AuxiliaryProcessMain () #8 0x00007f99d56512f3 in ?? () #9 0x00007f99d56537f7 in PostmasterMain () #10 0x00007f99d54886d0 in main () Looking at the disassembly and the registers it appears that StandbyAcquireAccessExclusiveLock() is called with arguments dbOid=30850, relOid=31262. Looking again at pg_locks, a lock on 31262 is already held by pg_dump! So it looks like a deadlock between pg_dump and standby_redo() in the postmaster that is not detected, making both wait indefinitely. We would expect the pg_dump session to die when a conflict is encountered, but the fact that it and the postmaster just hang forever makes it look like a bug. We do not call pg_xlog_replay_pause()/resume() around the pg_dump call, we'll try that now as a workaround. Oliver
Вложения
В списке pgsql-bugs по дате отправления: