Re: SR standby hangs
От | Andrew Dunstan |
---|---|
Тема | Re: SR standby hangs |
Дата | |
Msg-id | 4DB724A2.9040809@dunslane.net обсуждение исходный текст |
Ответ на | Re: SR standby hangs (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: SR standby hangs
|
Список | pgsql-hackers |
On 02/22/2011 07:55 AM, Robert Haas wrote: > On Sun, Feb 20, 2011 at 12:39 PM, Greg Stark<gsstark@mit.edu> wrote: >> On Fri, Feb 18, 2011 at 6:59 PM, Andrew Dunstan<amdunstan@nc.rr.com> wrote: >>> The server is running as a warm standby, and the client's application tries >>> to connect to both the master and the slave, accepting whichever lets it >>> connect (hence hot standby is not turned on). >>> ... >>> #2 0x00000000005de645 in LockBufferForCleanup () at bufmgr.c:2432 >>> #3 0x0000000000463733 in heap_xlog_clean (lsn=<value optimized out>, >> Hm, if hot standby isn't turned on why are we bothering with locks at >> all? But that said is there *anything* else happening in the database >> aside from recovery? Are there any errors in the database log? >> >> But this still shouldn't block. It's either blocked locking the buffer >> or blocked waiting for the buffer to become unpinned. It would be nice >> to get a backtrace from a debugging build which wouldn't have some of >> the functions inlined. It would be really nice to see the pin count on >> the buffer in question -- perhaps it has gotten out of sync or >> underflowed? > I realize now that it would also be useful to see the state of the > LWLock on the buffer. This has happened again. This time we have some debug info available, and can possible get more, if people tell me what will be helpful: (gdb) bt #0 0x0000003493ed5337 in semop () from /lib64/libc.so.6 #1 0x00000000005bd1e3 in PGSemaphoreLock (sema=0x2b1a6014ac38, interruptOK=1 '\001') at pg_sema.c:420 #2 0x00000000005de735 in LockBufferForCleanup (buffer=310163)at bufmgr.c:2432 #3 0x000000000048b896 in RestoreBkpBlocks (lsn=<value optimized out>, record=0x2b1a6191b010,cleanup=1 '\001') at xlog.c:3540 #4 0x0000000000463788 in heap_xlog_clean (lsn=<value optimized out>, record=0x2b1a6191b010) at heapam.c:4160 #5 heap2_redo (lsn=<value optimized out>, record=0x2b1a6191b010)at heapam.c:4858 (gdb) f 2 #2 0x00000000005de735 in LockBufferForCleanup (buffer=310163) at bufmgr.c:2432 2432 ProcWaitForSignal(); (gdb) p *bufHdr $2 = {tag = {rnode = {spcNode = 16393, dbNode= 40475, relNode = 41880}, forkNum = MAIN_FORKNUM, blockNum = 18913}, flags = 6, usage_count = 1, refcount = 1,wait_backend_pid = 9111, buf_hdr_lock = 0 '\000', buf_id = 310162, freeNext = -2, io_in_progress_lock = 620448,content_lock = 620449} Strangely, I don't even see ProcWaitForSignal() in the frame list - I shouldn't have thought it was a candidate to be optimised away. cheers andrew
В списке pgsql-hackers по дате отправления: