Re: BUG #9190: Could not fsync file "pg_clog/0000": Bad file descriptor.
From | Alvaro Herrera |
---|---|
Subject | Re: BUG #9190: Could not fsync file "pg_clog/0000": Bad file descriptor. |
Date | |
Msg-id | 20140212141607.GJ6342@eldon.alvh.no-ip.org |
In response to | BUG #9190: Could not fsync file "pg_clog/0000": Bad file descriptor. (dvitek@grammatech.com) |
List | pgsql-bugs |
dvitek@grammatech.com wrote:

> We had a postgres panic a few weeks ago. Here is a relevant fragment of the
> postgres log:
>
> [2014-01-27 04:57:37 EST 3756] WARNING: pgstat wait timeout
> ...
> ...
> ...
> [2014-01-27 04:55:36 EST 5804] ERROR: could not access status of
> transaction 0
> [2014-01-27 04:55:42 EST 5804] DETAIL: Could not fsync file "pg_clog/0000":
> Bad file descriptor.

I noticed that SimpleLruFlush calls SlruInternalWritePage() to write all
pages, and stores the file descriptors in fdata, with the intention of
fsyncing the files later; SlruInternalWritePage in turn calls
SlruPhysicalWritePage. If the physical write fails, SlruInternalWritePage
will dutifully close all the files, *but fdata is not updated to remove the
file descriptors*. This might lead to the "bad file descriptor" error (but
see below).

Really, what we should be reporting is the failure to do the writes, I
think. There is something broken about this system that makes the writes
fail (something which can probably be learnt about in the kernel log, if
there is such a thing on Windows), but this part seems our bug.

The "status of transaction 0" part of the error message should surprise no
one, since InvalidTransactionId is what SimpleLruFlush uses in its failure
report.

This does nothing to explain or help with the PANIC, however; nor why things
seem to have continued running after a PANIC for two minutes.

> [2014-01-27 09:21:04 EST 3080] PANIC: could not fsync file "pg_xlog/xlogtemp.3080": Bad file descriptor
> [2014-01-27 09:23:01 EST 5404] LOG: WAL writer process (PID 3080) exited with exit code 3
> [2014-01-27 09:23:07 EST 5404] LOG: terminating any other active server processes

There is no obvious path in which an fd is clobbered in xlog.c that I can
see. If there is an explanation for this failure at the filesystem level,
perhaps that can explain the above problem as well.

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
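
To make the stale-descriptor scenario described in the message concrete, here is a minimal standalone sketch. It is not the code in slru.c: the FlushState struct and the close_all, sync_all and simulate_flush functions are hypothetical names invented for illustration, the failed write is only pretended, and the sketch assumes a POSIX system with a writable /tmp (the original report came from Windows).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_FLUSH_FILES 16

/* Hypothetical stand-in for the "fdata" bookkeeping; not PostgreSQL's struct. */
typedef struct FlushState
{
    int num_files;
    int fd[MAX_FLUSH_FILES];
} FlushState;

/*
 * Close every recorded descriptor, as an error path might after a failed
 * write. With forget == 0 the descriptors stay listed although they are
 * closed (the suspected bug); with forget != 0 the list is emptied too.
 */
void close_all(FlushState *st, int forget)
{
    for (int i = 0; i < st->num_files; i++)
        close(st->fd[i]);
    if (forget)
        st->num_files = 0;
}

/* fsync everything still listed; return 0, or errno of the first failure. */
int sync_all(const FlushState *st)
{
    for (int i = 0; i < st->num_files; i++)
        if (fsync(st->fd[i]) != 0)
            return errno;
    return 0;
}

/* Open a scratch file, pretend its write failed, clean up, then fsync. */
int simulate_flush(int forget)
{
    FlushState st = {0, {0}};
    char path[] = "/tmp/slru_sketch_XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;              /* could not create the scratch file */
    unlink(path);               /* file goes away once the fd is closed */

    assert(st.num_files < MAX_FLUSH_FILES);
    st.fd[st.num_files++] = fd;

    close_all(&st, forget);     /* error path after the pretended failure */
    return sync_all(&st);       /* the later "fsync all files" step */
}
```

In this sketch simulate_flush(0) returns EBADF, because fsync is called on a descriptor that was already closed, while simulate_flush(1) returns 0 because nothing is left to sync. Whether forgetting the descriptors or reporting the original write failure is the right fix in PostgreSQL itself is a separate question that this sketch does not settle.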