Re: WAL recycling, ext3, Linux 2.4.18
From | Tom Lane |
---|---|
Subject | Re: WAL recycling, ext3, Linux 2.4.18 |
Date | |
Msg-id | 18147.1026146435@sss.pgh.pa.us |
In reply to | Re: WAL recycling, ext3, Linux 2.4.18 (Doug Fields <dfields-pg-general@pexicom.com>) |
List | pgsql-general |
Doug Fields <dfields-pg-general@pexicom.com> writes:
> Here is a stack trace. I did "where" about every second during the "pause"
> and received the same stack trace. This is on PID 3456 per the
> pg_stat_activity listing below. After things clear up, I also did a stack
> trace; it's blocked on recv, presumably waiting for more commands to come
> down the socket. (I tried a few other PIDs with similar stack traces, all
> stuck on the semop call.)

Hmm. I don't think I entirely believe that stack trace --- at least some of the claimed call paths are impossible. Would it be too much trouble to rebuild PG with --enable-debug and try again?

Also, could you do the checkpoint manually and get a stack trace from that backend while others are hung up?

I am considering the possibility that the other backends are hung trying to get ControlFileLock, which the checkpointer will acquire while recycling xlog file segments --- but if your stack trace is accurate and representative then that's not the problem because XLogInsert doesn't directly try to acquire ControlFileLock. In any case it's hard to credit that the recycling process could take 90 seconds to rename a dozen or so files.

If you have a gdb attached to a process doing a manual checkpoint, it would be fairly easy to see how long MoveOfflineLogs() runs. (Set a breakpoint at its start, when control reaches the breakpoint issue "fin" and see how long it takes to come back.)

regards, tom lane
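
As a rough cross-check on the "90 seconds to rename a dozen or so files" point, one could time a dozen plain renames on the filesystem in question, outside PostgreSQL entirely. The sketch below is only an illustration: the function name `time_renames`, the file names, and the 1 MB file size are all assumptions made for the example, and it does not reproduce what MoveOfflineLogs() actually does.

```python
import os
import tempfile
import time


def time_renames(directory, count=12, size_bytes=1024 * 1024):
    """Create `count` files in `directory`, rename each one once, and
    return (elapsed_seconds, new_paths). Only the renames are timed."""
    old_paths = []
    for i in range(count):
        # Hypothetical file names; real xlog segments are named differently.
        path = os.path.join(directory, f"segment_{i:02d}.old")
        with open(path, "wb") as f:
            f.write(b"\0" * size_bytes)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before timing
        old_paths.append(path)

    # Strip the ".old" suffix and replace it with ".new".
    new_paths = [p[:-4] + ".new" for p in old_paths]

    start = time.perf_counter()
    for old, new in zip(old_paths, new_paths):
        os.rename(old, new)
    elapsed = time.perf_counter() - start
    return elapsed, new_paths


if __name__ == "__main__":
    # Assumption: pass dir=... pointing at the ext3 volume under test;
    # the default temporary directory may live on a different filesystem.
    with tempfile.TemporaryDirectory() as tmp:
        elapsed, _ = time_renames(tmp)
        print(f"renamed 12 files in {elapsed:.4f} s")
```

If the renames alone finish quickly here while the database still stalls, that would point away from the rename step itself, though a standalone test like this cannot account for concurrent I/O load from the running server.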