Re: txid failed epoch increment, again, aka 6291
| From | Noah Misch |
|---|---|
| Subject | Re: txid failed epoch increment, again, aka 6291 |
| Date | |
| Msg-id | 20120906100406.GA2399@tornado.leadboat.com |
| In reply to | Re: txid failed epoch increment, again, aka 6291 (Daniel Farina <daniel@heroku.com>) |
| Responses | Re: txid failed epoch increment, again, aka 6291 |
| List | pgsql-hackers |
On Tue, Sep 04, 2012 at 09:46:58AM -0700, Daniel Farina wrote:
> I might try to find the segments leading up to the overflow point and
> try xlogdumping them to see what we can see.

That would be helpful to see. Just to grasp at yet-flimsier straws, could
you post (URL preferred, else private mail) the output of "objdump -dS" on
your "postgres" executable?

> If there's anything to note about the workload, I'd say that it does
> tend to make fairly pervasive use of long running transactions which
> can span probably more than one checkpoint, and the txid reporting
> functions, and a concurrency level of about 300 or so backends ... but
> per my reading of the mechanism so far, it doesn't seem like any of
> this should matter.

Thanks for the details; I agree none of that sounds suspicious.

After some further pondering and testing, this remains a mystery to me.
These symptoms imply a proper update of ControlFile->checkPointCopy.nextXid
without having properly updated ControlFile->checkPointCopy.nextXidEpoch.
After recovery, only CreateCheckPoint() updates ControlFile->checkPointCopy
at all. Its logic for doing so looks simple and correct.
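[Editor's note: for readers following the thread, the rule being discussed is that the checkpoint code carries the epoch forward from the copy stored in pg_control and bumps it only when the 32-bit nextXid appears to have wrapped since that stored copy. The snippet below is a standalone illustration of that rule, not the actual CreateCheckPoint() source; the struct and function names are simplified stand-ins, with only nextXid and nextXidEpoch taken from the message.]

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Standalone sketch of the epoch-bump rule: the epoch stored alongside
 * nextXid in the control-file copy is advanced exactly when nextXid has
 * wrapped around (i.e. the new value compares lower than the stored one).
 */
typedef struct
{
    uint32_t nextXid;       /* 32-bit transaction id */
    uint32_t nextXidEpoch;  /* number of times nextXid has wrapped */
} CheckPointCopy;

static void
advance_checkpoint_copy(CheckPointCopy *stored, uint32_t current_next_xid)
{
    CheckPointCopy next;

    next.nextXid = current_next_xid;
    next.nextXidEpoch = stored->nextXidEpoch;
    if (next.nextXid < stored->nextXid)
        next.nextXidEpoch++;        /* wraparound since the last checkpoint */

    /* Both fields are written back together. */
    *stored = next;
}

int
main(void)
{
    CheckPointCopy copy = { .nextXid = 4000000000u, .nextXidEpoch = 0 };

    /* nextXid wrapped past 2^32 between checkpoints: epoch must advance. */
    advance_checkpoint_copy(&copy, 100u);
    printf("epoch=%u nextXid=%u\n", copy.nextXidEpoch, copy.nextXid);
    return 0;
}
```

The failure described in the thread corresponds to the two fields getting out of step, i.e. nextXid being updated as if by the above while nextXidEpoch is not.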