Heads-Up: multixact freezing bug
From | Andres Freund |
---|---|
Subject | Heads-Up: multixact freezing bug |
Date | |
Msg-id | 20131128152853.GU31748@awork2.anarazel.de |
Responses | Re: Heads-Up: multixact freezing bug |
List | pgsql-hackers |
Hello,

Investigating corruption in a client's database, we unfortunately found another data-corrupting bug that's relevant for 9.3+:

Since 9.3, heap_tuple_needs_freeze() and heap_freeze_tuple() don't correctly handle the xids contained in a multixact. They each check against their respective cutoffs, but the xids contained in a multixact are not checked. That doesn't have too bad consequences for multixacts that only lock, but it can lead to errors like:

    ERROR: could not access status of transaction 3883960912
    DETAIL: Could not open file "pg_clog/0E78": No such file or directory.

when accessing a tuple. That's because the update xid contained in the multixact is lower than the global datfrozenxid we've truncated the clog to.

Unfortunately that scenario isn't too unlikely: we use vacuum_freeze_min_age as the basis for both the xid and the mxid freezing cutoffs. Since in many cases multis will be generated at a lower rate than xids, we often will not have frozen away all mxids containing xids lower than the xid cutoff.

To recover the data, there's the lucky behaviour that HeapTupleSatisfiesVacuum() sets the XMAX_INVALID hint bit if an updating multixact isn't running anymore. So assuming that a contained update xid outside ShmemVariableCache's [oldestXid, nextXid) committed will often not cause rows to spuriously disappear.

I am working on a fix for the issue, but it's noticeably less simple than I initially thought. With the current WAL format, the freezing logic needs to work the same during normal processing and recovery.

My current thoughts are that we need to check whether any member of a multixact needs freezing. If we find one, we do MultiXactIdIsRunning() && MultiXactIdWait() if !InRecovery. That's pretty unlikely to be necessary, but afaics we cannot guarantee it is not. During recovery we do *not* need to do so, since the primary will have performed all necessary waits.
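As a rough illustration of the member check described above, here is a minimal, self-contained sketch in C. It is not the actual patch: `xid_precedes()` and `multi_member_needs_freeze()` are hypothetical names, the member array merely stands in for whatever GetMultiXactIdMembers() would return, and the waiting and recovery handling are left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Wraparound-aware "a is older than b", using modulo-2^32 arithmetic.
 * Simplification: the special (permanent) xids are not treated separately.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical helper: returns true if any xid in `members` is older than
 * `cutoff_xid`, i.e. the multixact holds a member that must be frozen even
 * when the multixact id itself is newer than the mxid cutoff.
 */
static bool
multi_member_needs_freeze(const TransactionId *members, size_t nmembers,
                          TransactionId cutoff_xid)
{
    for (size_t i = 0; i < nmembers; i++)
    {
        if (xid_precedes(members[i], cutoff_xid))
            return true;
    }
    return false;
}
```

The point of the sketch is only that the xid cutoff has to be applied to every member, not just to the multixact id as a whole.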
The big problem with that solution is that we need to do a GetMultiXactIdMembers() during crash recovery, which is pretty damn ugly. But I *think*, and that's where I really would like some input, that given the way multixact WAL logging works, that should be safe.

I am not really sure what to do about this. It's quite likely to cause corruption, but the next point release is coming up way too fast for a nontrivial fix.

Thoughts? Better ideas?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
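For readers wondering how the file name in the quoted error relates to the xid: the sketch below reproduces the mapping from a transaction id to its pg_clog segment file. It assumes the default 8 kB block size, two status bits per transaction, and 32 pages per segment; `clog_segment_name()` is a hypothetical helper written for this illustration, not a PostgreSQL function.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed defaults; a build with a different block size would differ. */
#define BLCKSZ                  8192
#define CLOG_XACTS_PER_BYTE     4       /* 2 status bits per transaction */
#define CLOG_XACTS_PER_PAGE     (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define SLRU_PAGES_PER_SEGMENT  32

/*
 * Hypothetical helper: write the clog segment file name (four uppercase
 * hex digits) that would hold the commit status of `xid` into `buf`.
 */
static void
clog_segment_name(uint32_t xid, char *buf, size_t buflen)
{
    uint32_t segno = xid / (CLOG_XACTS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);

    snprintf(buf, buflen, "%04X", (unsigned int) segno);
}
```

With these assumptions each segment covers 1048576 transactions, so xid 3883960912 falls into segment 3704, which is 0E78 in hex and matches the file named in the error message.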