Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
From | Thomas Munro |
---|---|
Subject | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) |
Date | |
Msg-id | CAEepm=1XGJVijxqG2EE=3Tb2bbrQRTvnXA6vZN1FkOZNtH=Lqw@mail.gmail.com |
In reply to | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Re: BUG #12990: Missing pg_multixact/members files
(appears to have wrapped, then truncated)
|
List | pgsql-bugs |
On Fri, May 8, 2015 at 6:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 1. The members SLRU is full all the way up to offsetStopLimit.
> 2. A checkpoint occurs, reaching MultiXactSetSafeTruncate(), which
> sets lastCheckpointedOldest.
> 3. Vacuum runs, calling SetMultiXactIdLimit(), calling
> DetermineSafeOldestOffset(), advancing
> MultiXactState->offsetStopLimit.
> 4. Since offsetStopLimit > lastCheckpointedOffset, it's now possible
> for someone to consume an MXID greater than offsetStopLimit, making
> MultiXactState->nextOffset > lastCheckpointedOffset
> 5. The checkpoint from step 1, continuing on its merry way, now calls
> TruncateMultiXact(), which sets rangeEnd > rangeStart and blows away
> nearly every file in the SLRU.

I am still working on reproducing this race scenario in various ways,
including the way you described, but I kept getting stuck at step 4:
I was unable to create new multixacts despite having vacuum-frozen all
databases (including template0) and advanced the cluster minimum mxid.

I think I see why, and I think it's a bug: if you vacuum freeze all
your databases, MultiXactState->oldestMultiXactId ends up equal to
MultiXactState->nextMXact. But that's not actually a multixact that
exists yet, so when DetermineSafeOldestOffset calls
find_multixact_start, it reads a garbage offset (all zeros in practice,
since pages start out zeroed) and produces a garbage value for
offsetStopLimit. That value might incorrectly stop you from creating
any more multixacts even though member space is entirely empty (whether
it does depends on where your nextOffset happens to be at the time).

I think the fix is something like "if nextMXact == oldestMultiXactId,
then there are no active multixacts, so the offsetStopLimit should be
set to nextOffset - (a segment's worth)".

--
Thomas Munro
http://www.enterprisedb.com
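
To make the proposed rule concrete, here is a minimal standalone sketch in C. It is not a patch against multixact.c: the struct, the function names and the SKETCH_MEMBERS_PER_SEGMENT constant are hypothetical stand-ins, the segment size is an assumed figure, and the rounding to a segment boundary that the real code would need is left out. It only shows the special case "no multixacts exist, so derive the limit from nextOffset instead of reading an offset that was never written".

```c
#include <stdint.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

/* Assumption for illustration only: members held by one SLRU segment. */
#define SKETCH_MEMBERS_PER_SEGMENT 52352u

/* Hypothetical, cut-down stand-in for the relevant MultiXactState fields. */
typedef struct
{
    MultiXactId     nextMXact;
    MultiXactId     oldestMultiXactId;
    MultiXactOffset nextOffset;
} SketchState;

/* Stand-in for find_multixact_start(): returns the member offset at which
 * the given multixact begins. */
typedef MultiXactOffset (*SketchLookup) (MultiXactId multi);

/* Models the lookup of a multixact that was never created: the offsets page
 * is still zeroed, so the value read back is 0. */
static MultiXactOffset
sketch_lookup_zeroed_page(MultiXactId multi)
{
    (void) multi;
    return 0;
}

/* Compute the stop limit. When oldestMultiXactId == nextMXact there is no
 * live multixact, so skip the lookup and start from nextOffset. The
 * subtraction relies on unsigned 32-bit wraparound, since member offsets
 * are treated as a circular space. */
static MultiXactOffset
sketch_offset_stop_limit(const SketchState *state, SketchLookup lookup)
{
    MultiXactOffset oldestOffset;

    if (state->nextMXact == state->oldestMultiXactId)
        oldestOffset = state->nextOffset;   /* nothing to look up */
    else
        oldestOffset = lookup(state->oldestMultiXactId);

    return oldestOffset - SKETCH_MEMBERS_PER_SEGMENT;
}
```

With nextMXact equal to oldestMultiXactId, the result depends only on nextOffset, so a zeroed offsets page can no longer influence offsetStopLimit in that case.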