Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
From | Thomas Munro |
---|---|
Subject | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) |
Date | |
Msg-id | CAEepm=3C32VPJLOo45y0c3-3KWXNV2xM4jaPTSVjCRD2VG0Qgg@mail.gmail.com |
In response to | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) |
List | pgsql-bugs |
On Sat, May 9, 2015 at 2:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 8, 2015 at 9:55 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Thomas Munro wrote:
>>> I think the fix is something like "if nextMXact == oldestMultiXactId,
>>> then there are no active multixacts, so the offsetStopLimit should be
>>> set to nextOffset - (a segment's worth)".
>>
>> Makes sense.
>
> Here's a patch that attempts to implement this.

Thanks. I think I have managed to reproduce something like the data
loss race that we were speculating about.

0. initdb, autovacuum = off, set up explode_mxact_members.c as
described elsewhere in the thread.

1. Fill up the members SLRU completely (ie reach state where you can
no longer create a new multixact of any size). pg_multixact/members
contains 82040 files and the last one is named 14077.

2. Issue CHECKPOINT, but use a debugger to stop inside
TruncateMultiXact after it has read
MultiXactState->lastCheckpointedOldest and released the lock, but
before it calls SlruScanDirectory to delete files...

3. Run VACUUM FREEZE in all databases (including template0).
datminmxid moves.

4. Create lots of new multixacts. pg_multixact/members now contains
82041 files and the last one is named 14078 (ie one extra segment,
with the highest possible segment number, which couldn't be created
before vacuuming because of the one segment gap enforced by
DetermineSafeOldestOffset). Segments 0000-0016 have new modified
times.

5. ... allow the checkpoint started in step 2 to continue. It deletes
segments, keeping only 0000-0016. The segment 14078 which contained
active member data has been incorrectly deleted.

-- 
Thomas Munro
http://www.enterprisedb.com
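
The sequence in steps 1-5 can be mimicked with a small toy model. This is only a sketch and not PostgreSQL code: `seg_range`, `truncate`, and all variable names are hypothetical, and it assumes that truncation keeps just the segments between an "oldest" segment and the current "next" segment, wrapping after the highest segment name (14078) reported above. It also assumes that after VACUUM FREEZE the oldest live member data starts in segment 14078. Under those assumptions, a truncation that uses the oldest value read before the freeze removes a live segment.

```python
# Toy model of the race described above; NOT PostgreSQL code.
# Assumption (hypothetical): truncation keeps only segments in the
# wrapping range [oldest_seg, next_seg] and deletes everything else.

MAX_SEG = 0x14078  # highest segment name seen in the report


def seg_range(start, end):
    """Segments from start to end inclusive, wrapping past MAX_SEG."""
    segs = set()
    s = start
    while True:
        segs.add(s)
        if s == end:
            return segs
        s = 0 if s == MAX_SEG else s + 1


def truncate(files, oldest_seg, next_seg):
    """Keep only the files inside the range believed to be live."""
    return files & seg_range(oldest_seg, next_seg)


# Step 1: members SLRU is full; files 0000..14077 exist.
files = set(range(0x14077 + 1))
assert len(files) == 82040

# Step 2: the checkpoint reads the oldest segment, then stalls.
stale_oldest = 0x0000

# Step 3: VACUUM FREEZE in all databases moves the real oldest point.
# Assumption: the oldest live member data now starts in segment 14078.
true_oldest = 0x14078

# Step 4: new multixacts fill 14078 and wrap around into 0000..0016.
files |= {0x14078} | set(range(0x0016 + 1))
next_seg = 0x0016
live = seg_range(true_oldest, next_seg)

# Step 5: the checkpoint resumes and truncates with the stale value.
files_after = truncate(files, stale_oldest, next_seg)
lost = live - files_after

print("lost:", sorted(format(s, "04X") for s in lost))
```

In this model the surviving files are 0000-0016 and the only live segment removed is 14078, matching the observation in step 5. Passing `true_oldest` instead of `stale_oldest` to `truncate` keeps 14078, which is why re-reading the oldest value at deletion time (or holding off the deletion) is the kind of fix the toy model points toward.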