Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
From | Robert Haas |
---|---|
Subject | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) |
Date | |
Msg-id | 8486B09E-773B-4838-A7E8-8E48433245E1@gmail.com |
In reply to | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) |
List | pgsql-bugs |
On May 9, 2015, at 8:00 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>> On Sat, May 9, 2015 at 2:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Fri, May 8, 2015 at 9:55 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>> Thomas Munro wrote:
>>>> I think the fix is something like "if nextMXact == oldestMultiXactId,
>>>> then there are no active multixacts, so the offsetStopLimit should be
>>>> set to nextOffset - (a segment's worth)".
>>>
>>> Makes sense.
>>
>> Here's a patch that attempts to implement this.
>
> Thanks. I think I have managed to reproduce something like the data
> loss race that we were speculating about.
>
> 0. initdb, autovacuum = off, set up explode_mxact_members.c as
> described elsewhere in the thread.
> 1. Fill up the members SLRU completely (ie reach state where you can
> no longer create a new multixact of any size). pg_multixact/members
> contains 82040 files and the last one is named 14077.
> 2. Issue CHECKPOINT, but use a debugger to stop inside
> TruncateMultiXact after it has read
> MultiXactState->lastCheckpointedOldest and released the lock, but
> before it calls SlruScanDirectory to delete files...
> 3. Run VACUUM FREEZE in all databases (including template0). datminmxid moves.
> 4. Create lots of new multixacts. pg_multixact/members now contains
> 82041 files and the last one is named 14078 (ie one extra segment,
> with the highest possible segment number, which couldn't be created
> before vacuuming because of the one segment gap enforced by
> DetermineSafeOldestOffset). Segments 0000-0016 have new modified
> times.
> 5. ... allow the checkpoint started in step 2 to continue. It
> deletes segments, keeping only 0000-0016. The segment 14078 which
> contained active member data has been incorrectly deleted.

OK. So the next question is: if you then apply the other patch, does that prevent step 4 and thereby avoid catastrophe?

...Robert
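For readers following the arithmetic, here is a minimal sketch of the stop-limit rule quoted above ("if nextMXact == oldestMultiXactId ... set to nextOffset - (a segment's worth)"), written in Python for brevity. It is not PostgreSQL's multixact.c. The helper names (segment_name, offset_stop_limit, would_reach_limit) are hypothetical; the constants assume 8 kB blocks with 1636 members per page and 32 pages per SLRU segment; segment files are assumed to be named with "%04X"; and rounding down to a segment boundary before subtracting one segment is an assumption of this sketch.

```python
# Hypothetical model of the members-SLRU stop limit discussed in the thread.
# Assumed constants for 8 kB blocks; not copied from multixact.c.
MEMBERS_PER_PAGE = 1636                      # members stored per SLRU page (assumed)
PAGES_PER_SEGMENT = 32                       # pages per segment file (assumed)
MEMBERS_PER_SEGMENT = MEMBERS_PER_PAGE * PAGES_PER_SEGMENT  # 52352
OFFSET_MASK = 0xFFFFFFFF                     # member offsets are 32-bit and wrap


def segment_name(offset: int) -> str:
    """File name of the members segment holding `offset` (uppercase hex, 4+ digits)."""
    return "%04X" % (offset // MEMBERS_PER_SEGMENT)


def offset_stop_limit(next_mxact: int, oldest_mxact: int,
                      next_offset: int, oldest_member_offset: int) -> int:
    """Offset that new member data must not reach (hypothetical helper).

    When next_mxact == oldest_mxact no multixact is live, so the limit is
    based on next_offset rather than on the oldest multixact's member offset.
    """
    base = next_offset if next_mxact == oldest_mxact else oldest_member_offset
    base -= base % MEMBERS_PER_SEGMENT                 # assumed: round down to segment start
    return (base - MEMBERS_PER_SEGMENT) & OFFSET_MASK  # leave one segment's gap, mod 2**32


def would_reach_limit(stop_limit: int, start: int, nmembers: int) -> bool:
    """True if members [start, start + nmembers) would reach stop_limit, mod 2**32."""
    distance = (stop_limit - start) & OFFSET_MASK
    return nmembers > distance


if __name__ == "__main__":
    print(segment_name(OFFSET_MASK))  # segment holding the highest possible offset
    limit = offset_stop_limit(10, 10, 5 * MEMBERS_PER_SEGMENT + 100, 0)
    print(limit, would_reach_limit(limit, limit - 10, 11))
```

The `& OFFSET_MASK` emulates unsigned 32-bit wraparound, which Python integers do not do on their own. Under these assumed constants, segments 0000 through 14077 make up 82040 files and the highest possible offset lands in segment 14078, which agrees with the file counts in steps 1 and 4. The sketch does not model the checkpoint/truncation race itself.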