Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
От | Robert Haas |
---|---|
Тема | Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) |
Дата | |
Msg-id | CA+TgmoY5igs8i-B7BmQBeoJrK1U4b9Ym1SHzrjGaoUeC1aEHKg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Ответы |
Re: BUG #12990: Missing pg_multixact/members files (appears
to have wrapped, then truncated)
|
Список | pgsql-bugs |
On Thu, Apr 23, 2015 at 9:59 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Thomas Munro wrote: >> That's why I proposed not using xid-like logic, and instead using a >> type of three-way comparison that allows you to see when nextOffset >> would 'cross' oldestOffsetStopLimit, instead of the two-way comparison >> that considers half the number-space to be in the past and half in the >> future, in my earlier message. > > Yeah, that bit made sense to me. In addition to preventing the corruption, I think we also need a back-patchable fix for AV to try to keep this situation from happening in the first place. We do not want the system to get stuck in a situation where member usage is low, so autovacuum does nothing, but offset usage is high, so the user just keeps getting an error whenever they do anything that could cause a new MXID to be created. Perhaps if we'd thought of it we would have added yet another column to pg_class to track the oldest offset referenced by any MXID in a particular relation, but that sounds kind of icky on general principle and certainly wouldn't be back-patchable. What I think we should do is notice when members utilization exceeds offset utilization and progressively ramp back the effective value of autovacuum_multixact_freeze_max_age (and maybe also vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age) so that autovacuum (and maybe also manual vacuums) get progressively more aggressive about trying to advance relminmxid. Suppose we decide that when the "members" space is 75% used, we've got a big problem and want to treat autovacuum_multixact_freeze_max_age to effectively be zero. Conversely, suppose that when the members space is at a lower usage percentage than the offsets space, or when it's anyway less than autovacuum_multixact_freeze_max_age, we define that as completely acceptable. If we're somewhere above the "no problem" threshold but below the "big problem" threshold, then we calculate what percentage of the way we are from the "no problem" threshold to the "big problem" threshold and reduce autovacuum_multixact_freeze_max_age by that percentage. Example: autovacuum_multixact_freeze_max_age is 25% of 2^32 and we're using 20% of the offsets space but 40% of the members space. We're (40 - 20) / (75 - 20) = 36% of the way to the 75% threshold we never want to exceed, so we reduce the effective value of autovacuum_multixact_freeze_max_age by 36%. In this case, that means treating the configured value of 1073741824 as if it were 1073741824 * (35/55) = 683290251. Hopefully that's enough to trigger enough vacuuming to cause some relminmxid advancement, but if not, and the situation continues to worsen, we'll get more and more aggressive. This may not be the right proposal in detail, but I think we should do something. I don't like the idea telling users that they can no longer count on autovacuum to prevent wraparound, and that if they don't manually tune their vacuum settings correctly, their system may just stop working at some point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-bugs по дате отправления: