Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts
From     | Bruce Momjian
---------|---------------------------------------------------------------
Subject  | Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts
Date     |
Msg-id   | 20140620171738.GB29143@momjian.us
Reply to | Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts (Alvaro Herrera <alvherre@2ndquadrant.com>)
List     | pgsql-bugs
On Thu, Jun 19, 2014 at 06:04:25PM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
>
> > I wasn't happy with having that delete code added there when we do
> > directory delete in the function above.  I instead broke apart the
> > delete and copy code and called the delete code where needed, in the
> > attached patch.
>
> Makes sense, yeah.  I didn't look closely enough to realize that the
> function that does the copying also does the rmtree part.

OK.  Should I apply my patch so at least pg_upgrade is good going forward?

> I also now realize why the other case (upgrade from 9.3 to 9.4) does not
> have a bug: we are already deleting the files in that path.

Right, and I think the patch makes it clearer why we need those 'rm'
function calls: they mirror the 'copy' ones.

> > OK, so the xid has to be beyond 2^31 during pg_upgrade to trigger a
> > problem?  That might explain the rare reporting of this bug.  What would
> > the test query look like so we can tell people when to remove the '0000'
> > files?  Would we need to see the existence of '0000' and high-numbered
> > files?  How high?  What does a 2^31 file look like?
>
> I misspoke.
>
> I ran a few more upgrades, and then tried vacuuming all databases, which
> is when the truncate code is run.  Say the original cluster had an
> oldestmulti of 10 million.  If you just run VACUUM in the new cluster
> after the upgrade, the 0000 file is not deleted: it's not yet old enough
> in terms of multixact age.  An error is not thrown, because we're still
> not attempting a truncate.  But if you lower
> vacuum_multixact_freeze_table_age to 10 million minus one, then we will
> try the deletion and that will raise the error.
>
> I think (didn't actually try) if you just let 150 million multixacts be
> generated, that's the first time you will get the error.
>
> Now if you run a VACUUM FREEZE after the upgrade, the file will be
> deleted with no error.
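[As an aside on the "what does a 2^31 file look like" question above: pg_multixact/offsets is an SLRU, so the segment name is just the MultiXactId divided by the number of multis per segment, printed in hex. A rough sketch of that arithmetic, assuming 8 kB pages, 4-byte offset entries, and 32 SLRU pages per segment as in the 9.3-era code (this is an illustration, not the actual server code):]

```python
# Sketch of pg_multixact/offsets segment naming, under the assumptions
# stated above (BLCKSZ = 8192, 4-byte MultiXactOffset entries,
# 32 SLRU pages per segment); not the actual PostgreSQL source.

MULTIS_PER_PAGE = 8192 // 4         # 2048 offset entries per 8 kB page
PAGES_PER_SEGMENT = 32
MULTIS_PER_SEGMENT = MULTIS_PER_PAGE * PAGES_PER_SEGMENT   # 65536

def offsets_segment_name(mxid: int) -> str:
    """Name of the pg_multixact/offsets file holding this MultiXactId."""
    return "%04X" % (mxid // MULTIS_PER_SEGMENT)

print(offsets_segment_name(1))            # the '0000' file
print(offsets_segment_name(10_000_000))   # oldestmulti ~10M -> '0098'
print(offsets_segment_name(2**31))        # the 2^31 boundary -> '8000'
```

[So under these assumptions a cluster whose multixact counter has crossed 2^31 would be writing segments named '8000' and up, while '0000' only covers the first 65536 multis.]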
> I now think that the reason most people haven't hit the problem is that
> they don't generate enough multis after upgrading a database that had
> enough multis in the old database.  This seems a bit curious.

OK, that does make more sense.  A user would need to have advanced past
the '0000' file far enough that it was removed from their old cluster,
and _then_ run far enough past the freeze horizon in the new cluster to
again require file removal.  That explains why we are seeing the bug only
now, and a quick minor release with a query to fix this should get us out
of the problem with minimal impact.

> > Also, is there a reason you didn't remove the 'members/0000' file in your
> > patch?  I have removed it in my version.
>
> There's no point.  That file is the starting point for new multis
> anyway, and it's compatible with the new format (because it's all
> zeroes).

I think it should be done for consistency with the 'copy' function calls
above.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +