Re: Incomplete freezing when truncating a relation during vacuum
От | Andres Freund |
---|---|
Тема | Re: Incomplete freezing when truncating a relation during vacuum |
Дата | |
Msg-id | 20131130235858.GK31100@awork2.anarazel.de обсуждение исходный текст |
Ответ на | Re: Incomplete freezing when truncating a relation during vacuum (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Incomplete freezing when truncating a relation during vacuum
(Tom Lane <tgl@sss.pgh.pa.us>)
Re: Incomplete freezing when truncating a relation during vacuum (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
On 2013-11-30 11:50:57 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > FWIW, I think the two other data corrupting bugs, "incomplete freezing > > due to truncation" (all branches) and freezing overall (in 9.3), are at > > least as bad because they take effect on the primary. > > Not saying that because of my involvement, but because I think they need > > to be presented at least as prominent in the release notes. > > They bugs themselves are all fixed in the relevant branches, but I do > > think we need to talk about about how to detect and fix possible > > corruption. > > I was planning to draft up the release notes today. Can you propose > text about the above? * Fix possible data corruptions due to incomplete vacuuming (Andres Freund, Heikki Linnakangas) Due to this bug (auto-)vacuum could sometimes treat a partial vacuum as a full table vacuum mistakenly increasing relfrozenxid as a result. This could happen if it managed to truncate the tail end of the table due to dead space. Possible consequences are: * Errors like "could not access status of transaction XXX" when accessing such rows. * Vanishing rows after more than 2^31 transactions have passed. Tables in which parts only changing infrequently, while others change heavily are more likely to be affected. It is recommended to perform a VACUUM of all tables in all databases while having vacuum_freeze_table_age set to zero. This will fix latent corruption but will not be able to fix all errors. To detect whether a database is possibly affected check wether either "SELECT txid_current() < 2^31" returns 'f' or a VACUUM of all tables with vacuum_freeze_table_age set to zero returns errors. If neither are the case, the database is safe after performing the VACUUM. If you think you are suffering from this corruption, please contact the pgsql-hackers mailing list or your service provider, data is likely to be recoverable. Users updating from 9.0.4/8.4.8 or earlier are not affected. All branches. Commit: 82b43f7df2036d06b4410721f77512969846b6d0 * Fix possible data corruptions due to several bugs around vacuuming [in the 9.3 foreign key locks feature] (Andres Freund,Alvaro Herrera) The VACUUM implementation in 9.3 had several bugs: It removed multixact xmax values without regard of the importance of contained xids, it did not remove multixacts if the contained xids were too old and it relied on hint bits when checking whether a row needed to be frozen which might not have been set on replicas. It is unlikely that databases on a primary are affected in which no VACUUM FREEZE or a VACUUM with a nondefault vacuum_freeze_min_age was ever executed and in which SELECT relminmxid FROM pg_class WHERE relkind = 'r' AND NOT oid = 1262 AND NOT relminmxid = 1 returns no rows. Possible consequences are: * Duplicated or vanishing rows. * Errors like "could not access status of transaction XXX" when accessing rows. * Primary and Standby servers getting out of sync It is strongly recommended to re-clone all standby servers after ugprading, especially if full_page_writes was set to false. On the primary it recommented to execute a VACUUM of all tables in all databases after upgrading both the primary and possibly existing standbys while having vacuum_freeze_table_age set to zero. This will fix latent corruption on primaries but will not be able to fix all pre-existing errors. If you think you are suffering from data loss due this corruption on the primary, please contact the pgsql-hackers mailing list or your service provider, some data might be recoverable. 9.3 only, but should be mentioned first as corruption due to this is quite likely. Commit: 2393c7d102368717283d7121a6ea8164e902b011 I had quite a hard time - likely noticeable - to summarize the second time in easy to understand terms. The interactions are quite complicated. We could tell users they don't necessarily need to re-clone standbys if no xids have been truncated away (txid_current() < 2^31, and datfrozenxid of at least one database = 1), but given the replication issue that seems like unneccessary confusion. Questions? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Peter GeogheganДата:
Сообщение: Re: [PATCH] Report exit code from external recovery commands properly