Re: pg_upgrade can result in early wraparound on databases with hightransaction load
От | Noah Misch |
---|---|
Тема | Re: pg_upgrade can result in early wraparound on databases with hightransaction load |
Дата | |
Msg-id | 20190615183759.GB239428@rfd.leadboat.com обсуждение исходный текст |
Ответ на | Re: pg_upgrade can result in early wraparound on databases with hightransaction load (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: pg_upgrade can result in early wraparound on databases with high transaction load
|
Список | pgsql-hackers |
On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote: > On Mon, May 20, 2019 at 3:10 AM Jason Harvey <jason@reddit.com> wrote: > > This week I upgraded one of my large(2.8TB), high-volume databases from 9 to 11. The upgrade itself went fine. Abouttwo days later, we unexpectedly hit transaction ID wraparound. What was perplexing about this was that the age of ouroldest `datfrozenxid` was only 1.2 billion - far away from where I'd expect a wraparound. Curiously, the wraparound errorreferred to a mysterious database of `OID 0`: > > > > UPDATE ERROR: database is not accepting commands to avoid wraparound data loss in database with OID 0 That's bad. > > We were able to recover after a few hours by greatly speeding up our vacuum on our largest table. For what it's worth, a quicker workaround is to VACUUM FREEZE any database, however small. That forces a vac_truncate_clog(), which recomputes the wrap point from pg_database.datfrozenxid values. This demonstrates the workaround: --- a/src/bin/pg_upgrade/test.sh +++ b/src/bin/pg_upgrade/test.sh @@ -248,7 +248,10 @@ case $testhost in esac pg_dumpall --no-sync -f "$temp_root"/dump2.sql || pg_dumpall2_status=$? +pg_controldata "${PGDATA}" +vacuumdb -F template1 pg_ctl -m fast stop +pg_controldata "${PGDATA}" if [ -n "$pg_dumpall2_status" ]; then echo "pg_dumpall of post-upgrade database cluster failed" > > In a followup investigation I uncovered the reason we hit the wraparound so early, and also the cause of the mysteriousOID 0 message. When pg_upgrade executes, it calls pg_resetwal to set the next transaction ID. Within pg_resetwalis the following code: https://github.com/postgres/postgres/blob/6cd404b344f7e27f4d64555bb133f18a758fe851/src/bin/pg_resetwal/pg_resetwal.c#L440-L450 pg_upgrade should set oldestXID to the same value as the source cluster or set it like vac_truncate_clog() would set it. Today's scheme is usually too pessimistic, but it can be too optimistic if the source cluster was on the bring of wrap. Thanks for the report.
В списке pgsql-hackers по дате отправления: