Forward zeroing of pg_clog
From | Tom Lane |
---|---|
Subject | Forward zeroing of pg_clog |
Date | |
Msg-id | 8958.1093889989@sss.pgh.pa.us |
Responses |
Re: Forward zeroing of pg_clog
Re: Forward zeroing of pg_clog |
List | pgsql-hackers |
I just spent some time chasing weird failures ("PANIC: cannot abort transaction 201109, it was already committed" after some but not all errors) which I eventually realized were because pg_clog contained commit and abort flags for several thousand transactions ahead of where the current XID counter is in my test database.

How did it get that way? Well, yesterday I was testing the XLOG mods to support huge COMMIT records, so I ran a test script that would commit a transaction with 20000 subcommitted subtransactions. And then I kill 9'd the backend to force WAL replay of that large transaction. WAL replay sets the XID counter as one more than the largest XID that it sees evidence of in the replayed log. However, it's not looking inside the COMMIT or ABORT records, and so in this case the largest XID it saw was that of the parent transaction. The actual pre-crash XID counter was of course 20000 more than that.

This particular issue is just a simple oversight in xact_redo, and it's easily fixed: make sure nextXID gets advanced past all of the committed or aborted subXIDs too. But thinking about it, I realized that we have some other issues in the same area. Because subxact commit sets clog bits but emits no WAL record, it's at least theoretically possible that post-crash there will be written-out clog bits for XIDs ahead of every XID of which there is any record in the WAL data. RecordTransactionCommit and friends have other cases in which they think it's sufficient to write a clog entry and no WAL entry.

Perhaps that's broken, but I think the cleanest fix is that the clog code ought to forcibly zero all clog entries ahead of whatever nextXID is settled on by WAL replay. Otherwise we run some risk of subtransactions that are still running looking like they are subcommitted (or worse) in the clog data. This is already true at the page level: when advancing into a new page we zero it instead of reading anything from disk. I am thinking of adding code to StartupCLOG to zero the remaining portion of the "current" page too.

Thoughts?

regards, tom lane
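
For readers following along, here is a rough standalone sketch of the two ideas in the message: advancing nextXID past every subXID carried in a replayed COMMIT or ABORT record, and zeroing the tail of the "current" clog page from nextXID onward. It is not the actual patch. The helper names (xid_follows, advance_past_all, zero_page_tail) are made up for illustration, the page is a plain in-memory buffer, and the layout assumed is 2 status bits per transaction, 4 transactions per byte, 8192-byte pages. Special XIDs at wraparound are ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed clog layout: 2 status bits per transaction, 8 kB pages. */
#define BLCKSZ              8192
#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE 4
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)

typedef uint32_t TransactionId;

/* Wraparound-aware "a is newer than b" (hypothetical helper). */
static int
xid_follows(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) > 0;
}

/*
 * Hypothetical helper: make *next_xid strictly newer than the top-level
 * XID and every subXID listed in a replayed COMMIT/ABORT record.
 * Does not skip the special XIDs on wraparound.
 */
static void
advance_past_all(TransactionId *next_xid, TransactionId xid,
                 const TransactionId *subxids, int nsubxids)
{
    int i;

    if (!xid_follows(*next_xid, xid))
        *next_xid = xid + 1;
    for (i = 0; i < nsubxids; i++)
    {
        if (!xid_follows(*next_xid, subxids[i]))
            *next_xid = subxids[i] + 1;
    }
}

/*
 * Hypothetical helper: clear the status bits of next_xid and of every
 * later XID that lives on the same clog page. Entries for XIDs older
 * than next_xid are left untouched.
 */
static void
zero_page_tail(unsigned char *page, TransactionId next_xid)
{
    uint32_t entry = next_xid % CLOG_XACTS_PER_PAGE;
    uint32_t byteno = entry / CLOG_XACTS_PER_BYTE;
    uint32_t bshift = (entry % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;

    assert(page != NULL);

    if (bshift != 0)
    {
        /* Keep the low bits (older XIDs), clear next_xid and above. */
        page[byteno] &= (unsigned char) ((1u << bshift) - 1);
        byteno++;
    }

    /* Everything after the partial byte belongs to future XIDs. */
    memset(page + byteno, 0, BLCKSZ - byteno);
}
```

In this sketch, a startup routine would first run advance_past_all for each replayed commit or abort record and then call zero_page_tail once on the page containing the final nextXID, so that stale bits written before the crash cannot be mistaken for the status of XIDs that have not been assigned yet.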