Re: Protecting against unexpected zero-pages: proposal
From | Robert Haas |
---|---|
Subject | Re: Protecting against unexpected zero-pages: proposal |
Date | |
Msg-id | AANLkTintPQKjFBEMGc3Ww_db7wmTt8FeF=VqLgeQ69AG@mail.gmail.com |
In response to | Re: Protecting against unexpected zero-pages: proposal (Greg Stark <gsstark@mit.edu>) |
List | pgsql-hackers |
On Tue, Nov 9, 2010 at 3:05 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Tue, Nov 9, 2010 at 7:37 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Well, most of the other MVCC-in-table DBMSes simply don't deal with
>> large, on-disk databases. In fact, I can't think of one which does,
>> currently; while MVCC has been popular for the New Databases, they're
>> all focused on "in-memory" databases. Oracle and InnoDB use rollback
>> segments.
>
> Well rollback segments are still MVCC. However Oracle's MVCC is
> block-based. So they only have to do the visibility check once per
> block, not once per row. Once they find the right block version they
> can process all the rows on it.
>
> Also Oracle's snapshots are just the log position. Instead of having
> to check whether every transaction committed or not, they just find
> the block version which was last modified before the log position for
> when their transaction started.

That is cool. One problem is that it might sometimes result in
additional I/O. A transaction begins and writes a tuple. We must
write a preimage of the page (or at least, sufficient information to
reconstruct a preimage of the page) to the undo segment. If the
transaction commits relatively quickly, and all transactions which
took their snapshots before the commit end either by committing or by
aborting, we can discard that information from the undo segment
without ever writing it to disk. However, if that doesn't happen, the
undo log page may get evicted, and we're now doing three writes (WAL,
page, undo) rather than just two (WAL, page). That's no worse than an
update where the old and new tuples land on different pages, but it IS
worse than an update where the old and new tuples are on the same
page, or at least I think it is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
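
A minimal sketch of the write-count comparison made in the message above, for readers who want it spelled out. The function name and its parameter are hypothetical, and the model counts only the three kinds of writes the message names (WAL, data page, undo). It is an illustration under those assumptions, not a description of how Oracle or PostgreSQL actually accounts for I/O.

```python
def writes_for_update(undo_evicted: bool) -> list[str]:
    """Return the disk writes charged to one update in an undo-log design.

    Hypothetical model: every update writes WAL and the modified page.
    The undo record costs an extra write only if its page is evicted
    from memory before it can be discarded.
    """
    writes = ["WAL", "page"]
    if undo_evicted:
        writes.append("undo")
    return writes


# Transaction commits quickly and no older snapshots remain:
# the undo information is discarded without reaching disk.
print(writes_for_update(undo_evicted=False))

# Older snapshots linger, so the undo page is evicted and written out.
print(writes_for_update(undo_evicted=True))
```

The second case is the one the message flags as the potential extra cost: three writes instead of two.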