Re: Spreading full-page writes
From | Heikki Linnakangas |
---|---|
Subject | Re: Spreading full-page writes |
Date | |
Msg-id | 538455EB.6040006@vmware.com |
In response to | Re: Spreading full-page writes (Greg Stark <stark@mit.edu>) |
Responses | Re: Spreading full-page writes |
List | pgsql-hackers |
On 05/26/2014 02:26 PM, Greg Stark wrote:
> On Mon, May 26, 2014 at 1:22 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>> The second record is generated before the checkpoint is finished and the
>> checkpoint record is written. So it will be there.
>>
>> (if you crash before the checkpoint is finished, the in-progress
>> checkpoint is no good for recovery anyway, and won't be used)
>
> Another idea would be to have separate checkpoints for each buffer
> partition. You would have to start recovery from the oldest checkpoint of
> any of the partitions.

Yeah. Simon suggested that when we talked about this, but I didn't understand how that works at the time. I think I do now. The key to making it work is distinguishing, when starting recovery from the latest checkpoint, whether a record for a given page can be replayed safely. I used flags on WAL records in my proposal to achieve this, but using buffer partitions is simpler.

For simplicity, let's imagine that we have two Redo-pointers for each checkpoint record: one for even-numbered pages, and another for odd-numbered pages. When checkpoint begins, we first update the Even-redo pointer to the current WAL insert location, and then flush all the even-numbered buffers in the buffer cache. Then we do the same for Odd.

Recovery begins at the Even-redo pointer. Replay works as normal, but until you reach the Odd-redo pointer, you refrain from replaying any changes to odd-numbered pages. After reaching the Odd-redo pointer, you replay everything as normal.

Hmm, that seems actually doable...

- Heikki
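
The replay rule described above can be sketched roughly as follows. This is only a toy model, not PostgreSQL code: the `WalRecord` type, the integer LSNs, and the `should_replay` and `recover` helpers are all hypothetical names invented for illustration, and it assumes the Even-redo pointer is never later than the Odd-redo pointer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    """Toy WAL record (hypothetical; real records carry much more)."""
    lsn: int   # position in the WAL, modelled as a plain integer
    page: int  # number of the page this record modifies


def should_replay(rec: WalRecord, even_redo: int, odd_redo: int) -> bool:
    """Return True if recovery should apply this record.

    Assumes even_redo <= odd_redo, as in the two-phase checkpoint
    where the even phase runs first.
    """
    if rec.lsn < even_redo:
        # Recovery starts at the Even-redo pointer; earlier records are not read.
        return False
    if rec.page % 2 == 1 and rec.lsn < odd_redo:
        # Odd-numbered page, and the Odd-redo pointer is not reached yet: skip.
        return False
    return True


def recover(wal: list[WalRecord], even_redo: int, odd_redo: int) -> list[WalRecord]:
    """Return the records that would be replayed, in WAL order."""
    return [rec for rec in wal if should_replay(rec, even_redo, odd_redo)]


# Example: Even-redo pointer at LSN 100, Odd-redo pointer at LSN 125.
wal = [
    WalRecord(90, 2),   # before the recovery start point
    WalRecord(100, 4),  # even page, replayed
    WalRecord(110, 3),  # odd page before Odd-redo pointer, skipped
    WalRecord(120, 6),  # even page, replayed
    WalRecord(130, 5),  # odd page after Odd-redo pointer, replayed
    WalRecord(140, 8),  # even page, replayed
]
replayed = recover(wal, even_redo=100, odd_redo=125)
print([rec.lsn for rec in replayed])
```

In this model, skipping the record at LSN 110 loses nothing, because the odd phase of the checkpoint flushed every odd-numbered buffer after the Odd-redo pointer was set, so that change is already on disk once the checkpoint completes.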