[RFC] LSN Map
From | Marco Nenciarini |
---|---|
Subject | [RFC] LSN Map |
Date | |
Msg-id | 54AD016E.9020406@2ndquadrant.it |
Replies | Re: [RFC] LSN Map; Re: [RFC] LSN Map |
List | pgsql-hackers |
Hi Hackers,

In order to make incremental backup (https://wiki.postgresql.org/wiki/Incremental_backup) efficient, we need a way to track the LSN of a page so that we can retrieve it without reading the actual block. Below is my proposal on how to achieve it.

LSN Map
-------

The purpose of the LSN map is to quickly know whether a page of a relation has been modified after a specified checkpoint.

Implementation
--------------

We create an additional fork which contains a raw stream of LSNs. To limit the space used, every entry represents the maximum LSN of a group of blocks of a fixed size. I arbitrarily chose a group size of 2048 blocks, which is equivalent to 16MB of heap data with the default 8kB block size, which means that we need 64k entries to track one terabyte of heap.

Name
----

I've called this map the LSN map, and I've named the corresponding fork file "lm".

WAL logging
-----------

At the moment the map is not WAL-logged, but it is updated during WAL replay. I'm not deep enough into the WAL mechanics to see whether the current approach is sane or whether we should change it.

Current limits
--------------

The current implementation tracks only heap LSNs. It does not track any kind of index yet, but this can easily be added later.

The implementation of commands that rewrite the whole table can be improved: CLUSTER uses shared memory buffers instead of writing the map directly to disk, and moving a table to another tablespace simply drops the map instead of updating it correctly.

Further ideas
-------------

The current implementation updates an entry in the map every time a block gets its LSN bumped, but we really only need to know the first checkpoint that contains expired data. So setting the entry to the last checkpoint LSN is probably enough, and it will reduce the number of writes. To implement this we only need a backend-local copy of the last checkpoint LSN, which is updated during each XLogInsert.
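A minimal standalone sketch of the block-group mapping described under Implementation, together with the coarser "last checkpoint LSN" update rule proposed under Further ideas. It assumes the default 8kB block size, 8-byte LSNs, and a plain in-memory array standing in for the "lm" fork. `LsnMap`, `lsnmap_advance` and the other helpers are hypothetical names, not code from the patch, and locking, WAL logging and fork I/O are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PostgreSQL types of the same name:
 * a 64-bit WAL position and a 32-bit block number. */
typedef uint64_t XLogRecPtr;
typedef uint32_t BlockNumber;

#define BLCKSZ 8192                  /* assumed default page size */
#define LSNMAP_BLOCKS_PER_ENTRY 2048 /* 2048 * 8kB = 16MB of heap per entry */

/* Hypothetical in-memory image of the "lm" fork: one LSN per block group. */
typedef struct LsnMap
{
    size_t      nentries;
    XLogRecPtr *entries;
} LsnMap;

/* Number of map entries needed to cover nblocks heap blocks (rounded up). */
static size_t
lsnmap_entries_for(uint64_t nblocks)
{
    return (size_t) ((nblocks + LSNMAP_BLOCKS_PER_ENTRY - 1) /
                     LSNMAP_BLOCKS_PER_ENTRY);
}

/* Index of the entry that summarizes the group containing blkno. */
static size_t
lsnmap_index(BlockNumber blkno)
{
    return blkno / LSNMAP_BLOCKS_PER_ENTRY;
}

/* Raise the group's entry to new_lsn if it is larger.  Returns true when
 * the entry actually changed, i.e. the map page would need a write. */
static bool
lsnmap_advance(LsnMap *map, BlockNumber blkno, XLogRecPtr new_lsn)
{
    size_t idx = lsnmap_index(blkno);

    assert(idx < map->nentries);
    if (new_lsn <= map->entries[idx])
        return false;
    map->entries[idx] = new_lsn;
    return true;
}

/* Current rule: store the maximum page LSN seen in the group. */
static bool
lsnmap_record_page_lsn(LsnMap *map, BlockNumber blkno, XLogRecPtr page_lsn)
{
    return lsnmap_advance(map, blkno, page_lsn);
}

/* Proposed rule: store only the backend's copy of the last checkpoint LSN,
 * so the entry changes at most once per checkpoint cycle. */
static bool
lsnmap_record_checkpoint(LsnMap *map, BlockNumber blkno,
                         XLogRecPtr last_checkpoint_lsn)
{
    return lsnmap_advance(map, blkno, last_checkpoint_lsn);
}

/* Conservative test: may any block in blkno's group have been modified at
 * or after threshold?  A false answer lets a backup skip the whole 16MB
 * group without reading its blocks. */
static bool
lsnmap_maybe_modified_since(const LsnMap *map, BlockNumber blkno,
                            XLogRecPtr threshold)
{
    size_t idx = lsnmap_index(blkno);

    assert(idx < map->nentries);
    return map->entries[idx] >= threshold;
}
```

With these definitions, `lsnmap_entries_for((UINT64_C(1) << 40) / BLCKSZ)` yields 65536, matching the 64k entries per terabyte quoted above (about 512kB of 8-byte LSNs). If one block is modified ten times after a checkpoint, the page-LSN rule changes its entry ten times while the checkpoint rule changes it once, which is the write reduction the proposal anticipates; both rules still give a conservative answer to "has this group changed since the reference checkpoint?".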
Again, I'm not deep enough into the replication mechanics to see whether this approach could work on a standby using restartpoints instead of checkpoints. Please advise on the best way to implement it.

Conclusions
-----------

This code is incomplete, and the xlog replay part must be improved/fixed, but I think it's a good start for this feature. I will appreciate any review, advice or criticism.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it