WAL and indexes (Re: WAL status & todo)
From | Tom Lane |
---|---|
Subject | WAL and indexes (Re: WAL status & todo) |
Date | |
Msg-id | 25928.971709337@sss.pgh.pa.us |
In response to | WAL status & todo ("Vadim Mikheev" <vmikheev@sectorbase.com>) |
List | pgsql-hackers |
"Vadim Mikheev" <vmikheev@sectorbase.com> writes: > 3. There are no redo/undo for HASH, RTREE & GIST yet. This would be *really > really > great* if someone could implement it using BTREE' redo/undo code as > prototype. > These are the most complex parts of this todo. I don't understand why WAL needs to log internal operations of any of the index types. Seems to me that you could treat indexes as black boxes that are updated as side effects of WAL log items for heap tuples: when adding a heap tuple as a result of a WAL item, you just call the usual index insert routines, and when deleting a heap tuple as a result of undoing a WAL item, you mark the tuple invalid but don't physically remove it till VACUUM (thus no need to worry about its index entries). This doesn't address the issue of recovering from an incomplete index update (such as a partially-completed btree page split), but I think the most reliable way to do that is to add WAL records on the order of "update beginning for index X" and "update done for index X". If you see the begin and not the done record when replaying a log, you assume the index is corrupt and rebuild it from scratch, using Hiroshi's index-rebuild code. The reason I think this is a better way is that I don't believe any of us (unless maybe Vadim) understand rtree, hash, or especially GIST indexes well enough to implement a correct WAL logging scheme for them. Certainly just "use the btree code as a prototype" will not yield a crash-robust WAL method for the other index types, because they will have different requirements about what combinations of changes have to happen together to get from one consistent state to the next. For that matter I am far from convinced that the currently committed code for btree WAL logging is correct --- where does it cope with cleaning up after an unfinished page split? I don't see it. 
Since we have very poor testing capabilities for the non-mainstream index types (remember how I broke rtree completely during 6.5 devel, and no one noticed till quite late in beta?) I will have absolutely zero confidence in WAL support for these index types if it's implemented this way.

I think we should go with a black-box approach that's the same for all index types and is implemented completely outside the index-access-method-specific code.

regards, tom lane
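
As a rough illustration of the "update beginning for index X" / "update done for index X" idea in the message, the replay-time check might be sketched as below. This is not PostgreSQL code: the record kinds, the `LogRecord` type, and the function name are all hypothetical, and the log is modeled as a plain in-memory list.

```python
from dataclasses import dataclass

# Hypothetical record kinds for illustration; not actual PostgreSQL WAL record types.
INDEX_BEGIN = "index_update_begin"
INDEX_DONE = "index_update_done"


@dataclass(frozen=True)
class LogRecord:
    kind: str   # INDEX_BEGIN or INDEX_DONE (other kinds are ignored here)
    index: str  # name of the index the record refers to


def indexes_needing_rebuild(log):
    """Return the names of indexes with a begin record that has no matching done."""
    open_updates = {}
    for rec in log:
        if rec.kind == INDEX_BEGIN:
            open_updates[rec.index] = open_updates.get(rec.index, 0) + 1
        elif rec.kind == INDEX_DONE:
            open_updates[rec.index] = open_updates.get(rec.index, 0) - 1
    # Any index still "open" at the end of the log is assumed corrupt.
    return {name for name, count in open_updates.items() if count > 0}


log = [
    LogRecord(INDEX_BEGIN, "idx_a"),
    LogRecord(INDEX_DONE, "idx_a"),
    LogRecord(INDEX_BEGIN, "idx_b"),  # crash happened before the done record
]
suspect = indexes_needing_rebuild(log)
```

In this sketch `suspect` holds the indexes that would be handed to a rebuild-from-scratch routine after replay. The check never looks inside an index, so it would work the same way for btree, hash, rtree, and GIST.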