Discussion: WAL status & todo

WAL status & todo

From
"Vadim Mikheev"
Date:
Well, hopefully WAL will be ready for alpha testing in a few days.
Unfortunately, at the moment I have to step aside from the main stream to
implement new file naming, the biggest todo for integrating WAL into the
system.

I would really appreciate any help with the following issues (testing
can start regardless of their status, but they must be resolved anyway):

1. BTREE: sometimes WAL can't guarantee the right order of items on leaf
   pages after recovery - a new flag, BTP_REORDER, was introduced to mark
   such pages. Btree should be changed to handle this case in normal
   processing mode.

2. HEAP: like 1., this issue is a result of the attempt to go without
   compensation records (i.e. without logging undo operations): it's
   possible that sometimes in redo there will be no space for new records,
   because in recovery we don't undo changes for aborted xactions
   immediately - a function like BTREE's _bt_cleanup_page_ is required for
   HEAP, as well as a general inspection of all places where HEAP's redo
   ops try to insert records (initially I thought that in recovery we'd
   undo changes immediately after reading an abort record from the log -
   this wouldn't work for BTREE: splits must be redone before undo).

3. There is no redo/undo for HASH, RTREE & GIST yet. It would be *really
   really great* if someone could implement it using BTREE's redo/undo
   code as a prototype. These are the most complex parts of this todo.
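
The space problem in item 2 can be illustrated with a toy page model. This
is only a sketch, not the actual heap code: the Page class, its fixed
capacity, and cleanup_page are hypothetical names standing in for the heap
analogue of _bt_cleanup_page_ that the message asks for, and the set of
aborted transaction ids is simply passed in as an assumption about what
recovery already knows.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy heap page (hypothetical): holds at most `capacity` (xid, data) items."""
    capacity: int
    items: list = field(default_factory=list)


def cleanup_page(page, aborted_xids):
    """Drop items written by transactions known to have aborted.

    Returns the number of slots reclaimed.
    """
    before = len(page.items)
    page.items = [(xid, data) for xid, data in page.items
                  if xid not in aborted_xids]
    return before - len(page.items)


def redo_insert(page, xid, data, aborted_xids):
    """Replay an insert; if the page is full, reclaim aborted items first."""
    if len(page.items) >= page.capacity:
        cleanup_page(page, aborted_xids)
    if len(page.items) >= page.capacity:
        raise RuntimeError("no space on page even after cleanup")
    page.items.append((xid, data))


page = Page(capacity=2)
redo_insert(page, 10, "a", aborted_xids=set())
redo_insert(page, 11, "b", aborted_xids=set())  # xid 11 aborts later in the log
redo_insert(page, 12, "c", aborted_xids={11})   # page is full: cleanup frees a slot
```

Without the cleanup step the third redo would fail, because the item left
behind by the aborted transaction was never undone.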

Probably, something else will follow later.

Regards,
Vadim




Re: WAL status & todo

From
"Martin A. Marques"
Date:
On Sat, 14 Oct 2000, Vadim Mikheev wrote:
> Well, hopefully WAL will be ready for alpha testing in a few days.
> Unfortunately, at the moment I have to step aside from the main stream to
> implement new file naming, the biggest todo for integrating WAL into the
> system.
>
> I would really appreciate any help with the following issues (testing
> can start regardless of their status, but they must be resolved anyway):

I have downloaded the source via CVSup. Where can I find the WAL and the 
TOAST code?

Thanks!!

-- 
"And I'm happy, because you make me feel good, about me." - Melvin Udall
-----------------------------------------------------------------
Martín Marqués            email:     martin@math.unl.edu.ar
Santa Fe - Argentina        http://math.unl.edu.ar/~martin/
Systems administrator at math.unl.edu.ar
-----------------------------------------------------------------


WAL and indexes (Re: WAL status & todo)

From
Tom Lane
Date:
"Vadim Mikheev" <vmikheev@sectorbase.com> writes:
> 3. There is no redo/undo for HASH, RTREE & GIST yet. It would be *really
>    really great* if someone could implement it using BTREE's redo/undo
>    code as a prototype. These are the most complex parts of this todo.

I don't understand why WAL needs to log internal operations of any of
the index types.  Seems to me that you could treat indexes as black
boxes that are updated as side effects of WAL log items for heap tuples:
when adding a heap tuple as a result of a WAL item, you just call the
usual index insert routines, and when deleting a heap tuple as a result
of undoing a WAL item, you mark the tuple invalid but don't physically
remove it till VACUUM (thus no need to worry about its index entries).
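
A minimal sketch of this black-box idea, under stated assumptions: the
HeapTable class and all of its method names are hypothetical, and each
index is modelled as a plain dict rather than a real access method. Redo of
a heap insert calls the ordinary index insert path, and undo only flips a
validity flag, leaving index entries in place until vacuum.

```python
class HeapTable:
    """Toy heap whose indexes are maintained only via heap-level operations."""

    def __init__(self, index_columns):
        self.tuples = {}   # tid -> row dict
        self.valid = {}    # tid -> bool
        # One black-box index per column: key -> set of tids.
        self.indexes = {col: {} for col in index_columns}

    def _index_insert(self, tid, row):
        # Stand-in for "the usual index insert routines".
        for col, index in self.indexes.items():
            index.setdefault(row[col], set()).add(tid)

    def redo_insert(self, tid, row):
        """Replay a heap insert; the indexes are updated as a side effect."""
        self.tuples[tid] = row
        self.valid[tid] = True
        self._index_insert(tid, row)

    def undo_insert(self, tid):
        """Undo a heap insert: mark the tuple invalid, touch no index."""
        self.valid[tid] = False

    def lookup(self, col, key):
        """Index scan that skips tuples marked invalid."""
        tids = self.indexes[col].get(key, set())
        return sorted(t for t in tids if self.valid.get(t, False))

    def vacuum(self):
        """Physically remove invalid tuples together with their index entries."""
        for tid in [t for t, ok in self.valid.items() if not ok]:
            row = self.tuples.pop(tid)
            del self.valid[tid]
            for col, index in self.indexes.items():
                index[row[col]].discard(tid)
                if not index[row[col]]:
                    del index[row[col]]


table = HeapTable(["name"])
table.redo_insert(1, {"name": "a"})
table.redo_insert(2, {"name": "a"})
table.undo_insert(2)                      # index still points at tid 2
visible = table.lookup("name", "a")       # only tid 1 is visible
```

Note that no index-specific log record appears anywhere in the sketch; the
stale entry for tid 2 is harmless because lookups check tuple validity.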

This doesn't address the issue of recovering from an incomplete index
update (such as a partially-completed btree page split), but I think
the most reliable way to do that is to add WAL records on the order of
"update beginning for index X" and "update done for index X".  If you
see the begin and not the done record when replaying a log, you assume
the index is corrupt and rebuild it from scratch, using Hiroshi's
index-rebuild code.
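
The begin/done scheme can be sketched as a single pass over the log. This
is an illustration only: the record kinds, the (kind, index_name) tuple
layout, and the rebuild_index callback are assumptions made for the
example, not the format of any real WAL record. A counter is used instead
of a set so that interleaved updates to the same index are paired
correctly.

```python
from collections import Counter

BEGIN = "index_update_begin"   # hypothetical record kind
DONE = "index_update_done"     # hypothetical record kind


def unfinished_index_updates(log_records):
    """Return indexes that have a begin record without a matching done record."""
    pending = Counter()
    for kind, index_name in log_records:
        if kind == BEGIN:
            pending[index_name] += 1
        elif kind == DONE:
            pending[index_name] -= 1
        # Any other record kind is irrelevant to this check.
    return sorted(name for name, count in pending.items() if count > 0)


def replay(log_records, rebuild_index):
    """Treat every index with an unfinished update as corrupt and rebuild it."""
    suspect = unfinished_index_updates(log_records)
    for name in suspect:
        rebuild_index(name)
    return suspect


log = [
    (BEGIN, "idx_a"),
    (DONE, "idx_a"),
    (BEGIN, "idx_b"),
    # crash: no done record for idx_b
]
rebuilt = []
replay(log, rebuilt.append)   # rebuilt now lists only idx_b
```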

The reason I think this is a better way is that I don't believe any of
us (unless maybe Vadim) understand rtree, hash, or especially GIST
indexes well enough to implement a correct WAL logging scheme for them.
Certainly just "use the btree code as a prototype" will not yield a
crash-robust WAL method for the other index types, because they will
have different requirements about what combinations of changes have to
happen together to get from one consistent state to the next.

For that matter I am far from convinced that the currently committed
code for btree WAL logging is correct --- where does it cope with
cleaning up after an unfinished page split?  I don't see it.

Since we have very poor testing capabilities for the non-mainstream
index types (remember how I broke rtree completely during 6.5 devel,
and no one noticed till quite late in beta?) I will have absolutely
zero confidence in WAL support for these index types if it's implemented
this way.  I think we should go with a black-box approach that's the
same for all index types and is implemented completely outside the
index-access-method-specific code.
        regards, tom lane