Discussion: AW: AW: AW: AW: AW: WAL-based allocation of XIDs is insecure

AW: AW: AW: AW: AW: WAL-based allocation of XIDs is insecure

From
Zeugswetter Andreas SB
Date:
> > I do not however see how the current solution fixes the original problem,
> > that we don't have a rollback for index modifications.
> > The index would potentially point to an empty heaptuple slot.
> 
> How?  There will be an XLOG entry inserting the heap tuple before the
> XLOG entry that updates the index.  Rollforward will redo both.  The
> heap tuple might not get committed, but it'll be there.

Before commit or rollback the xlog is not flushed to disk, thus you can lose
those xlog entries, but the index page might already be on disk because of
LRU buffer reuse, no?
Another example would be a btree reorg, like adding a level, that is partway 
through before a crash.

> > Additionally I do not see how this all works for userland index types.
> 
> None of it works for index types that don't do XLOG entries (which I
> think may currently be true for everything except btree :-( ...).  I
> don't see how that changes if we alter the way this bit is done.

I really think that xlog entries should be done by a layer below the userland
functions. I would not like to risk WAL integrity by allowing userland to
write a messed-up log record. The record would be something like:
called userland index insert for "key" and "ctid". With that info you can
easily redo, but undo would probably be hard. Hence the physical log.
Actually I am not sure index changes need to be (or are currently) logged at all.
You can deduce all necessary info from the heap xlog record 
(plus maybe the original record from disk).

Andreas


Re: AW: AW: AW: AW: WAL-based allocation of XIDs is insecure

From
"Vadim Mikheev"
Date:
> Before commit or rollback the xlog is not flushed to disk, thus you can lose
> those xlog entries, but the index page might already be on disk because of
> LRU buffer reuse, no?

No. A buffer page is written to disk *only after the corresponding records are flushed
to the log* (WAL means Write-Ahead Log: write the log before modifying data pages).
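
The write-ahead rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PostgreSQL's buffer manager: `Log`, `Page`, `BufferPool`, `modify` and `evict` are hypothetical names, the log is an in-memory list whose position doubles as the LSN, and "flushing" only advances a counter. The point is the check in `evict`: a page is never written out ahead of the log records that changed it.

```python
# Minimal sketch of the write-ahead rule, assuming a single sequential log.
# All names are hypothetical; this is not PostgreSQL's buffer manager.
from dataclasses import dataclass, field


@dataclass
class Log:
    records: list = field(default_factory=list)  # record i has LSN i + 1
    flushed_lsn: int = 0  # highest LSN assumed to be safely on disk

    def append(self, payload) -> int:
        self.records.append(payload)
        return len(self.records)  # LSN of the record just appended

    def flush(self, upto_lsn: int) -> None:
        # Stand-in for an fsync of the log file up to upto_lsn.
        self.flushed_lsn = max(self.flushed_lsn, upto_lsn)


@dataclass
class Page:
    data: dict = field(default_factory=dict)
    lsn: int = 0  # LSN of the last log record that changed this page


class BufferPool:
    def __init__(self, log: Log):
        self.log = log
        self.buffers = {}  # page_no -> Page held in memory
        self.disk = {}     # page_no -> (copy of data, page LSN) written out

    def modify(self, page_no: int, key: str, value: str) -> None:
        page = self.buffers.setdefault(page_no, Page())
        lsn = self.log.append((page_no, key, value))  # describe the change in the log
        page.data[key] = value                         # change the page in memory
        page.lsn = lsn                                 # remember the newest record

    def evict(self, page_no: int) -> None:
        page = self.buffers.pop(page_no)
        # Write-ahead rule: the log must be on disk up to page.lsn
        # before the page itself may be written.
        if page.lsn > self.log.flushed_lsn:
            self.log.flush(page.lsn)
        self.disk[page_no] = (dict(page.data), page.lsn)


if __name__ == "__main__":
    log = Log()
    pool = BufferPool(log)
    pool.modify(1, "heap", "tuple A")  # LSN 1: heap insert
    pool.modify(2, "index", "-> A")    # LSN 2: index insert
    pool.evict(2)                      # LRU reuse before any commit
    print(log.flushed_lsn)             # prints 2
```

Because the log is sequential, flushing up to the index record (LSN 2) also makes the earlier heap-insert record (LSN 1) durable. So in the LRU-reuse scenario, an index page that reaches disk before commit is always accompanied by a log record from which redo can recreate the heap tuple it points to.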

> Another example would be a btree reorg, like adding a level, that is partway 
> through before a crash.

And this is what I hopefully fixed recently with btree runtime recovery.

Vadim




Re: AW: AW: AW: AW: AW: WAL-based allocation of XIDs is insecure

From
Tom Lane
Date:
Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at> writes:
> I really think that xlog entries should be done by a layer below the
> userland functions.

That seems somewhere between impractical and impossible: how will you
tie the functional xlog entries ("insert foo into index bar") to the
resulting page modifications, unless the entries are made from code
that knows all about which pages contain what index entries?  Don't
forget these things need to go into the xlog atomically.

> I would not like to risk WAL integrity by allowing
> userland to write a messed-up log record.

Index access method code is just as critical a part of the system as
anything else.  The above makes no more sense than saying that you don't
want to trust heapam.c to generate correct WAL records.

> Actually I am not sure index changes need to be (or are currently)
> logged at all.  You can deduce all necessary info from the heap xlog
> record (plus maybe the original record from disk).

This assumes that pg_index, pg_am and friends are (a) not corrupt; (b)
in the same state that they were in when the portion of the XLOG being
replayed was made.  Neither of these assumptions is acceptable for WAL
recovery.
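
The objection above can be illustrated with a second minimal sketch, again in Python and again with hypothetical names (`XLogRecord`, `Page`, `redo`). If each record names the page it changed and carries the exact bytes, replay needs nothing but the log and the pages themselves; a purely functional record such as "insert key into index bar" would instead have to consult pg_index and pg_am at replay time, which is exactly the dependency rejected here. The LSN comparison in `redo` is a common way to make such replay idempotent; it is shown as an assumption, not as a description of the 7.1 code.

```python
# Minimal sketch of page-level ("physical") redo. The record layout and all
# names are hypothetical; real PostgreSQL WAL records look different.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class XLogRecord:
    lsn: int
    page_no: int     # which page was changed
    slot: int        # where on the page (e.g. an item offset)
    contents: bytes  # the exact bytes placed there


@dataclass
class Page:
    lsn: int = 0  # LSN of the last record applied to this page
    slots: dict = field(default_factory=dict)


def redo(pages: dict, log: list) -> None:
    """Replay records in LSN order without looking anything up in the
    catalogs: each record already says which page to touch and what to put
    there. A record is applied only if the page has not yet seen it, so
    replay is idempotent and copes with pages flushed after some records."""
    for rec in sorted(log, key=lambda r: r.lsn):
        page = pages.setdefault(rec.page_no, Page())
        if page.lsn >= rec.lsn:
            continue  # this change already reached disk
        page.slots[rec.slot] = rec.contents
        page.lsn = rec.lsn


if __name__ == "__main__":
    log = [
        XLogRecord(1, 10, 1, b"heap tuple"),       # heap insert
        XLogRecord(2, 20, 1, b"index -> (10,1)"),  # index insert
    ]
    # After a crash: the index page made it to disk, the heap page did not.
    pages = {20: Page(lsn=2, slots={1: b"index -> (10,1)"})}
    redo(pages, log)
    print(pages[10].slots)  # prints {1: b'heap tuple'}
```

Running `redo` a second time changes nothing, since every page's LSN already covers the records that touch it.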

I do think there's something to your notion that XLOG should be logging
the pre-modification pages rather than post-modification, but that's
something we will have to come back to in 7.2 or later.  For 7.1's
purposes there is nothing wrong with the current scheme, and I have no
desire to postpone release another few months to change it.
        regards, tom lane