Discussion: How to reset WAL environment

How to reset WAL environment

From:
Hiroshi Inoue
Date:
Hi,

I now see the following message and can't start the
postmaster.

FATAL 2:  btree_insert_redo: uninitialized page

Is it a bug?
Anyway, how do I reset my WAL environment?

Regards.

Hiroshi Inoue


RE: How to reset WAL environment

From:
"Mikheev, Vadim"
Date:
> I now see the following message and can't start the
> postmaster.
> 
> FATAL 2:  btree_insert_redo: uninitialized page
> 
> Is it a bug?

Seems so. btree_insert_redo shouldn't see uninitialized pages
(only newroot and split ops add pages to an index, and they should
be redone before any insert into those pages).
Can you post/ftp me a tgz of the data dir?
Or start up the postmaster with --wal_debug=1 and send me the
output.
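Vadim's invariant can be sketched as a toy redo loop. The invariant is that only newroot and split records add pages to an index, and that they are logged, and therefore redone, before any insert into those pages. The loop rejects an insert into a page that no earlier record created. This is a hypothetical Python model, not PostgreSQL's btree redo code. The `(op, page)` record format, `redo()` and `UninitializedPage` are invented for illustration.

```python
class UninitializedPage(Exception):
    """Raised when an insert record targets a page no earlier record created."""


def redo(log):
    """Replay a toy B-tree log in order.

    Each record is a tuple (op, page_no). In this hypothetical model only
    "newroot" and "split" create (initialize) pages; "insert" may only touch
    a page that an earlier record in the log has already created.
    """
    initialized = set()  # page numbers that replay has initialized so far
    tuples = {}          # page number -> count of inserted tuples
    for op, page in log:
        if op in ("newroot", "split"):
            initialized.add(page)
            tuples.setdefault(page, 0)
        elif op == "insert":
            if page not in initialized:
                # analogous to "FATAL 2: btree_insert_redo: uninitialized page"
                raise UninitializedPage(f"insert redo: uninitialized page {page}")
            tuples[page] += 1
        else:
            raise ValueError(f"unknown op {op!r}")
    return tuples


# A well-formed log: pages are created before anything is inserted into them.
good = [("newroot", 1), ("insert", 1), ("split", 2), ("insert", 2)]
print(redo(good))  # {1: 1, 2: 1}

# An illegal log: an insert into page 3 with no earlier record creating it.
bad = [("newroot", 1), ("insert", 3)]
try:
    redo(bad)
except UninitializedPage as e:
    print(e)  # insert redo: uninitialized page 3
```

In this model an illegal log, like the one Hiroshi's local change produced, fails deterministically on every replay.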

> Anyway, how do I reset my WAL environment?

Only one way: remove the index file. I haven't added the file node
to the elog output yet (will do for beta2), but wal_debug
will show it.

Vadim


Re: How to reset WAL environment

From:
Hiroshi Inoue
Date:
Mikheev, Vadim wrote:
> 
> > I now see the following message and can't start the
> > postmaster.
> >
> > FATAL 2:  btree_insert_redo: uninitialized page
> >
> > Is it a bug?
> 
> Seems so. btree_insert_redo shouldn't see uninitialized pages
> (only newroot and split ops add pages to an index, and they should
> be redone before any insert into those pages).
> Can you post/ftp me a tgz of the data dir?
> Or start up the postmaster with --wal_debug=1 and send me the
> output.
>

This is probably caused by my trial (local) change,
which generated illegal log output.
However, it seems to mean that WAL isn't always
redo-able. In my case the index is, unfortunately, probably a
system index. Is there a way to
avoid invoking the recovery process at startup?

Regards.

Hiroshi Inoue


RE: How to reset WAL environment

From:
"Mikheev, Vadim"
Date:
> > > FATAL 2:  btree_insert_redo: uninitialized page
> > >
> > > Is it a bug?
> > 
> > Seems so. btree_insert_redo shouldn't see uninitialized pages
> > (only newroot and split ops add pages to an index, and they should
> > be redone before any insert into those pages).
> > Can you post/ftp me a tgz of the data dir?
> > Or start up the postmaster with --wal_debug=1 and send me the
> > output.
> >
> 
> This is probably caused by my trial (local) change,
> which generated illegal log output.
> However, it seems to mean that WAL isn't always
> redo-able.

Illegal log output is like a disk crash: only BAR can help.
I agree that an elog(STOP) caused by problems with a single
data file is quite annoying. It would be great if we could
mark a table/index as corrupted after recovery, but there are
no means for this now. For the moment we can only notify the
DBA about problems with the file node and skip further
recovery of the corresponding table/index. I'll do this
for beta2/3 if no one else does.
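Vadim's interim plan can be sketched as a redo loop that quarantines file nodes. The plan is to report the failing file node and skip further recovery of that table/index instead of stopping the whole server with elog(STOP). This is a hypothetical Python model rather than PostgreSQL code. `recover()`, the `apply` callback and the file node numbers are invented for illustration. The sketch does not address how the skipped relation would then be marked unusable, which the message notes is not yet possible.

```python
import logging

logger = logging.getLogger("toy_redo")


def recover(records, apply):
    """Replay log records, isolating a failure to the file node that caused it.

    records: (relfilenode, payload) pairs in log order.
    apply:   callable(relfilenode, payload) that raises if a record can't be redone.
    Returns the set of file nodes whose recovery was abandoned.
    """
    broken = set()
    for node, payload in records:
        if node in broken:
            continue  # ignore all later records for a file node that already failed
        try:
            apply(node, payload)
        except Exception as exc:  # any redo failure; a sketch, not a real error policy
            # Report the file node to the DBA instead of stopping the whole startup.
            logger.warning("redo failed for file node %s (%s); skipping its further recovery",
                           node, exc)
            broken.add(node)
    return broken


applied = []


def apply(node, payload):
    # Hypothetical redo callback: the payload "bad" stands in for an unreplayable record.
    if payload == "bad":
        raise RuntimeError("uninitialized page")
    applied.append((node, payload))


print(recover([(16384, "a"), (16390, "bad"), (16384, "b"), (16390, "c")], apply))  # {16390}
print(applied)  # [(16384, 'a'), (16384, 'b')]
```

In this model recovery of other relations continues. The trade-off is that the quarantined relation is left in whatever state its last successful record produced.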

> In my case the index is, unfortunately, probably a
> system index. Is there a way to

Your REINDEX works very well.

> avoid invoking the recovery process at startup?

You would get a totally corrupted database. Remember:
WAL avoids not only fsync() but also write() of data pages, so
after a crash committed changes may exist only in the log.
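A toy model shows why skipping recovery would corrupt the database. In the model, commit forces only the log, while modified data pages stay in memory and are neither written nor fsync'ed. `ToyWAL` and its methods are hypothetical. A real system replays from the last checkpoint rather than from the start of the log, and writes dirty pages out at checkpoints or on buffer eviction.

```python
class ToyWAL:
    """Toy model of write-ahead logging with deferred data-page writes.

    At commit only the log is forced to "disk"; modified data pages stay in
    the in-memory buffer cache and are neither written nor fsync'ed.
    """

    def __init__(self):
        self.disk_pages = {}  # page -> value actually on disk
        self.disk_log = []    # log records durably on disk
        self.buffers = {}     # dirty pages held in memory only

    def update_and_commit(self, page, value):
        self.buffers[page] = value            # change the page in memory only
        self.disk_log.append((page, value))   # force the log record at commit

    def crash(self):
        self.buffers.clear()                  # memory is lost; dirty pages never reached disk

    def recover(self):
        for page, value in self.disk_log:     # redo: reapply every logged change
            self.disk_pages[page] = value


db = ToyWAL()
db.update_and_commit("pg_class:1", "new row")
db.crash()
print(db.disk_pages)  # {}  (skipping recovery: the committed row is missing)
db.recover()
print(db.disk_pages)  # {'pg_class:1': 'new row'}
```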

Vadim


Re: How to reset WAL environment

From:
Hiroshi Inoue
Date:
Mikheev, Vadim wrote:
>
> > > > FATAL 2:  btree_insert_redo: uninitialized page
> > > >
> > > > Is it a bug?
> > >
> > > Seems so. btree_insert_redo shouldn't see uninitialized pages
> > > (only newroot and split ops add pages to an index, and they should
> > > be redone before any insert into those pages).
> > > Can you post/ftp me a tgz of the data dir?
> > > Or start up the postmaster with --wal_debug=1 and send me the
> > > output.
> > >
> >
> > This is probably caused by my trial (local) change,
> > which generated illegal log output.
> > However, it seems to mean that WAL isn't always
> > redo-able.
>
> Illegal log output is like a disk crash: only BAR can help.


But redo-recovery after a restore would also fail.
The operation that corresponds to the illegal
log output aborted at execution time, and
rolling it back by redo also failed. It seems
preferable to me that the transaction be rolled
back by undo.
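Hiroshi's redo/undo distinction can be illustrated with a toy log whose records carry both a before-image and an after-image. Redo reapplies after-images in log order. Undo restores one transaction's before-images, newest first, so rolling back needs only before-images, not the after-images whose replay failed. This is a generic sketch of the technique, not a description of PostgreSQL's WAL. `roll_forward()` and `roll_back()` are hypothetical names.

```python
def roll_forward(pages, log):
    """Redo: reapply every after-image in log order."""
    for _xid, page, _before, after in log:
        pages[page] = after


def roll_back(pages, log, xid):
    """Undo: restore the before-images of one transaction, newest record first."""
    for rec_xid, page, before, _after in reversed(log):
        if rec_xid == xid:
            pages[page] = before


# Each record: (transaction id, page, before-image, after-image).
log = [(1, "p1", "A", "A1"), (2, "p2", "B", "B2"), (1, "p1", "A1", "A2")]
pages = {"p1": "A", "p2": "B"}

roll_forward(pages, log)
print(pages)  # {'p1': 'A2', 'p2': 'B2'}

roll_back(pages, log, xid=1)  # abort transaction 1 using before-images only
print(pages)  # {'p1': 'A', 'p2': 'B2'}
```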

> I agree that an elog(STOP) caused by problems with a single
> data file is quite annoying. It would be great if we could
> mark a table/index as corrupted after recovery, but there are
> no means for this now. For the moment we can only notify the
> DBA about problems with the file node and skip further
> recovery of the corresponding table/index. I'll do this
> for beta2/3 if no one else does.
>
> > In my case the index is, unfortunately, probably a
> > system index. Is there a way to
>
> Your REINDEX works very well.
>

OK, the indexes of pg_class were recovered by REINDEX.

Regards.

Hiroshi Inoue


RE: How to reset WAL environment

From:
"Mikheev, Vadim"
Date:
> > > This is probably caused by my trial (local) change,
> > > which generated illegal log output.
> > > However, it seems to mean that WAL isn't always
> > > redo-able.
> > 
> > Illegal log output is like a disk crash: only BAR can help.
> 
> But redo-recovery after a restore would also fail.
> The operation that corresponds to the illegal
> log output aborted at execution time, and
> rolling it back by redo also failed. It seems
> preferable to me that the transaction be rolled
> back by undo.

What exactly did you change in the code?
What kind of illegal log output?
Was something that breaks btree/WAL logic written to the log?

Vadim


Re: How to reset WAL environment

From:
Hiroshi Inoue
Date:
"Mikheev, Vadim" wrote:
> 
> > > > This is probably caused by my trial (local) change,
> > > > which generated illegal log output.
> > > > However, it seems to mean that WAL isn't always
> > > > redo-able.
> > >
> > > Illegal log output is like a disk crash: only BAR can help.
> >
> > But redo-recovery after a restore would also fail.
> > The operation that corresponds to the illegal
> > log output aborted at execution time, and
> > rolling it back by redo also failed. It seems
> > preferable to me that the transaction be rolled
> > back by undo.
> 
> What exactly did you change in the code?

I'm changing REINDEX under the postmaster to be safe under WAL.
(When I met Tatsuo last week, he asked me whether REINDEX under
the postmaster is possible, and I replied yes. However, I'm
not sure REINDEX under the postmaster is sufficiently safe,
especially under WAL, so I started to change REINDEX to
be rollbackable using relfilenode.)
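Hiroshi's idea of making REINDEX rollbackable via relfilenode can be sketched with a toy catalog. The new index is built into a fresh file, and the catalog's pointer is switched only when the operation succeeds. `Catalog`, `reindex()`, the OID 1234 and the file node numbers are hypothetical, and this is not Hiroshi's patch. In a real system the catalog update and the removal of the old file are tied to transaction commit. This toy collapses commit into the success path.

```python
import itertools


class Catalog:
    """Toy stand-in for pg_class: maps a relation OID to its current file node."""

    def __init__(self):
        self.relfilenode = {}                     # relation OID -> file node
        self.files = {}                           # file node -> file contents
        self._next_node = itertools.count(20000)  # hypothetical file node allocator

    def new_file(self, contents):
        node = next(self._next_node)
        self.files[node] = contents
        return node


def reindex(cat, index_oid, build):
    """Rebuild an index into a fresh file node; the old file is never overwritten."""
    old = cat.relfilenode[index_oid]
    new = cat.new_file([])
    try:
        cat.files[new] = build()
    except Exception:
        # Abort: drop the half-built file; the catalog still points at the old one.
        del cat.files[new]
        raise
    # Commit: switch the catalog entry, then drop the no-longer-referenced old file.
    cat.relfilenode[index_oid] = new
    del cat.files[old]
    return new


cat = Catalog()
cat.relfilenode[1234] = cat.new_file(["old entries"])


def failing_build():
    raise RuntimeError("error while building the index")


try:
    reindex(cat, 1234, failing_build)
except RuntimeError:
    pass
print(cat.relfilenode[1234], cat.files)  # 20000 {20000: ['old entries']}

reindex(cat, 1234, lambda: ["rebuilt entries"])
print(cat.relfilenode[1234], cat.files)  # 20002 {20002: ['rebuilt entries']}
```

Because the old file is never modified in place, a failed rebuild leaves the original index usable. Undoing it only requires discarding the new file.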

> What kind of illegal log output?

Probably a new block was suddenly about to be inserted into the new
relfilenode.
I've been anxious about rolling back by redo.
There's no guarantee that retrying the redo log would never
fail.

I now see a vacuum failure.
I probably fixed one bug (see pgsql-committers), but
other bugs seem to remain.

Regards.

Hiroshi Inoue