Discussion: Bug (#3484) - Invalid page header again

Bug (#3484) - Invalid page header again

From
alex
Date:
Hi folks,

We reported various database problems some weeks ago. Since then we
have updated the database to release 8.2.4 and the Linux kernel to
2.6.22.6-smp. Now we have run into an error again:


IN SHORT:
- data is inserted
- the same data is read and exported successfully
- after a nightly vacuum analyze the data is corrupted and cannot be
read any more. Error message(s): see below

CONCLUSION
Apparently the data is corrupted by "vacuum analyze".


IN DETAIL:

We got an error during the nightly dump again:
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  invalid memory alloc request
size 18446744073709551610
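
A note on that number: 18446744073709551610 is 2^64 - 6, i.e. a small
negative value reinterpreted as an unsigned 64-bit allocation size, which
usually points at a trashed length field rather than a genuinely huge
request. A minimal standalone C check of the arithmetic (illustrative
only, not PostgreSQL code):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int64_t  bogus_length  = -6;                      /* e.g. a corrupted length word */
    uint64_t alloc_request = (uint64_t) bogus_length; /* what the allocator is asked for */

    printf("%llu\n", (unsigned long long) alloc_request);
    /* prints 18446744073709551610, i.e. 2^64 - 6 */
    return 0;
}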

Some records in table "transaktion" (which contains about 45 million
records) are corrupted.
But the data was not corrupted at insertion time; it must have been
corrupted later.

Let us walk through the timeline of the error:

1. 2007/12/07 ~3:30h: The (now corrupted) data was inserted successfully
2. 2007/12/07 7h: The (now corrupted) data was read and exported
successfully!
(We run an export of the data every morning at 7h, which exports the
data we retrieved/inserted during the previous 24 hours.)
3. 2007/12/07 22h: Database was dumped successfully
4. 2007/12/07 23:15h: Database "vacuum analyze" was run successfully
5. 2007/12/08 22h: The database dump got the error described above:
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  invalid memory alloc request
size 18446744073709551610
6. 2007/12/08 23h: "vacuum analyze" threw an error:
INFO:  vacuuming "public.transaktion"
WARNING:  relation "transaktion" TID 1240631/12: OID is invalid
ERROR:  invalid page header in block 1240632 of relation "transaktion"
7. 2007/12/10: We re-ran the exports of the data (which normally run
every morning) for the last few days. These exports use the same
SQL commands as the automated run.
But this time we got an error when exporting the data for 2007/12/07:
ERROR: invalid memory alloc request size 18446744073709551610
The process exporting the same set of data had run successfully on the
morning of 2007/12/07.
We are very sure that the data has not been manipulated since the time
of insertion, because the error occurs on the testing system, and at
the moment no tests other than inserting and exporting the data are
being run.
8. 2007/12/14
When we now run a SELECT over the corrupted data, we get the error
message:
ERROR:  could not access status of transaction 313765632
DETAIL:  Could not open file "pg_clog/012B": No such file or directory.
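
As a cross-check (a sketch assuming the default 8 kB block size and the
stock CLOG layout of 2 status bits per transaction and 32 pages per
segment file): the missing segment name follows directly from the
transaction id in the error, which suggests the xid read from the damaged
tuples, not the file name, is the bogus part.

#include <stdio.h>
#include <stdint.h>

/* Assumed defaults, mirroring the server's internal constants:
 * 8192-byte pages, 4 transactions per byte, 32 pages per segment
 * => 1,048,576 transactions per pg_clog file. */
#define CLOG_XACTS_PER_PAGE    (8192 * 4)
#define CLOG_PAGES_PER_SEGMENT 32
#define CLOG_XACTS_PER_SEGMENT (CLOG_XACTS_PER_PAGE * CLOG_PAGES_PER_SEGMENT)

int main(void)
{
    uint32_t xid = 313765632;   /* from the "could not access status" error */

    printf("pg_clog/%04X\n", (unsigned) (xid / CLOG_XACTS_PER_SEGMENT));
    /* prints pg_clog/012B -- exactly the file the server tried to open */
    return 0;
}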


We are using Linux kernel version 2.6.22.6-smp.
Hardware: 2 dual-core processors (Intel(R) Xeon(TM) CPU 2.80GHz)
PostgreSQL version: 8.2.4


-------- Original Message --------
Subject: Missing pg_clog file / corrupt index / invalid page header
Date: Wed, 05 Sep 2007 08:18:31 +0200
From: alex <an@clickware.de>
Organization: click:ware GmbH
To: pgsql-bugs@postgresql.org

My colleague Marc Schablewski first reported this bug (#3484) at the
end of July.
The described problem has occurred twice on our database, and now it
has happened again.

Summary
==========

Various errors like:
"invalid page header in block 8658 of relation",
"could not open segment 2 of relation 1663/77142409/266753945 (target
block 809775152)",
"ERROR:  could not access status of transaction 2134240 DETAIL:  could
not open file "pg_clog/0002": File not found",
"CEST PANIC:  right sibling's left-link doesn't match"

on the following system:
Postgres 8.1.8
SUSE Linux kernel 2.6.13-15.8-smp
2 Intel Xeon processors with 2 cores each
ECC RAM
Hardware RAID (mirror set)

Detailed description
=======================

The message was thrown by the nightly pg_dump:
pg_dump: ERROR:  invalid page header in block 8658 of relation
"import_data_zeilen"
pg_dump: SQL command to dump the contents of table "import_data_zeilen"
failed: PQendcopy() failed.
pg_dump: Error message from server: ERROR:  invalid page header in block
8658 of relation "import_data_zeilen"
pg_dump: The command was: COPY public.import_data_zeilen (id, eda_id,
zeile, man_id, sta_id) TO stdout;

A manually executed dump of just the affected table completed
successfully (during the daytime!).
We were really surprised!
Also, SELECT queries (using indexes) on the table succeeded
(in the past, when the error occurred, SELECT queries failed).
So no repair seemed to be needed for the table.

The following night, the pg_dump succeeded, but the "vacuum
analyze" (executed after the pg_dump) threw the same error:

INFO:  vacuuming "public.import_data_zeilen"
ERROR:  invalid page header in block 8658 of relation "import_data_zeilen"

Any SELECT on this table using indexes now failed
(if the result set contained the corrupted data)!

This behaviour is very confusing.

Re-creating the table solved the problem. However, the damaged rows were
lost.

We have two systems, one active and one for tests.
They are nearly identical: they have similar hardware, run the same
software, and operate under the same load.
The errors always occurred on the active server; the test server did
not run into errors after we upgraded both servers from 8.1.3 to 8.1.8.

So even though no hardware errors were detected (neither ECC RAM
errors nor disk errors), we decided to swap the servers' roles to find
out whether it is a hardware or a software problem.
That was 12 days ago.

Now we have got another error, again on the active system (which now
uses the hardware from the other system, except for one of the hard
disks in the RAID). It was thrown by an INSERT statement issued by the
software:
org.postgresql.util.PSQLException: ERROR: could not open segment 2 of
relation 1663/77142409/266753945 (target block 809775152): No such
file or directory.

Obviously we have a problem with the active server.
But it is unlikely to be a hardware problem, because we changed the
hard disks and the error occurred on the same (software) system.
Also, we are using ECC RAM and a RAID system (mirror set) with a
hardware RAID controller, which has not reported any errors.

We read the last post/thread concerning this bug. In that thread the
problem was attributed to a kernel bug in 2.6.11.
We are using a newer Linux version: 2.6.13-15.8-smp.
Hardware: 2 dual-core processors (Intel(R) Xeon(TM) CPU 2.80GHz)
PostgreSQL version: 8.1.8

We did a lot of database maintenance 4 days ago, which among other
changes dropped about 10 indexes on one big table (35,000,000 records)
and created about 10 new indexes (for better performance).

Given that the problem occurred on two different machines we are very
sure that it is *not* a hardware problem.

We would really appreciate any help with our problems.

Thanks in advance

     A. Nitzschke

Re: Bug (#3484) - Invalid page header again

From
Tom Lane
Date:
alex <an@clickware.de> writes:
> Given that the problem occurred on two different machines we are very
> sure that it is *not* a hardware problem.

I wouldn't be so sure, especially not when the behavior looks like a
hardware problem in every other respect.  You've heard of common-mode
failures, no?  Do both machines have RAM chips from the same batch,
for instance?

            regards, tom lane

Re: Bug (#3484) - Invalid page header again

From
Zdenek Kotala
Date:
alex wrote:

<snip>

> WARNING:  relation "transaktion" TID 1240631/12: OID is invalid
> ERROR:  invalid page header in block 1240632 of relation "transaktion"
> 7. 2007/12/10 : We started the export of the data ( which runs every
> morning ) for the last days again. These exports use the same
> SQL-Commands as the automatical run.

Alex,

could you please provide a binary dump of these two pages or, if they
contain sensitive data, use pg_filedump to extract only the page and
tuple headers?
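
(For locating the pages on disk: a minimal sketch, assuming the default
8 kB block size and 1 GB segment files of 131072 blocks, of where the
two blocks from the vacuum errors sit inside the relation's data files;
the resulting offsets can then be fed to dd or pg_filedump.)

#include <stdio.h>
#include <stdint.h>

/* Assumed defaults: BLCKSZ = 8192 bytes, RELSEG_SIZE = 131072 blocks (1 GB). */
#define BLCKSZ      8192u
#define RELSEG_SIZE 131072u

int main(void)
{
    uint32_t blocks[] = { 1240631, 1240632 };   /* from the vacuum messages */
    int      i;

    for (i = 0; i < 2; i++)
    {
        uint32_t blk = blocks[i];

        printf("block %u -> segment file \".%u\", byte offset %llu\n",
               (unsigned) blk, (unsigned) (blk / RELSEG_SIZE),
               (unsigned long long) (blk % RELSEG_SIZE) * BLCKSZ);
    }
    return 0;
}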

        Zdenek

Re: Bug (#3484) - Invalid page header again

From
Zdenek Kotala
Date:
Zdenek Kotala wrote:
> alex wrote:
>
> <snip>
>
>> WARNING:  relation "transaktion" TID 1240631/12: OID is invalid
>> ERROR:  invalid page header in block 1240632 of relation "transaktion"
>> 7. 2007/12/10 : We started the export of the data ( which runs every
>> morning ) for the last days again. These exports use the same
>> SQL-Commands as the automatical run.
>
> Alex,
>
> please can you provide binary dump of these two pages or if there are
> sensitive data try to use pg_filedump to get only page and tuple headers?
>
>


I got a dump of the two affected blocks from Alex, and it seems that both
blocks were overwritten with some structure of 128-byte length (there is a
pattern). The total damaged size is 9728 bytes (the first block is
overwritten completely and the second one only at the beginning), but
another buffer from another relation could have been overwritten too.

I think it is a software bug rather than a hardware problem, because the
bad data contains some logic: there is a 0x54 byte repeated after every
128 bytes, and so on, and most of the data is zeros.
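
(A sketch of how such a repeating pattern can be made visible from a raw
block dump; the input file name is hypothetical, and 0x54 is the byte
mentioned above.)

#include <stdio.h>

int main(void)
{
    unsigned char page[8192];                            /* one raw 8 kB block */
    FILE         *f = fopen("block_1240631.bin", "rb");  /* hypothetical dump  */
    long          i;
    long          prev = -1;

    if (f == NULL || fread(page, 1, sizeof(page), f) != sizeof(page))
        return 1;
    fclose(f);

    /* Print every offset holding 0x54 and the distance to the previous hit,
     * which makes a "repeats every N bytes" pattern easy to spot. */
    for (i = 0; i < (long) sizeof(page); i++)
    {
        if (page[i] == 0x54)
        {
            printf("offset %5ld  delta %ld\n", i, prev < 0 ? 0L : i - prev);
            prev = i;
        }
    }
    return 0;
}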

My suggestion is to apply the following patch to determine whether the
data is corrupted by PostgreSQL or elsewhere. It should fail before
writing damaged data to the disk. The patch is against HEAD, but a
similar patch could be backported.

Index: backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.227
diff -c -r1.227 bufmgr.c
*** backend/storage/buffer/bufmgr.c     15 Nov 2007 21:14:37 -0000      1.227
--- backend/storage/buffer/bufmgr.c     18 Dec 2007 15:50:06 -0000
***************
*** 1734,1739 ****
--- 1734,1741 ----
         buf->flags &= ~BM_JUST_DIRTIED;
         UnlockBufHdr(buf);

+       if (!PageHeaderIsValid((PageHeader) BufHdrGetBlock(buf)))
+               elog(FATAL, "Buffer cache is damaged!");
         smgrwrite(reln,
                           buf->tag.blockNum,
                           (char *) BufHdrGetBlock(buf),
***************
*** 1966,1971 ****
--- 1968,1976 ----
                                 errcontext.previous = error_context_stack;
                                 error_context_stack = &errcontext;

+                               if (!PageHeaderIsValid((PageHeader) LocalBufHdrGetBlock(bufHdr)))
+                                       elog(FATAL, "Buffer cache is damaged!");
+
                                 smgrwrite(rel->rd_smgr,
                                                   bufHdr->tag.blockNum,
                                                  (char *) LocalBufHdrGetBlock(bufHdr),
Index: backend/storage/buffer/localbuf.c
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/src/backend/storage/buffer/localbuf.c,v
retrieving revision 1.78
diff -c -r1.78 localbuf.c
*** backend/storage/buffer/localbuf.c   15 Nov 2007 21:14:38 -0000      1.78
--- backend/storage/buffer/localbuf.c   18 Dec 2007 16:05:49 -0000
***************
*** 16,21 ****
--- 16,22 ----
   #include "postgres.h"

   #include "storage/buf_internals.h"
+ #include "storage/bufpage.h"
   #include "storage/bufmgr.h"
   #include "storage/smgr.h"
   #include "utils/guc.h"
***************
*** 161,166 ****
--- 162,169 ----
                 oreln = smgropen(bufHdr->tag.rnode);

                 /* And write... */
+               if (!PageHeaderIsValid((PageHeader) LocalBufHdrGetBlock(bufHdr)))
+                       elog(FATAL, "Local buffer cache is damaged!");
                 smgrwrite(oreln,
                                   bufHdr->tag.blockNum,
                                   (char *) LocalBufHdrGetBlock(bufHdr),

Re: Bug (#3484) - Invalid page header again

From
Zdenek Kotala
Date:
Zdenek Kotala wrote:
> Zdenek Kotala wrote:

> I got dump of affected two blocks from Alex and it seems that both
> blocks were overwritten together with some 128bytes length structure
> (there some pattern) and complete damaged size is 9728bytes (first block
> is overwritten completely and second one only at the beginning), but
> another buffer from another relation could be overwritten too.

I'm sorry, it is 129 bytes long (do not trust ghex2 :( ), and the size of
the damaged data is probably about 9728 bytes.

        Zdenek

Re: Bug (#3484) - Invalid page header again

From
Gregory Stark
Date:
"Zdenek Kotala" <Zdenek.Kotala@Sun.COM> writes:

> I got dump of affected two blocks from Alex and it seems that both blocks were
> overwritten together with some 128bytes length structure (there some pattern)
> and complete damaged size is 9728bytes (first block is overwritten completely
> and second one only at the beginning), but another buffer from another relation
> could be overwritten too.

I don't understand this 9728 bytes. Postgres blocks are 8192 bytes. Are you
saying one whole block is trashed and then part, but not all, of the next
block?

What's the block size of the ZFS filesystem? And what exactly does the trash
data look like?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: Bug (#3484) - Invalid page header again

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> I don't understand this 9728 bytes. Postgres blocks are 8192 bytes. Are you
> saying one whole block is trashed and then part, but not all, of the next
> block?

> What's the block size of the ZFS filesystem? And what exactly does the trash
> data look like?

It seems relevant that 9728 is a multiple of 512 (and not of any larger
power of 2).  That makes it sound like the misfeasance happened at the
disk block level.  Postgres doesn't internally deal in 512-byte chunks
anywhere I can think of, so to posit a DB bug you have to come up with
some other explanation why exactly that much of the block got trashed.
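
(A quick check: 9728 = 19 x 512 = 8192 + 3 x 512, i.e. one full 8 kB block
plus three more 512-byte sectors, which matches the first block being
overwritten completely and the second only at its beginning.)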

A filesystem-level software bug is not out of the question, though
this still smells to me like it's a hardware issue of some kind.

            regards, tom lane

Re: Bug (#3484) - Invalid page header again

From
Zdenek Kotala
Date:
Gregory Stark wrote:
> "Zdenek Kotala" <Zdenek.Kotala@Sun.COM> writes:
>
>> I got dump of affected two blocks from Alex and it seems that both blocks were
>> overwritten together with some 128bytes length structure (there some pattern)
>> and complete damaged size is 9728bytes (first block is overwritten completely
>> and second one only at the beginning), but another buffer from another relation
>> could be overwritten too.
>
> I don't understand this 9728 bytes. Postgres blocks are 8192 bytes. Are you
> saying one whole block is trashed and then part, but not all, of the next
> block?
>
> What's the block size of the ZFS filesystem? And what exactly does the trash
> data look like?
>

ZFS has a dynamic block size (max 128 kB), but this system runs on Linux,
probably with an ext3 filesystem. Alex, can you please confirm which
filesystem is used?

        Zdenek