Обсуждение: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

Поиск
Список
Период
Сортировка

pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

От
Achilleas Mantzios
Дата:
Hello,
i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satelite modem)
I think that this server was under some sort of constant resets or hardware failures.
Initially,i had this problem:
ERROR:  invalid page header in block 672720 of relation "pg_toast_125716009"

This toast table corresponds to a table named "mail_message",
                                Table "public.mail_message"
  Column   |       Type        |                         Modifiers
-----------+-------------------+-----------------------------------------------------------
 msgno     | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass)
 msgsource | bytea             |
Indexes:
    "mail_message_key" PRIMARY KEY, btree (msgno)

(obviously the TOAST table serves the msgsource varlena) the contents of which is not of vital importance.
I tried, REINDEXING, with no success, and after that, i tried resetting the said block on disk as per this
suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html

i found the oid of the table:
SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1
 tableoid  | ?column?
-----------+----------
 125716013 |        1

(and just to verify)
SELECT relname from pg_class where oid=125716013;
      relname
--------------------
 pg_toast_125716009

Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital)
dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1

However, after that, unfortunately i get constant postgresql server restarts with:
FATAL:  segment too big
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Is there anything i can do to savage the situation?

(one of) the hard part here is that i dont have neither physical nor network access to the server
(only ultra expensive unreliable satellite comms)

Thanks for any hints...

--
Achilleas Mantzios

Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

От
Cédric Villemain
Дата:
2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>:
> Hello,
> i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satelite modem)
> I think that this server was under some sort of constant resets or hardware failures.
> Initially,i had this problem:
> ERROR:  invalid page header in block 672720 of relation "pg_toast_125716009"
>
> This toast table corresponds to a table named "mail_message",
>                                Table "public.mail_message"
>  Column   |       Type        |                         Modifiers
> -----------+-------------------+-----------------------------------------------------------
>  msgno     | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass)
>  msgsource | bytea             |
> Indexes:
>    "mail_message_key" PRIMARY KEY, btree (msgno)
>
> (obviously the TOAST table serves the msgsource varlena) the contents of which is not of vital importance.
> I tried, REINDEXING, with no success, and after that, i tried resetting the said block on disk as per this
> suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html
>
> i found the oid of the table:
> SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1
>  tableoid  | ?column?
> -----------+----------
>  125716013 |        1
>
> (and just to verify)
> SELECT relname from pg_class where oid=125716013;
>      relname
> --------------------
>  pg_toast_125716009
>
> Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital)
> dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1

segment have 1.1GB size maximum. You have to catch in what segment the
faulty block is, and reajust the block value from the error report to
the real one in the good segment.

>
> However, after that, unfortunately i get constant postgresql server restarts with:
> FATAL:  segment too big
> server closed the connection unexpectedly
>        This probably means the server terminated abnormally
>        before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
>
> Is there anything i can do to savage the situation?
>
> (one of) the hard part here is that i dont have neither physical nor network access to the server
> (only ultra expensive unreliable satellite comms)
>
> Thanks for any hints...
>
> --
> Achilleas Mantzios
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>



--
Cédric Villemain

Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

От
Tom Lane
Дата:
Achilleas Mantzios <achill@matrix.gatewaynet.com> writes:
> Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital)
> dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1

> However, after that, unfortunately i get constant postgresql server restarts with:
> FATAL:  segment too big

You got the dd command wrong and made the file size change to something
it shouldn't be.  I think you can use dd to truncate the file back to
what it should be (ie, exactly 1GB) but haven't had enough caffeine to
remember exactly how.

I think the underlying error is that block 672720 isn't going to be in
the first segment of the table --- you need to be zeroing something in
one of the other segments...

            regards, tom lane

Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

От
Achilleas Mantzios
Дата:
Στις Thursday 22 April 2010 16:53:05 ο/η Cédric Villemain έγραψε:
> 2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>:
> > Hello,
> > i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satelite modem)
> > I think that this server was under some sort of constant resets or hardware failures.
> > Initially,i had this problem:
> > ERROR:  invalid page header in block 672720 of relation "pg_toast_125716009"
> >
> > This toast table corresponds to a table named "mail_message",
> >                                Table "public.mail_message"
> >  Column   |       Type        |                         Modifiers
> > -----------+-------------------+-----------------------------------------------------------
> >  msgno     | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass)
> >  msgsource | bytea             |
> > Indexes:
> >    "mail_message_key" PRIMARY KEY, btree (msgno)
> >
> > (obviously the TOAST table serves the msgsource varlena) the contents of which is not of vital importance.
> > I tried, REINDEXING, with no success, and after that, i tried resetting the said block on disk as per this
> > suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html
> >
> > i found the oid of the table:
> > SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1
> >  tableoid  | ?column?
> > -----------+----------
> >  125716013 |        1
> >
> > (and just to verify)
> > SELECT relname from pg_class where oid=125716013;
> >      relname
> > --------------------
> >  pg_toast_125716009
> >
> > Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital)
> > dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1
>
> segment have 1.1GB size maximum. You have to catch in what segment the
> faulty block is, and reajust the block value from the error report to
> the real one in the good segment.
>

Thanx,
Taking a look at /usr/local/src/postgresql-8.3.3/src/backend/storage/smgr/md.c
i see the the error comes from function mdnblocks

if (nblocks > ((BlockNumber) RELSEG_SIZE))
                        elog(FATAL, "segment too big");

That means, that some segment file is bigger than RELSEG_SIZE
At least in my system:
#define BLCKSZ   8192
#define RELSEG_SIZE (0x40000000 / BLCKSZ)
So, any segment file cannot be bigger than RELSEG_SIZE blocks (or 2^30 bytes = 1GB)

Currently i dont have any access to the machine but tomorrow i will check the file sizes.

Can anyone shed some light as to some method of identifying all the segment files of a table?
The first one has the same name as the tableoid.
How about the subsequent segments?

> >
> > However, after that, unfortunately i get constant postgresql server restarts with:
> > FATAL:  segment too big
> > server closed the connection unexpectedly
> >        This probably means the server terminated abnormally
> >        before or while processing the request.
> > The connection to the server was lost. Attempting reset: Succeeded.
> >
> > Is there anything i can do to savage the situation?
> >
> > (one of) the hard part here is that i dont have neither physical nor network access to the server
> > (only ultra expensive unreliable satellite comms)
> >
> > Thanks for any hints...
> >
> > --
> > Achilleas Mantzios
> >
> > --
> > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-admin
> >
>
>
>



--
Achilleas Mantzios

Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

От
Achilleas Mantzios
Дата:
Στις Thursday 22 April 2010 17:42:33 γράψατε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> writes:
> > Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital)
> > dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1
>
> > However, after that, unfortunately i get constant postgresql server restarts with:
> > FATAL:  segment too big
>
> You got the dd command wrong and made the file size change to something
> it shouldn't be.  I think you can use dd to truncate the file back to
> what it should be (ie, exactly 1GB) but haven't had enough caffeine to
> remember exactly how.
>
> I think the underlying error is that block 672720 isn't going to be in
> the first segment of the table --- you need to be zeroing something in
> one of the other segments...

Great, thanx a lot,
any idea about the naming of the files of those other segments?

>
>             regards, tom lane
>



--
Achilleas Mantzios

Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

От
Cédric Villemain
Дата:
2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>:
> Στις Thursday 22 April 2010 16:53:05 ο/η Cédric Villemain έγραψε:
>> 2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>:
>> > Hello,
>> > i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satelite modem)
>> > I think that this server was under some sort of constant resets or hardware failures.
>> > Initially,i had this problem:
>> > ERROR:  invalid page header in block 672720 of relation "pg_toast_125716009"
>> >
>> > This toast table corresponds to a table named "mail_message",
>> >                                Table "public.mail_message"
>> >  Column   |       Type        |                         Modifiers
>> > -----------+-------------------+-----------------------------------------------------------
>> >  msgno     | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass)
>> >  msgsource | bytea             |
>> > Indexes:
>> >    "mail_message_key" PRIMARY KEY, btree (msgno)
>> >
>> > (obviously the TOAST table serves the msgsource varlena) the contents of which is not of vital importance.
>> > I tried, REINDEXING, with no success, and after that, i tried resetting the said block on disk as per this
>> > suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html
>> >
>> > i found the oid of the table:
>> > SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1
>> >  tableoid  | ?column?
>> > -----------+----------
>> >  125716013 |        1
>> >
>> > (and just to verify)
>> > SELECT relname from pg_class where oid=125716013;
>> >      relname
>> > --------------------
>> >  pg_toast_125716009
>> >
>> > Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital)
>> > dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1
>>
>> segment have 1.1GB size maximum. You have to catch in what segment the
>> faulty block is, and reajust the block value from the error report to
>> the real one in the good segment.
>>
>
> Thanx,
> Taking a look at /usr/local/src/postgresql-8.3.3/src/backend/storage/smgr/md.c
> i see the the error comes from function mdnblocks
>
> if (nblocks > ((BlockNumber) RELSEG_SIZE))
>                        elog(FATAL, "segment too big");
>
> That means, that some segment file is bigger than RELSEG_SIZE
> At least in my system:
> #define BLCKSZ   8192
> #define RELSEG_SIZE (0x40000000 / BLCKSZ)
> So, any segment file cannot be bigger than RELSEG_SIZE blocks (or 2^30 bytes = 1GB)
>
> Currently i dont have any access to the machine but tomorrow i will check the file sizes.
>
> Can anyone shed some light as to some method of identifying all the segment files of a table?
> The first one has the same name as the tableoid.
> How about the subsequent segments?

Your execution of dd make your first segment bigger than expected.
Other segment have the same name with a .1 .2 etc  suffix.

You have to shrink your first segment to the correct size.
check what happens, you should have now the original error.

And, I have never used it, but I think it is the purpose of
zero_damaged_pages to parameter to allow postgresql itself to zero the
bad black. (reading
src/backend/storage/buffer/bufmgr.c confirm that.

*BUT* take care that it will zero *every* bad page, perhaps not only
the one trapping an error.

In those situation, it is good to make a snapshot of the pgdata
directory, in case your fingers surf  too fast on the keyboard....

If you don't want to activate zero_damage_page,  then go and calculate
which block in which segment you have to zeroing.

side note, it may be usefull to have the relevant information in the
error message...

>
>> >
>> > However, after that, unfortunately i get constant postgresql server restarts with:
>> > FATAL:  segment too big
>> > server closed the connection unexpectedly
>> >        This probably means the server terminated abnormally
>> >        before or while processing the request.
>> > The connection to the server was lost. Attempting reset: Succeeded.
>> >
>> > Is there anything i can do to savage the situation?
>> >
>> > (one of) the hard part here is that i dont have neither physical nor network access to the server
>> > (only ultra expensive unreliable satellite comms)
>> >
>> > Thanks for any hints...
>> >
>> > --
>> > Achilleas Mantzios
>> >
>> > --
>> > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-admin
>> >
>>
>>
>>
>
>
>
> --
> Achilleas Mantzios
>



--
Cédric Villemain

Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

От
Achilleas Mantzios
Дата:
Στις Thursday 22 April 2010 19:02:00 ο/η Cédric Villemain έγραψε:
> 2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>:
> > Στις Thursday 22 April 2010 16:53:05 ο/η Cédric Villemain έγραψε:
> >> 2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>:
> >> > Hello,
> >> > i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satelite modem)
> >> > I think that this server was under some sort of constant resets or hardware failures.
> >> > Initially,i had this problem:
> >> > ERROR:  invalid page header in block 672720 of relation "pg_toast_125716009"
> >> >
> >> > This toast table corresponds to a table named "mail_message",
> >> >                                Table "public.mail_message"
> >> >  Column   |       Type        |                         Modifiers
> >> > -----------+-------------------+-----------------------------------------------------------
> >> >  msgno     | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass)
> >> >  msgsource | bytea             |
> >> > Indexes:
> >> >    "mail_message_key" PRIMARY KEY, btree (msgno)
> >> >
> >> > (obviously the TOAST table serves the msgsource varlena) the contents of which is not of vital importance.
> >> > I tried, REINDEXING, with no success, and after that, i tried resetting the said block on disk as per this
> >> > suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html
> >> >
> >> > i found the oid of the table:
> >> > SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1
> >> >  tableoid  | ?column?
> >> > -----------+----------
> >> >  125716013 |        1
> >> >
> >> > (and just to verify)
> >> > SELECT relname from pg_class where oid=125716013;
> >> >      relname
> >> > --------------------
> >> >  pg_toast_125716009
> >> >
> >> > Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital)
> >> > dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1
> >>
> >> segment have 1.1GB size maximum. You have to catch in what segment the
> >> faulty block is, and reajust the block value from the error report to
> >> the real one in the good segment.
> >>
> >
> > Thanx,
> > Taking a look at /usr/local/src/postgresql-8.3.3/src/backend/storage/smgr/md.c
> > i see the the error comes from function mdnblocks
> >
> > if (nblocks > ((BlockNumber) RELSEG_SIZE))
> >                        elog(FATAL, "segment too big");
> >
> > That means, that some segment file is bigger than RELSEG_SIZE
> > At least in my system:
> > #define BLCKSZ   8192
> > #define RELSEG_SIZE (0x40000000 / BLCKSZ)
> > So, any segment file cannot be bigger than RELSEG_SIZE blocks (or 2^30 bytes = 1GB)
> >
> > Currently i dont have any access to the machine but tomorrow i will check the file sizes.
> >
> > Can anyone shed some light as to some method of identifying all the segment files of a table?
> > The first one has the same name as the tableoid.
> > How about the subsequent segments?
>
> Your execution of dd make your first segment bigger than expected.
> Other segment have the same name with a .1 .2 etc  suffix.
>
> You have to shrink your first segment to the correct size.
> check what happens, you should have now the original error.
>
> And, I have never used it, but I think it is the purpose of
> zero_damaged_pages to parameter to allow postgresql itself to zero the
> bad black. (reading
> src/backend/storage/buffer/bufmgr.c confirm that.
>
> *BUT* take care that it will zero *every* bad page, perhaps not only
> the one trapping an error.
>
> In those situation, it is good to make a snapshot of the pgdata
> directory, in case your fingers surf  too fast on the keyboard....
>
> If you don't want to activate zero_damage_page,  then go and calculate
> which block in which segment you have to zeroing.
>
> side note, it may be usefull to have the relevant information in the
> error message...
>


Many Thanks, Cédric Villemain and Tom
What i did was first to correct the first segment file with smth like
dd if=216293737 of=216293737.good seek=0 bs=8192 count=131072
which effectively truncates all but the first 131072 blocks (or 2^30 bytes = 1GB)

After that was done, and restarting postgresql backend, then i fell back to the situation
with the Invalid page header, as noted before.

I stopped the backend and calculated the exact segment file and offset where the problem was:
The block with the invalid header was the block with number: 672720
Now each segment contains at most 131072 blocks, with all but the last
containing exactly 131072 blocks.
So my problematic segment was the one with number:
672720 /131072  = 5
and the block offset inside this segment was:
672720 - (5*131072) = 17360

so i tried to zero that patricular block with

dd conv=notrunc if=216293737.5 of=216293737.5.GOOD seek=17360 bs=8192 count=1

i started postgresql and i threw out a warning about initializing this zero page.
After that, i reported error in header for block 672740,
i repeated the same procedure, and i was able to go further and even reindex the whole database
which went fine.

I tried as the ultimate test (and a useful one at this point) to pg_dump the database.
Unfortunately one table seems to be in error: the error is:

pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  compressed data is corrupt
pg_dump: The command was: COPY public.mail_entity (msgno, entno, entparentno....

I think the initial issue of this thread is solved, i'll come back with news on the pg_dump issue.

> >
> >> >
> >> > However, after that, unfortunately i get constant postgresql server restarts with:
> >> > FATAL:  segment too big
> >> > server closed the connection unexpectedly
> >> >        This probably means the server terminated abnormally
> >> >        before or while processing the request.
> >> > The connection to the server was lost. Attempting reset: Succeeded.
> >> >
> >> > Is there anything i can do to savage the situation?
> >> >
> >> > (one of) the hard part here is that i dont have neither physical nor network access to the server
> >> > (only ultra expensive unreliable satellite comms)
> >> >
> >> > Thanks for any hints...
> >> >
> >> > --
> >> > Achilleas Mantzios
> >> >
> >> > --
> >> > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> >> > To make changes to your subscription:
> >> > http://www.postgresql.org/mailpref/pgsql-admin
> >> >
> >>
> >>
> >>
> >
> >
> >
> > --
> > Achilleas Mantzios
> >
>
>
>
> --
> Cédric Villemain
>



--
Achilleas Mantzios