Обсуждение: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"
Hello, i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satelite modem) I think that this server was under some sort of constant resets or hardware failures. Initially,i had this problem: ERROR: invalid page header in block 672720 of relation "pg_toast_125716009" This toast table corresponds to a table named "mail_message", Table "public.mail_message" Column | Type | Modifiers -----------+-------------------+----------------------------------------------------------- msgno | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass) msgsource | bytea | Indexes: "mail_message_key" PRIMARY KEY, btree (msgno) (obviously the TOAST table serves the msgsource varlena) the contents of which is not of vital importance. I tried, REINDEXING, with no success, and after that, i tried resetting the said block on disk as per this suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html i found the oid of the table: SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1 tableoid | ?column? -----------+---------- 125716013 | 1 (and just to verify) SELECT relname from pg_class where oid=125716013; relname -------------------- pg_toast_125716009 Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital) dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1 However, after that, unfortunately i get constant postgresql server restarts with: FATAL: segment too big server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. Is there anything i can do to savage the situation? (one of) the hard part here is that i dont have neither physical nor network access to the server (only ultra expensive unreliable satellite comms) Thanks for any hints... -- Achilleas Mantzios
2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>: > Hello, > i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satelite modem) > I think that this server was under some sort of constant resets or hardware failures. > Initially,i had this problem: > ERROR: invalid page header in block 672720 of relation "pg_toast_125716009" > > This toast table corresponds to a table named "mail_message", > Table "public.mail_message" > Column | Type | Modifiers > -----------+-------------------+----------------------------------------------------------- > msgno | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass) > msgsource | bytea | > Indexes: > "mail_message_key" PRIMARY KEY, btree (msgno) > > (obviously the TOAST table serves the msgsource varlena) the contents of which is not of vital importance. > I tried, REINDEXING, with no success, and after that, i tried resetting the said block on disk as per this > suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html > > i found the oid of the table: > SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1 > tableoid | ?column? > -----------+---------- > 125716013 | 1 > > (and just to verify) > SELECT relname from pg_class where oid=125716013; > relname > -------------------- > pg_toast_125716009 > > Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital) > dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1 segment have 1.1GB size maximum. You have to catch in what segment the faulty block is, and reajust the block value from the error report to the real one in the good segment. > > However, after that, unfortunately i get constant postgresql server restarts with: > FATAL: segment too big > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Succeeded. > > Is there anything i can do to savage the situation? > > (one of) the hard part here is that i dont have neither physical nor network access to the server > (only ultra expensive unreliable satellite comms) > > Thanks for any hints... > > -- > Achilleas Mantzios > > -- > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-admin > -- Cédric Villemain
Achilleas Mantzios <achill@matrix.gatewaynet.com> writes: > Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital) > dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1 > However, after that, unfortunately i get constant postgresql server restarts with: > FATAL: segment too big You got the dd command wrong and made the file size change to something it shouldn't be. I think you can use dd to truncate the file back to what it should be (ie, exactly 1GB) but haven't had enough caffeine to remember exactly how. I think the underlying error is that block 672720 isn't going to be in the first segment of the table --- you need to be zeroing something in one of the other segments... regards, tom lane
Στις Thursday 22 April 2010 16:53:05 ο/η Cédric Villemain έγραψε: > 2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>: > > Hello, > > i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satelite modem) > > I think that this server was under some sort of constant resets or hardware failures. > > Initially,i had this problem: > > ERROR: invalid page header in block 672720 of relation "pg_toast_125716009" > > > > This toast table corresponds to a table named "mail_message", > > Table "public.mail_message" > > Column | Type | Modifiers > > -----------+-------------------+----------------------------------------------------------- > > msgno | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass) > > msgsource | bytea | > > Indexes: > > "mail_message_key" PRIMARY KEY, btree (msgno) > > > > (obviously the TOAST table serves the msgsource varlena) the contents of which is not of vital importance. > > I tried, REINDEXING, with no success, and after that, i tried resetting the said block on disk as per this > > suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html > > > > i found the oid of the table: > > SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1 > > tableoid | ?column? > > -----------+---------- > > 125716013 | 1 > > > > (and just to verify) > > SELECT relname from pg_class where oid=125716013; > > relname > > -------------------- > > pg_toast_125716009 > > > > Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital) > > dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1 > > segment have 1.1GB size maximum. You have to catch in what segment the > faulty block is, and reajust the block value from the error report to > the real one in the good segment. > Thanx, Taking a look at /usr/local/src/postgresql-8.3.3/src/backend/storage/smgr/md.c i see the the error comes from function mdnblocks if (nblocks > ((BlockNumber) RELSEG_SIZE)) elog(FATAL, "segment too big"); That means, that some segment file is bigger than RELSEG_SIZE At least in my system: #define BLCKSZ 8192 #define RELSEG_SIZE (0x40000000 / BLCKSZ) So, any segment file cannot be bigger than RELSEG_SIZE blocks (or 2^30 bytes = 1GB) Currently i dont have any access to the machine but tomorrow i will check the file sizes. Can anyone shed some light as to some method of identifying all the segment files of a table? The first one has the same name as the tableoid. How about the subsequent segments? > > > > However, after that, unfortunately i get constant postgresql server restarts with: > > FATAL: segment too big > > server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > The connection to the server was lost. Attempting reset: Succeeded. > > > > Is there anything i can do to savage the situation? > > > > (one of) the hard part here is that i dont have neither physical nor network access to the server > > (only ultra expensive unreliable satellite comms) > > > > Thanks for any hints... > > > > -- > > Achilleas Mantzios > > > > -- > > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-admin > > > > > -- Achilleas Mantzios
Στις Thursday 22 April 2010 17:42:33 γράψατε: > Achilleas Mantzios <achill@matrix.gatewaynet.com> writes: > > Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital) > > dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1 > > > However, after that, unfortunately i get constant postgresql server restarts with: > > FATAL: segment too big > > You got the dd command wrong and made the file size change to something > it shouldn't be. I think you can use dd to truncate the file back to > what it should be (ie, exactly 1GB) but haven't had enough caffeine to > remember exactly how. > > I think the underlying error is that block 672720 isn't going to be in > the first segment of the table --- you need to be zeroing something in > one of the other segments... Great, thanx a lot, any idea about the naming of the files of those other segments? > > regards, tom lane > -- Achilleas Mantzios
2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>: > Στις Thursday 22 April 2010 16:53:05 ο/η Cédric Villemain έγραψε: >> 2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>: >> > Hello, >> > i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satelite modem) >> > I think that this server was under some sort of constant resets or hardware failures. >> > Initially,i had this problem: >> > ERROR: invalid page header in block 672720 of relation "pg_toast_125716009" >> > >> > This toast table corresponds to a table named "mail_message", >> > Table "public.mail_message" >> > Column | Type | Modifiers >> > -----------+-------------------+----------------------------------------------------------- >> > msgno | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass) >> > msgsource | bytea | >> > Indexes: >> > "mail_message_key" PRIMARY KEY, btree (msgno) >> > >> > (obviously the TOAST table serves the msgsource varlena) the contents of which is not of vital importance. >> > I tried, REINDEXING, with no success, and after that, i tried resetting the said block on disk as per this >> > suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html >> > >> > i found the oid of the table: >> > SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1 >> > tableoid | ?column? >> > -----------+---------- >> > 125716013 | 1 >> > >> > (and just to verify) >> > SELECT relname from pg_class where oid=125716013; >> > relname >> > -------------------- >> > pg_toast_125716009 >> > >> > Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital) >> > dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1 >> >> segment have 1.1GB size maximum. You have to catch in what segment the >> faulty block is, and reajust the block value from the error report to >> the real one in the good segment. >> > > Thanx, > Taking a look at /usr/local/src/postgresql-8.3.3/src/backend/storage/smgr/md.c > i see the the error comes from function mdnblocks > > if (nblocks > ((BlockNumber) RELSEG_SIZE)) > elog(FATAL, "segment too big"); > > That means, that some segment file is bigger than RELSEG_SIZE > At least in my system: > #define BLCKSZ 8192 > #define RELSEG_SIZE (0x40000000 / BLCKSZ) > So, any segment file cannot be bigger than RELSEG_SIZE blocks (or 2^30 bytes = 1GB) > > Currently i dont have any access to the machine but tomorrow i will check the file sizes. > > Can anyone shed some light as to some method of identifying all the segment files of a table? > The first one has the same name as the tableoid. > How about the subsequent segments? Your execution of dd make your first segment bigger than expected. Other segment have the same name with a .1 .2 etc suffix. You have to shrink your first segment to the correct size. check what happens, you should have now the original error. And, I have never used it, but I think it is the purpose of zero_damaged_pages to parameter to allow postgresql itself to zero the bad black. (reading src/backend/storage/buffer/bufmgr.c confirm that. *BUT* take care that it will zero *every* bad page, perhaps not only the one trapping an error. In those situation, it is good to make a snapshot of the pgdata directory, in case your fingers surf too fast on the keyboard.... If you don't want to activate zero_damage_page, then go and calculate which block in which segment you have to zeroing. side note, it may be usefull to have the relevant information in the error message... > >> > >> > However, after that, unfortunately i get constant postgresql server restarts with: >> > FATAL: segment too big >> > server closed the connection unexpectedly >> > This probably means the server terminated abnormally >> > before or while processing the request. >> > The connection to the server was lost. Attempting reset: Succeeded. >> > >> > Is there anything i can do to savage the situation? >> > >> > (one of) the hard part here is that i dont have neither physical nor network access to the server >> > (only ultra expensive unreliable satellite comms) >> > >> > Thanks for any hints... >> > >> > -- >> > Achilleas Mantzios >> > >> > -- >> > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) >> > To make changes to your subscription: >> > http://www.postgresql.org/mailpref/pgsql-admin >> > >> >> >> > > > > -- > Achilleas Mantzios > -- Cédric Villemain
Στις Thursday 22 April 2010 19:02:00 ο/η Cédric Villemain έγραψε: > 2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>: > > Στις Thursday 22 April 2010 16:53:05 ο/η Cédric Villemain έγραψε: > >> 2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>: > >> > Hello, > >> > i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satelite modem) > >> > I think that this server was under some sort of constant resets or hardware failures. > >> > Initially,i had this problem: > >> > ERROR: invalid page header in block 672720 of relation "pg_toast_125716009" > >> > > >> > This toast table corresponds to a table named "mail_message", > >> > Table "public.mail_message" > >> > Column | Type | Modifiers > >> > -----------+-------------------+----------------------------------------------------------- > >> > msgno | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass) > >> > msgsource | bytea | > >> > Indexes: > >> > "mail_message_key" PRIMARY KEY, btree (msgno) > >> > > >> > (obviously the TOAST table serves the msgsource varlena) the contents of which is not of vital importance. > >> > I tried, REINDEXING, with no success, and after that, i tried resetting the said block on disk as per this > >> > suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html > >> > > >> > i found the oid of the table: > >> > SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1 > >> > tableoid | ?column? > >> > -----------+---------- > >> > 125716013 | 1 > >> > > >> > (and just to verify) > >> > SELECT relname from pg_class where oid=125716013; > >> > relname > >> > -------------------- > >> > pg_toast_125716009 > >> > > >> > Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital) > >> > dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1 > >> > >> segment have 1.1GB size maximum. You have to catch in what segment the > >> faulty block is, and reajust the block value from the error report to > >> the real one in the good segment. > >> > > > > Thanx, > > Taking a look at /usr/local/src/postgresql-8.3.3/src/backend/storage/smgr/md.c > > i see the the error comes from function mdnblocks > > > > if (nblocks > ((BlockNumber) RELSEG_SIZE)) > > elog(FATAL, "segment too big"); > > > > That means, that some segment file is bigger than RELSEG_SIZE > > At least in my system: > > #define BLCKSZ 8192 > > #define RELSEG_SIZE (0x40000000 / BLCKSZ) > > So, any segment file cannot be bigger than RELSEG_SIZE blocks (or 2^30 bytes = 1GB) > > > > Currently i dont have any access to the machine but tomorrow i will check the file sizes. > > > > Can anyone shed some light as to some method of identifying all the segment files of a table? > > The first one has the same name as the tableoid. > > How about the subsequent segments? > > Your execution of dd make your first segment bigger than expected. > Other segment have the same name with a .1 .2 etc suffix. > > You have to shrink your first segment to the correct size. > check what happens, you should have now the original error. > > And, I have never used it, but I think it is the purpose of > zero_damaged_pages to parameter to allow postgresql itself to zero the > bad black. (reading > src/backend/storage/buffer/bufmgr.c confirm that. > > *BUT* take care that it will zero *every* bad page, perhaps not only > the one trapping an error. > > In those situation, it is good to make a snapshot of the pgdata > directory, in case your fingers surf too fast on the keyboard.... > > If you don't want to activate zero_damage_page, then go and calculate > which block in which segment you have to zeroing. > > side note, it may be usefull to have the relevant information in the > error message... > Many Thanks, Cédric Villemain and Tom What i did was first to correct the first segment file with smth like dd if=216293737 of=216293737.good seek=0 bs=8192 count=131072 which effectively truncates all but the first 131072 blocks (or 2^30 bytes = 1GB) After that was done, and restarting postgresql backend, then i fell back to the situation with the Invalid page header, as noted before. I stopped the backend and calculated the exact segment file and offset where the problem was: The block with the invalid header was the block with number: 672720 Now each segment contains at most 131072 blocks, with all but the last containing exactly 131072 blocks. So my problematic segment was the one with number: 672720 /131072 = 5 and the block offset inside this segment was: 672720 - (5*131072) = 17360 so i tried to zero that patricular block with dd conv=notrunc if=216293737.5 of=216293737.5.GOOD seek=17360 bs=8192 count=1 i started postgresql and i threw out a warning about initializing this zero page. After that, i reported error in header for block 672740, i repeated the same procedure, and i was able to go further and even reindex the whole database which went fine. I tried as the ultimate test (and a useful one at this point) to pg_dump the database. Unfortunately one table seems to be in error: the error is: pg_dump: SQL command failed pg_dump: Error message from server: ERROR: compressed data is corrupt pg_dump: The command was: COPY public.mail_entity (msgno, entno, entparentno.... I think the initial issue of this thread is solved, i'll come back with news on the pg_dump issue. > > > >> > > >> > However, after that, unfortunately i get constant postgresql server restarts with: > >> > FATAL: segment too big > >> > server closed the connection unexpectedly > >> > This probably means the server terminated abnormally > >> > before or while processing the request. > >> > The connection to the server was lost. Attempting reset: Succeeded. > >> > > >> > Is there anything i can do to savage the situation? > >> > > >> > (one of) the hard part here is that i dont have neither physical nor network access to the server > >> > (only ultra expensive unreliable satellite comms) > >> > > >> > Thanks for any hints... > >> > > >> > -- > >> > Achilleas Mantzios > >> > > >> > -- > >> > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) > >> > To make changes to your subscription: > >> > http://www.postgresql.org/mailpref/pgsql-admin > >> > > >> > >> > >> > > > > > > > > -- > > Achilleas Mantzios > > > > > > -- > Cédric Villemain > -- Achilleas Mantzios