Обсуждение: pg_dump error
Hi, I just ran into an error from pg_dump: /usr/local/pgsql/bin/pg_dump -F p my_db pg_dump: ERROR: unexpected chunk number 3292 (expected 5) for toast value 144391872 pg_dump: SQL command to dump the contents of table "measured_bioassay_base" failed: PQendcopy() failed. pg_dump: Error message from server: ERROR: unexpected chunk number 3292 (expected 5) for toast value 144391872 pg_dump: The command was: COPY public.measured_bioassay_base (bioassay_id, name, owner_id, writer_id, readers_id, is_deleted, delete_time, expt_id, protocol_application_id, source_bioassay_id, filename, fileheader, format, date_uploaded) TO stdout; Any ideas whats happened here? It looks like one of my tables hasn't backed up correctly. This table has a text field which contains large amounts of data per field. The dump file is much smaller than last nights dump. (PostgreSQL 7.4.8 on i686-pc-linux-gnu, compiled by GCC 2.95.4) Thanks for any help Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: > pg_dump: ERROR: unexpected chunk number 3292 (expected 5) for toast value > 144391872 > pg_dump: SQL command to dump the contents of table "measured_bioassay_base" > failed: PQendcopy() failed. > Any ideas whats happened here? If you're lucky, it's just a corrupted index on the toast table for "measured_bioassay_base", in which case a REINDEX will fix it. You'll need to look at the pg_class entry for "measured_bioassay_base" to find out what the toast table's name is. Something like regression=# select relname from pg_class where oid = regression-# (select reltoastrelid from pg_class where relname = 'measured_bioa ssay_base'); relname ----------------- pg_toast_597768 (1 row) regression=# reindex table pg_toast.pg_toast_597768; REINDEX I would be interested to see "pg_filedump -i -f" output for the toast table's index before you blow it away, if you have time for that. regards, tom lane
On Wed, Sep 07, 2005 at 10:50:44AM +0100, Adam Witney wrote: > pg_dump: ERROR: unexpected chunk number 3292 (expected 5) for toast value > 144391872 > pg_dump: SQL command to dump the contents of table "measured_bioassay_base" You might have a corrupted TOAST table. Have you had any hardware problems, database or system crashes, etc., lately? Do you see any unusual messages in the database or system log files? What are the results of the following queries? SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'measured_bioassay_base'; (The following query refers to the above result as pg_toast.pg_toast_XXXXXX.) SELECT xmin, xmax, chunk_seq, length(chunk_data) FROM pg_toast.pg_toast_XXXXXX WHERE chunk_id = 144391872; -- Michael Fuhr
On 7/9/05 10:50 am, "Adam Witney" <awitney@sgul.ac.uk> wrote: > > Hi, > > I just ran into an error from pg_dump: > > /usr/local/pgsql/bin/pg_dump -F p my_db > > pg_dump: ERROR: unexpected chunk number 3292 (expected 5) for toast value > 144391872 > pg_dump: SQL command to dump the contents of table "measured_bioassay_base" > failed: PQendcopy() failed. > pg_dump: Error message from server: ERROR: unexpected chunk number 3292 > (expected 5) for toast value 144391872 > pg_dump: The command was: COPY public.measured_bioassay_base (bioassay_id, > name, owner_id, writer_id, readers_id, is_deleted, delete_time, expt_id, > protocol_application_id, source_bioassay_id, filename, fileheader, format, > date_uploaded) TO stdout; > > Any ideas whats happened here? It looks like one of my tables hasn't backed > up correctly. This table has a text field which contains large amounts of > data per field. The dump file is much smaller than last nights dump. > > (PostgreSQL 7.4.8 on i686-pc-linux-gnu, compiled by GCC 2.95.4) I think I have found the offending row in measured_bioassay_base... The entry in its toast table looks like this bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from pg_toast.pg_toast_134401982 where chunk_id = 144391872; chunk_id | chunk_seq | length -----------+-----------+-------- 144391872 | 0 | 1998 144391872 | 1 | 1998 144391872 | 2 | 1998 144391872 | 3 | 1998 144391872 | 4 | 1998 7625296 | 3292 | 24 7625297 | 3292 | 24 7625298 | 3292 | 24 7625299 | 3292 | 24 7625308 | 3292 | 19 7625309 | 3292 | 16 7625310 | 3292 | 16 7625311 | 3292 | 20 7625320 | 3292 | 20 7625321 | 3292 | 19 7625322 | 3292 | 19 7625323 | 3292 | 20 7625332 | 3292 | 19 7625333 | 3292 | 19 7625334 | 3292 | 19 7625335 | 3292 | 19 7625344 | 3292 | 20 7625345 | 3292 | 20 7625346 | 3292 | 18 7625347 | 3292 | 18 7625356 | 3292 | 19 7625357 | 3292 | 19 7625358 | 3292 | 18 7625359 | 3292 | 20 7625368 | 3292 | 19 7625369 | 3292 | 20 7625370 | 3292 | 20 7625371 | 3292 | 20 7625380 | 3292 | 19 7625381 | 3292 | 19 7625382 | 3292 | 19 7625383 | 3292 | 19 144391872 | 37 | 1998 144391872 | 38 | 1998 144391872 | 39 | 1998 .. 200 more rows.. I can delete the row in measured_bioassay_base and reload the data for it, but i'd like to know if it is a symptom of a bigger problem, any ideas how this might have happened? Thanks for any help Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
> I would be interested to see "pg_filedump -i -f" output for the toast > table's index before you blow it away, if you have time for that. Yep I can run this.... Can't seem to locate pg_filedump though... Its not in contrib... Anyone know where it is these days? -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
On 7/9/05 4:28 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Adam Witney <awitney@sgul.ac.uk> writes: >>> I would be interested to see "pg_filedump -i -f" output for the toast >>> table's index before you blow it away, if you have time for that. > >> Yep I can run this.... Can't seem to locate pg_filedump though... Its not in >> contrib... Anyone know where it is these days? > > Same place it's been for awhile: > http://sources.redhat.com/rhdb/ Got that now.... Sorry but I am being dim... But where do I find the file to dump.. You want the index on pg_toast_134401982 right? -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: >> I would be interested to see "pg_filedump -i -f" output for the toast >> table's index before you blow it away, if you have time for that. > Yep I can run this.... Can't seem to locate pg_filedump though... Its not in > contrib... Anyone know where it is these days? Same place it's been for awhile: http://sources.redhat.com/rhdb/ Be sure to grab the version that matches your PG version. regards, tom lane
Adam Witney <awitney@sgul.ac.uk> writes: > Got that now.... Sorry but I am being dim... But where do I find the file to > dump.. You want the index on pg_toast_134401982 right? Right (if that's the toast table you identified for the problem table). It's probably named pg_toast_134401982_index. Look at its pg_class entry to find its relfilenode number, and find the OID of the database from pg_database, and then the file should be $PGDATA/base/DBOID/RELFILENODE. For more info see http://www.postgresql.org/docs/8.0/static/storage.html (which is for 8.0, but everything except the remarks about tablespaces applies to 7.4). regards, tom lane
Adam Witney <awitney@sgul.ac.uk> writes: > I think I have found the offending row in measured_bioassay_base... The > entry in its toast table looks like this > bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from > pg_toast.pg_toast_134401982 where chunk_id = 144391872; > chunk_id | chunk_seq | length > -----------+-----------+-------- > 144391872 | 0 | 1998 > 144391872 | 1 | 1998 > 144391872 | 2 | 1998 > 144391872 | 3 | 1998 > 144391872 | 4 | 1998 > 7625296 | 3292 | 24 > 7625297 | 3292 | 24 > 7625298 | 3292 | 24 > 7625299 | 3292 | 24 > 7625308 | 3292 | 19 This is pretty wacko, because as far as I can see there is nothing wrong with the index at all --- in particular, no gap in the chunk id/seq series here, according to pg_filedump. (The last two words of each index item are the chunk_id and chunk_seq, and we are looking at LSB-first data) Item 102 -- Length: 16 Offset: 6544 (0x1990) Flags: USED Block Id: 165390 linp Index: 4 Size: 16 Has Nulls: 0 Has Varwidths: 0 1990: 02000e86 04001000 c03e9b08 00000000 .........>...... Item 103 -- Length: 16 Offset: 6528 (0x1980) Flags: USED Block Id: 165391 linp Index: 1 Size: 16 Has Nulls: 0 Has Varwidths: 0 1980: 02000f86 01001000 c03e9b08 01000000 .........>...... Item 104 -- Length: 16 Offset: 6512 (0x1970) Flags: USED Block Id: 165391 linp Index: 2 Size: 16 Has Nulls: 0 Has Varwidths: 0 1970: 02000f86 02001000 c03e9b08 02000000 .........>...... Item 105 -- Length: 16 Offset: 6496 (0x1960) Flags: USED Block Id: 165391 linp Index: 3 Size: 16 Has Nulls: 0 Has Varwidths: 0 1960: 02000f86 03001000 c03e9b08 03000000 .........>...... Item 106 -- Length: 16 Offset: 6480 (0x1950) Flags: USED Block Id: 165391 linp Index: 4 Size: 16 Has Nulls: 0 Has Varwidths: 0 1950: 02000f86 04001000 c03e9b08 04000000 .........>...... Item 107 -- Length: 16 Offset: 6464 (0x1940) Flags: USED Block Id: 165392 linp Index: 1 Size: 16 Has Nulls: 0 Has Varwidths: 0 1940: 02001086 01001000 c03e9b08 05000000 .........>...... Item 108 -- Length: 16 Offset: 6448 (0x1930) Flags: USED Block Id: 165392 linp Index: 2 Size: 16 Has Nulls: 0 Has Varwidths: 0 1930: 02001086 02001000 c03e9b08 06000000 .........>...... Item 109 -- Length: 16 Offset: 6432 (0x1920) Flags: USED Block Id: 165392 linp Index: 3 Size: 16 Has Nulls: 0 Has Varwidths: 0 1920: 02001086 03001000 c03e9b08 07000000 .........>...... Item 110 -- Length: 16 Offset: 6416 (0x1910) Flags: USED Block Id: 165392 linp Index: 4 Size: 16 Has Nulls: 0 Has Varwidths: 0 1910: 02001086 04001000 c03e9b08 08000000 .........>...... What I think is going on is that you have a corrupted copy of this index page sitting in shared buffers. It's not been marked dirty, else it'd have been written out by a checkpoint and we could see the damage with pg_filedump. This implies that the problem would go away by itself if you simply shut down and restarted the database. However, since you've been so helpful so far ;-) I wonder whether there is any chance of getting an image of the bad page in memory before you do that. We are presumably looking at the results of an unintentional memory stomp somewhere in the code, and it's possible that a dump of the damaged page would give a clue how it happened. The only simple way I can think of for getting that info involves gdb, but I don't suppose you are using a copy built with --enable-debug ... anyone have other ideas? regards, tom lane
I wrote > Adam Witney <awitney@sgul.ac.uk> writes: >> I think I have found the offending row in measured_bioassay_base... The >> entry in its toast table looks like this >> bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from >> pg_toast.pg_toast_134401982 where chunk_id = 144391872; >> chunk_id | chunk_seq | length >> -----------+-----------+-------- >> 144391872 | 0 | 1998 >> 144391872 | 1 | 1998 >> 144391872 | 2 | 1998 >> 144391872 | 3 | 1998 >> 144391872 | 4 | 1998 >> 7625296 | 3292 | 24 >> 7625297 | 3292 | 24 >> 7625298 | 3292 | 24 >> 7625299 | 3292 | 24 >> 7625308 | 3292 | 19 > This is pretty wacko, because as far as I can see there is nothing wrong > with the index at all --- in particular, no gap in the chunk id/seq > series here, according to pg_filedump. Actually, it's also possible that the index is fine and the problem is in the underlying toast table ... which would be unfortunate, because it'd mean that there's no easy way out like a REINDEX or database restart. We can check this by seeing if the data looks the same using the CTIDs that we can see in the index. Please try the same query as above, ie, select chunk_id, chunk_seq, length(chunk_data) from pg_toast.pg_toast_134401982 where ... using these WHERE conditions: ctid = '(165390,4)' ctid = '(165391,1)' ctid = '(165391,2)' ctid = '(165391,3)' ctid = '(165391,4)' ctid = '(165392,1)' ctid = '(165392,2)' ctid = '(165392,3)' ctid = '(165392,4)' It'd also be interesting to look at pg_filedump data for pages 165390 through 165392 of the toast table itself (not the index) --- see -R option of pg_filedump to limit the range of pages. regards, tom lane
On 7/9/05 6:12 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > I wrote >> Adam Witney <awitney@sgul.ac.uk> writes: >>> I think I have found the offending row in measured_bioassay_base... The >>> entry in its toast table looks like this > >>> bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from >>> pg_toast.pg_toast_134401982 where chunk_id = 144391872; >>> chunk_id | chunk_seq | length >>> -----------+-----------+-------- >>> 144391872 | 0 | 1998 >>> 144391872 | 1 | 1998 >>> 144391872 | 2 | 1998 >>> 144391872 | 3 | 1998 >>> 144391872 | 4 | 1998 >>> 7625296 | 3292 | 24 >>> 7625297 | 3292 | 24 >>> 7625298 | 3292 | 24 >>> 7625299 | 3292 | 24 >>> 7625308 | 3292 | 19 > >> This is pretty wacko, because as far as I can see there is nothing wrong >> with the index at all --- in particular, no gap in the chunk id/seq >> series here, according to pg_filedump. > > Actually, it's also possible that the index is fine and the problem is > in the underlying toast table ... which would be unfortunate, because > it'd mean that there's no easy way out like a REINDEX or database > restart. > > We can check this by seeing if the data looks the same using the CTIDs > that we can see in the index. Please try the same query as above, ie, > select chunk_id, chunk_seq, length(chunk_data) from > pg_toast.pg_toast_134401982 where ... > using these WHERE conditions: > > ctid = '(165390,4)' > ctid = '(165391,1)' > ctid = '(165391,2)' > ctid = '(165391,3)' > ctid = '(165391,4)' > ctid = '(165392,1)' > ctid = '(165392,2)' > ctid = '(165392,3)' > ctid = '(165392,4)' > > It'd also be interesting to look at pg_filedump data for pages 165390 > through 165392 of the toast table itself (not the index) --- see -R > option of pg_filedump to limit the range of pages. A REINDEX on the toast table did not fix the problem. Here is the output from your queries above: bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from pg_toast.pg_toast_134401982 where ctid = '(165390,4)'; chunk_id | chunk_seq | length -----------+-----------+-------- 144391872 | 0 | 1998 (1 row) bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from pg_toast.pg_toast_134401982 where ctid = '(165391,1)'; chunk_id | chunk_seq | length -----------+-----------+-------- 144391872 | 1 | 1998 (1 row) bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from pg_toast.pg_toast_134401982 where ctid = '(165391,2)'; chunk_id | chunk_seq | length -----------+-----------+-------- 144391872 | 2 | 1998 (1 row) bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from pg_toast.pg_toast_134401982 where ctid = '(165391,3)'; chunk_id | chunk_seq | length -----------+-----------+-------- 144391872 | 3 | 1998 (1 row) bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from pg_toast.pg_toast_134401982 where ctid = '(165391,4)'; chunk_id | chunk_seq | length -----------+-----------+-------- 144391872 | 4 | 1998 (1 row) bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from pg_toast.pg_toast_134401982 where ctid = '(165392,1)'; chunk_id | chunk_seq | length ----------+-----------+-------- 7625296 | 3292 | 24 (1 row) bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from pg_toast.pg_toast_134401982 where ctid = '(165392,2)'; chunk_id | chunk_seq | length ----------+-----------+-------- 7625297 | 3292 | 24 (1 row) bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from pg_toast.pg_toast_134401982 where ctid = '(165392,3)'; chunk_id | chunk_seq | length ----------+-----------+-------- 7625298 | 3292 | 24 (1 row) bugasbase2=# select chunk_id, chunk_seq, length(chunk_data) from pg_toast.pg_toast_134401982 where ctid = '(165392,4)'; chunk_id | chunk_seq | length ----------+-----------+-------- 7625299 | 3292 | 24 (1 row) Unfortunately the pg_filedump you requested gives an error: pg_filedump -R 165390 165392 134401986 ******************************************************************* * PostgreSQL File/Block Formatted Dump Utility - Version 3.0 * * File: 134401986.1 * Options used: -R 165390 165392 * * Dump created on: Thu Sep 8 10:33:47 2005 ******************************************************************* Error: Premature end of file encountered. (there is a .1 file also for this table, do I have to stick the two together or something?) Thanks again Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
On Thu, Sep 08, 2005 at 10:32:41AM +0100, Adam Witney wrote: > Unfortunately the pg_filedump you requested gives an error: > > pg_filedump -R 165390 165392 134401986 > > ******************************************************************* > * PostgreSQL File/Block Formatted Dump Utility - Version 3.0 > * > * File: 134401986.1 > * Options used: -R 165390 165392 > * > * Dump created on: Thu Sep 8 10:33:47 2005 > ******************************************************************* > Error: Premature end of file encountered. > > (there is a .1 file also for this table, do I have to stick the two together > or something?) No, rather tell pg_filedump to use the second segment (.1 file) and substract the total number of blocks in the first segment. If my math is right there are 131072 blocks of 8kB per 1GB segment, so to dump block 165390 you need to ask for block 34318 in the second segment: pg_filedump -R 34318 34320 134401986.1 -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "Et put se mouve" (Galileo Galilei)
On 8/9/05 2:38 pm, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote: > On Thu, Sep 08, 2005 at 10:32:41AM +0100, Adam Witney wrote: > >> Unfortunately the pg_filedump you requested gives an error: >> >> pg_filedump -R 165390 165392 134401986 >> >> ******************************************************************* >> * PostgreSQL File/Block Formatted Dump Utility - Version 3.0 >> * >> * File: 134401986.1 >> * Options used: -R 165390 165392 >> * >> * Dump created on: Thu Sep 8 10:33:47 2005 >> ******************************************************************* >> Error: Premature end of file encountered. >> >> (there is a .1 file also for this table, do I have to stick the two together >> or something?) > > No, rather tell pg_filedump to use the second segment (.1 file) and > substract the total number of blocks in the first segment. If my math > is right there are 131072 blocks of 8kB per 1GB segment, so to dump > block 165390 you need to ask for block 34318 in the second segment: > > pg_filedump -R 34318 34320 134401986.1 Ah thanks..... pg_filedump-3.0/pg_filedump -R 34318 34320 134401986.1 ******************************************************************* * PostgreSQL File/Block Formatted Dump Utility - Version 3.0 * * File: 134401986.1 * Options used: -R 34318 34320 * * Dump created on: Thu Sep 8 14:49:05 2005 ******************************************************************* Block 34318 ******************************************************** <Header> ----- Block Offset: 0x10c1c000 Offsets: Lower 36 (0x0024) Block: Size 8192 Version 1 Upper 272 (0x0110) LSN: logid 16 recoff 0x2b0ab37c Special 8192 (0x2000) Items: 4 Free Space: 236 Length (including item array): 40 <Data> ------ Item 1 -- Length: 2034 Offset: 6156 (0x180c) Flags: USED Item 2 -- Length: 2034 Offset: 4120 (0x1018) Flags: USED Item 3 -- Length: 1811 Offset: 2308 (0x0904) Flags: USED Item 4 -- Length: 2034 Offset: 272 (0x0110) Flags: USED Block 34319 ******************************************************** <Header> ----- Block Offset: 0x10c1e000 Offsets: Lower 36 (0x0024) Block: Size 8192 Version 1 Upper 48 (0x0030) LSN: logid 16 recoff 0x2b0ad4d0 Special 8192 (0x2000) Items: 4 Free Space: 12 Length (including item array): 40 <Data> ------ Item 1 -- Length: 2034 Offset: 6156 (0x180c) Flags: USED Item 2 -- Length: 2034 Offset: 4120 (0x1018) Flags: USED Item 3 -- Length: 2034 Offset: 2084 (0x0824) Flags: USED Item 4 -- Length: 2034 Offset: 48 (0x0030) Flags: USED Block 34320 ******************************************************** <Header> ----- Block Offset: 0x10c20000 Offsets: Lower 68 (0x0044) Block: Size 8192 Version 1 Upper 232 (0x00e8) LSN: logid 16 recoff 0x2d0fedcc Special 8192 (0x2000) Items: 12 Free Space: 164 Length (including item array): 72 <Data> ------ Item 1 -- Length: 672 Offset: 7520 (0x1d60) Flags: USED Item 2 -- Length: 668 Offset: 6852 (0x1ac4) Flags: USED Item 3 -- Length: 656 Offset: 6196 (0x1834) Flags: USED Item 4 -- Length: 652 Offset: 5544 (0x15a8) Flags: USED Item 5 -- Length: 700 Offset: 4844 (0x12ec) Flags: USED Item 6 -- Length: 652 Offset: 4192 (0x1060) Flags: USED Item 7 -- Length: 668 Offset: 3524 (0x0dc4) Flags: USED Item 8 -- Length: 656 Offset: 2868 (0x0b34) Flags: USED Item 9 -- Length: 684 Offset: 2184 (0x0888) Flags: USED Item 10 -- Length: 648 Offset: 1536 (0x0600) Flags: USED Item 11 -- Length: 652 Offset: 884 (0x0374) Flags: USED Item 12 -- Length: 652 Offset: 232 (0x00e8) Flags: USED *** End of Requested Range Encountered. Last Block Read: 34320 *** I have deleted the offending row in the table and now pg_dump does not give the error, so it seems to be fixed. I do have a copy of the toast.pg_toast_134401982 files 134401986 and 134401986.1 if you want me to do any further analysis for you? Thanks again for your help Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: > Unfortunately the pg_filedump you requested gives an error: > Error: Premature end of file encountered. > (there is a .1 file also for this table, do I have to stick the two together > or something?) Yeah. Each file contains 131072 pages, so the correct place to look is 165390-131072 in the .1 file, ie pg_filedump -R 34318 34320 134401986.1 (I need to look at the pg_filedump code, too, because it seems like it ought to be able to do that for itself...) regards, tom lane
Adam Witney <awitney@sgul.ac.uk> writes: > Ah thanks..... > pg_filedump-3.0/pg_filedump -R 34318 34320 134401986.1 Could we see this with -i -f options please? At this level of detail there's nothing obviously wrong ... regards, tom lane