Обсуждение: Still big problems with pg_dump!
Hello guys, I have still problems with dumping my database.... I have postgres 7.2.1 running on a solaris 8 server. When I try to do a pg_dump of my database, I get the following message: pg_dump: query to obtain list of tables failed: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. I connect to the database and try to view the tables with \dt and \dS. Now I get: ERROR: AllocSetFree: cannot find block containing chunk 4c5ad0 I retry: ERROR: AllocSetFree: cannot find block containing chunk 4860d0 I can view the tables with: \d tablename Some people suggest a drive failure, but I checked that and found no problems... I REINDEXED the whole database... problem still the same... Tried a VACUUM... still not working... I must say that one of the table contains more than 3.000.000 rows, another more than 1.400.000... Select, update, delete, insert works, just the pg_dump(all) and the \dt \dS commands... I must say that I had this problem a few months before, I got some help then, but that couldn't solve my problem, I recreated the database from scratch and copied the data, to fix thing quickly. Thing went well for about two months :-( Now the problem raises again, and I'm trying to find a solution without reinstalling the whole thing allover again. Some advice/help from the specialists? Cheers! Wim. Some info from the debug logfile: --------------------------------- DEBUG: StartTransactionCommand DEBUG: query: SELECT c.relname as "Name", CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' END as "Type", u.usename as "Owner" FROM pg_class c LEFT JOIN pg_user u ON c.relowner = u.usesysid WHERE c.relkind IN ('r','') AND c.relname !~ '^pg_' ORDER BY 1; DEBUG: parse tree: { QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAggs false :hasSubLinks false :rtable ({ RTE :relname pg_class :relid 1259 :subquery <> :alias { ATTR :relname c :attrs <>} :eref { ATTR :relname c :attrs ( "relname" "reltype" "relowner" "relam" "relfilenode" "relpages" "reltuples" "reltoastrelid" "reltoastidxid" "relhasindex" "relisshared" "relkind" "relnatts" "relchecks" "reltriggers" "relukeys" "relfkeys" "relrefs" "relhasoids" "relhaspkey" "relhasrules" "relhassubclass" "relacl" )} :inh true :inFromCl true :checkForRead true :checkForWrite false :checkAsUser 0} { RTE :relname pg_user :relid 16478 :subquery <> :alias { ATTR :relname u :attrs <>} :eref { ATTR :relname u :attrs ( "usename" "usesysid" "usecreatedb" "usetrace" "usesuper" "usecatupd" "passwd" "valuntil" )} :inh true :inFromCl true :checkForRead true :checkForWrite false :checkAsUser 0}) :jointree { FROMEXPR :fromlist ({ JOINEXPR :jointype 1 :isNatural false :larg { RANGETBLREF 1 } :rarg { RANGETBLREF 2 } :using <> :quals { EXPR :typeOid 16 :opType op :oper { OPER :opno 96 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 3 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 3} { VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 2})} :alias <> :colnames ( "relname" "reltype" "relowner" "relam" "relfilenode" "relpages" "reltuples" "reltoastrelid" "reltoastidxid" "relhasindex" "relisshared" "relkind" "relnatts" "relchecks" "reltriggers" "relukeys" "relfkeys" "relrefs" "relhasoids" "relhaspkey" "relhasrules" "relhassubclass" "relacl" "usename" "usesysid" "usecreatedb" "usetrace" "usesuper" "usecatupd" "passwd" "valuntil" ) :colvars ({ VAR :varno 1 :varattno 1 :vartype 19 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1} { VAR :varno 1 :varattno 2 :vartype 26 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 2} { VAR :varno 1 :varattno 3 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 3} { VAR :varno 1 :varattno 4 :vartype 26 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 4} { VAR :varno 1 :varattno 5 :vartype 26 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 5} { VAR :varno 1 :varattno 6 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 6} { VAR :varno 1 :varattno 7 :vartype 700 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 7} { VAR :varno 1 :varattno 8 :vartype 26 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 8} { VAR :varno 1 :varattno 9 :vartype 26 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 9} { VAR :varno 1 :varattno 10 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 10} { VAR :varno 1 :varattno 11 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 11} { VAR :varno 1 :varattno 12 :vartype 18 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 12} { VAR :varno 1 :varattno 13 :vartype 21 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 13} { VAR :varno 1 :varattno 14 :vartype 21 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 14} { VAR :varno 1 :varattno 15 :vartype 21 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 15} { VAR :varno 1 :varattno 16 :vartype 21 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 16} { VAR :varno 1 :varattno 17 :vartype 21 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 17} { VAR :varno 1 :varattno 18 :vartype 21 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 18} { VAR :varno 1 :varattno 19 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 19} { VAR :varno 1 :varattno 20 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 20} { VAR :varno 1 :varattno 21 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 21} { VAR :varno 1 :varattno 22 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 22} { VAR :varno 1 :varattno 23 :vartype 1034 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 23} { VAR :varno 2 :varattno 1 :vartype 19 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 1} { VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 2} { VAR :varno 2 :varattno 3 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 3} { VAR :varno 2 :varattno 4 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 4} { VAR :varno 2 :varattno 5 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 5} { VAR :varno 2 :varattno 6 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 6} { VAR :varno 2 :varattno 7 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 7} { VAR :varno 2 :varattno 8 :vartype 702 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 8})}) :quals { EXPR :typeOid 16 :opType and :oper <> :args ({ EXPR :typeOid 16 :opType or :oper <> :args ({ EXPR :typeOid 16 :opType op :oper { OPER :opno 92 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 12 :vartype 18 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 12} { CONST :consttype 18 :constlen 1 :constbyval true :constisnull false :constvalue 1 [ 0 0 0 114 ] })} { EXPR :typeOid 16 :opType op :oper { OPER :opno 92 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 12 :vartype 18 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 12} { CONST :consttype 18 :constlen 1 :constbyval true :constisnull false :constvalue 1 [ 0 0 0 0 ] })})} { EXPR :typeOid 16 :opType op :oper { OPER :opno 640 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 1 :vartype 19 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1} { CONST :consttype 25 :constlen -1 :constbyval false :constisnull false :constvalue 8 [ 0 0 0 8 94 112 103 95 ] })})}} :rowMarks () :targetList ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 19 :restypmod -1 :resname Name :reskey 0 :reskeyop 0 :ressortgroupref 1 :resjunk false } :expr { VAR :varno 1 :varattno 1 :vartype 19 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1}} { TARGETENTRY :resdom { RESDOM :resno 2 :restype 25 :restypmod -1 :resname Type :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { CASE :casetype 25 :arg <> :args ({ WHEN { EXPR :typeOid 16 :opType op :oper { OPER :opno 92 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 12 :vartype 18 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 12} { CONST :consttype 18 :constlen 1 :constbyval true :constisnull false :constvalue 1 [ 0 0 0 114 ] })} :then { CONST :consttype 25 :constlen -1 :constbyval false :constisnull false :constvalue 9 [ 0 0 0 9 116 97 98 108 101 ] }} { WHEN { EXPR :typeOid 16 :opType op :oper { OPER :opno 92 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 12 :vartype 18 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 12} { CONST :consttype 18 :constlen 1 :constbyval true :constisnull false :constvalue 1 [ 0 0 0 118 ] })} :then { CONST :consttype 25 :constlen -1 :constbyval false :constisnull false :constvalue 8 [ 0 0 0 8 118 105 101 119 ] }} { WHEN { EXPR :typeOid 16 :opType op :oper { OPER :opno 92 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 12 :vartype 18 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 12} { CONST :consttype 18 :constlen 1 :constbyval true :constisnull false :constvalue 1 [ 0 0 0 105 ] })} :then { CONST :consttype 25 :constlen -1 :constbyval false :constisnull false :constvalue 9 [ 0 0 0 9 105 110 100 101 120 ] }} { WHEN { EXPR :typeOid 16 :opType op :oper { OPER :opno 92 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 12 :vartype 18 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 12} { CONST :consttype 18 :constlen 1 :constbyval true :constisnull false :constvalue 1 [ 0 0 0 83 ] })} :then { CONST :consttype 25 :constlen -1 :constbyval false :constisnull false :constvalue 12 [ 0 0 0 12 115 101 113 117 101 110 99 101 ] }} { WHEN { EXPR :typeOid 16 :opType op :oper { OPER :opno 92 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 12 :vartype 18 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 12} { CONST :consttype 18 :constlen 1 :constbyval true :constisnull false :constvalue 1 [ 0 0 0 115 ] })} :then { CONST :consttype 25 :constlen -1 :constbyval false :constisnull false :constvalue 11 [ 0 0 0 11 115 112 101 99 105 97 108 ] }}) :defresult { CONST :consttype 25 :constlen -1 :constbyval false :constisnull true :constvalue <>}}} { TARGETENTRY :resdom { RESDOM :resno 3 :restype 19 :restypmod -1 :resname Owner :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 2 :varattno 1 :vartype 19 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 1}}) :groupClause <> :havingQual <> :distinctClause <> :sortClause ({ SORTCLAUSE :tleSortGroupRef 1 :sortop 660 }) :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()} ERROR: AllocSetFree: cannot find block containing chunk 48a6d8 DEBUG: AbortCurrentTransaction DEBUG: BackendStartup: forked pid=9789 socket=9 postmaster child[9789]: starting with (postgres -d3 -v131072 -p belbonedb_v2 ) DEBUG: InitPostgres DEBUG: StartTransactionCommand DEBUG: query: set DateStyle to 'ISO' DEBUG: parse tree: { QUERY :command 5 :utility ? :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAggs false :hasSubLinks false :rtable <> :jointree <> :rowMarks () :targetList <> :groupClause <> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()} DEBUG: ProcessUtility: set DateStyle to 'ISO' DEBUG: CommitTransactionCommand DEBUG: StartTransactionCommand DEBUG: query: set geqo to 'OFF' DEBUG: parse tree: { QUERY :command 5 :utility ? :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAggs false :hasSubLinks false :rtable <> :jointree <> :rowMarks () :targetList <> :groupClause <> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()} DEBUG: ProcessUtility: set geqo to 'OFF' DEBUG: CommitTransactionCommand DEBUG: StartTransactionCommand DEBUG: query: set ksqo to 'ON' DEBUG: parse tree: { QUERY :command 5 :utility ? :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAggs false :hasSubLinks false :rtable <> :jointree <> :rowMarks () :targetList <> :groupClause <> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()} DEBUG: ProcessUtility: set ksqo to 'ON' DEBUG: CommitTransactionCommand DEBUG: StartTransactionCommand DEBUG: query: select oid from pg_type where typname='lo' DEBUG: parse tree: { QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAggs false :hasSubLinks false :rtable ({ RTE :relname pg_type :relid 1247 :subquery <> :alias <> :eref { ATTR :relname pg_type :attrs ( "typname" "typowner" "typlen" "typprtlen" "typbyval" "typtype" "typisdefined" "typdelim" "typrelid" "typelem" "typinput" "typoutput" "typreceive" "typsend" "typalign" "typstorage" "typdefault" )} :inh true :inFromCl true :checkForRead true :checkForWrite false :checkAsUser 0}) :jointree { FROMEXPR :fromlist ({ RANGETBLREF 1 }) :quals { EXPR :typeOid 16 :opType op :oper { OPER :opno 93 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 1 :vartype 19 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1} { CONST :consttype 19 :constlen 32 :constbyval false :constisnull false :constvalue 32 [ 108 111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] })}} :rowMarks () :targetList ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 26 :restypmod -1 :resname oid :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno -2 :vartype 26 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno -2}}) :groupClause <> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()} DEBUG: ProcessQuery DEBUG: CommitTransactionCommand DEBUG: StartTransactionCommand DEBUG: query: select version() DEBUG: parse tree: { QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAggs false :hasSubLinks false :rtable <> :jointree { FROMEXPR :fromlist <> :quals <>} :rowMarks () :targetList ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 25 :restypmod -1 :resname version :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { EXPR :typeOid 25 :opType func :oper { FUNC :funcid 89 :functype 25 } :args <>}}) :groupClause <> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()} DEBUG: ProcessQuery DEBUG: CommitTransactionCommand DEBUG: StartTransactionCommand DEBUG: query: select pg_client_encoding() DEBUG: parse tree: { QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAggs false :hasSubLinks false :rtable <> :jointree { FROMEXPR :fromlist <> :quals <>} :rowMarks () :targetList ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 19 :restypmod -1 :resname pg_client_encoding :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { EXPR :typeOid 19 :opType func :oper { FUNC :funcid 810 :functype 19 } :args <>}}) :groupClause <> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()} DEBUG: ProcessQuery DEBUG: CommitTransactionCommand DEBUG: StartTransactionCommand DEBUG: query: SELECT Config, nValue FROM MSysConf ERROR: Relation "msysconf" does not exist DEBUG: AbortCurrentTransaction DEBUG: StartTransactionCommand
-hackers removed. On Tue, Sep 17, 2002 at 10:11:41AM +0200, Wim wrote: > ERROR: AllocSetFree: cannot find block containing chunk 4c5ad0 This is definitely some sort of disk problem. Either you've written bad data to the disk in some way, or else the disk is corrupted or damaged. If it is a hardware problem, the obvious suspects are memory (I'd discount this idea unless everything else doesn't check out), a disk failure, or a controller failure. It could be OS related as well. Several of the 2.4 Linux kernel series, for instance, had roblems with massive filesystem corruption. > Some people suggest a drive failure, but I checked that and found no > problems... How did you check? > I must say that one of the table contains more than 3.000.000 rows, > another more than 1.400.000... When is your most recent backup? If you can't pg_dump, you will be needing that backup. > I must say that I had this problem a few months before, I got some help > then, but that couldn't solve my problem, > I recreated the database from scratch and copied the data, to fix thing > quickly. Thing went well for about two months :-( So you re-installed the data set on a machine that had somehow failed, you don't know why, and hoped that the problem would solve itself? Uh, that wasn't a good idea. In the future, if you have a problem which people suggest might be, for instance, a bad disk, it'd be a _very good_ idea to figure out precisely what the problem is before relying on the identical hardware again. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
Andrew Sullivan wrote: >-hackers removed. > >On Tue, Sep 17, 2002 at 10:11:41AM +0200, Wim wrote: > > > >>ERROR: AllocSetFree: cannot find block containing chunk 4c5ad0 >> >> > >This is definitely some sort of disk problem. Either you've written >bad data to the disk in some way, or else the disk is corrupted or >damaged. > >If it is a hardware problem, the obvious suspects are memory (I'd >discount this idea unless everything else doesn't check out), a disk >failure, or a controller failure. > >It could be OS related as well. Several of the 2.4 Linux kernel >series, for instance, had roblems with massive filesystem corruption. > > Postgres is running on solaris 8... It is the same database as previous time that has the problem, but not the same table. I think it's an error is the system tables. > > >>Some people suggest a drive failure, but I checked that and found no >>problems... >> >> > >How did you check? > > with fsck. > > >>I must say that one of the table contains more than 3.000.000 rows, >>another more than 1.400.000... >> >> > >When is your most recent backup? If you can't pg_dump, you will be >needing that backup. > > Have backup... I can still SQL COPY to a text file, so that's no problem so far. > > >>I must say that I had this problem a few months before, I got some help >>then, but that couldn't solve my problem, >>I recreated the database from scratch and copied the data, to fix thing >>quickly. Thing went well for about two months :-( >> >> > >So you re-installed the data set on a machine that had somehow >failed, you don't know why, and hoped that the problem would >solve itself? Uh, that wasn't a good idea. In the future, if you >have a problem which people suggest might be, for instance, a bad >disk, it'd be a _very good_ idea to figure out precisely what the >problem is before relying on the identical hardware again. > >A > > > I know, I don't have much spare hardware, and the database had to work quickly, it was the only solution then. Checked the disk, reinstalled the OS and still waiting for a CPU and memory upgrade.
On Tue, Sep 17, 2002 at 04:25:45PM +0200, Wim wrote: > > > >>ERROR: AllocSetFree: cannot find block containing chunk 4c5ad0 > >> > >> > > > >This is definitely some sort of disk problem. Either you've written > >bad data to the disk in some way, or else the disk is corrupted or > >damaged. > > > Postgres is running on solaris 8... > It is the same database as previous time that has the problem, but not > the same table. Someone else suggested that this would not be the error when you have written bad data to the disk (I thought you could have this if the controller was flakey and wrote bad data in the past. Maybe I'm wrong. Probably). > >How did you check? > > > > > with fsck. That won't help you if the controller is coming and going; you might find that it works one time, and not another. Indeed, a disk on its way out can even pass fsck sometimes, although it's pretty unusual. > Have backup... I can still SQL COPY to a text file, so that's no problem > so far. Well, that's good. I'd suggest backing up _really often_ until you know what the problem is, especially since this is production. > I know, I don't have much spare hardware, and the database had to work > quickly, it was the > only solution then. > Checked the disk, reinstalled the OS and still waiting for a CPU and > memory upgrade. Do you have another place to store the database in the meantime -- an Intel box with a cheap disk, or anything? At least you'd have another copy of the database somewhere that way. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
On 17 Sep 2002 at 10:33, Andrew Sullivan wrote: > On Tue, Sep 17, 2002 at 04:25:45PM +0200, Wim wrote: > > I know, I don't have much spare hardware, and the database had to work > > quickly, it was the > > only solution then. > > Checked the disk, reinstalled the OS and still waiting for a CPU and > > memory upgrade. > > Do you have another place to store the database in the meantime -- an > Intel box with a cheap disk, or anything? At least you'd have > another copy of the database somewhere that way. Yeah. NFS or SMB mounted database would work.. albeit slowly.. Hopefully it's not your root disk.. Bye Shridhar -- Davis's Dictum: Problems that go away by themselves, come back by themselves.
Andrew Sullivan wrote: <snip>... >>>How did you check? >>> >>> >>> >>> >>with fsck. >> >> > >That won't help you if the controller is coming and going; you might >find that it works one time, and not another. Indeed, a disk on its >way out can even pass fsck sometimes, although it's pretty unusual. > The DB is located on a RAID5 disk array... > > >>Have backup... I can still SQL COPY to a text file, so that's no problem >>so far. >> >> > >Well, that's good. I'd suggest backing up _really often_ until you >know what the problem is, especially since this is production. > > > >>I know, I don't have much spare hardware, and the database had to work >>quickly, it was the >>only solution then. >>Checked the disk, reinstalled the OS and still waiting for a CPU and >>memory upgrade. >> >> > >Do you have another place to store the database in the meantime -- an >Intel box with a cheap disk, or anything? At least you'd have >another copy of the database somewhere that way. > >A > > > Don't have a disk that can store my database... Still searching.... Cheers! Wim
On Tue, Sep 17, 2002 at 04:46:00PM +0200, Wim wrote: > > > >That won't help you if the controller is coming and going; you might > >find that it works one time, and not another. Indeed, a disk on its > >way out can even pass fsck sometimes, although it's pretty unusual. > > > The DB is located on a RAID5 disk array... Hmm. _That's_ interesting. I'd bet on a flakey controller, then. Is it hardware RAID? (I assume so.) A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
Andrew Sullivan <andrew@libertyrms.info> writes: > On Tue, Sep 17, 2002 at 04:25:45PM +0200, Wim wrote: >> ERROR: AllocSetFree: cannot find block containing chunk 4c5ad0 >> >> Postgres is running on solaris 8... > Someone else suggested that this would not be the error when you have > written bad data to the disk (I thought you could have this if the > controller was flakey and wrote bad data in the past. Maybe I'm > wrong. Probably). Actually, what it looks like to me is a memory clobber; I don't think bad data on disk would be likely to lead to this particular type of failure. But writing one byte too many into a string, and thereby zeroing the high-order byte of an adjacent pointer, could lead to exactly this message when we later try to pfree() the pointer. I am wondering if Wim is running into that same Solaris snprintf() bug that we discovered awhile back --- it was not clear if the bug still exists in Solaris 8, but the symptoms sure match. See http://archives.postgresql.org/pgsql-bugs/2002-07/msg00059.php It would be useful to see a stack traceback from the point of the error, if possible. regards, tom lane
Andrew Sullivan wrote: >On Tue, Sep 17, 2002 at 04:46:00PM +0200, Wim wrote: > > >>>That won't help you if the controller is coming and going; you might >>>find that it works one time, and not another. Indeed, a disk on its >>>way out can even pass fsck sometimes, although it's pretty unusual. >>> >>> >>> >>The DB is located on a RAID5 disk array... >> >> > >Hmm. _That's_ interesting. I'd bet on a flakey controller, then. >Is it hardware RAID? (I assume so.) > >A > > > Yep, hardware RAID, infact it's a SUN T3 disk array with 9*36GB SCSI disks... Cheers! Wim
On Tue, Sep 17, 2002 at 10:55:18AM -0400, Tom Lane wrote: > > I am wondering if Wim is running into that same Solaris snprintf() bug > that we discovered awhile back --- it was not clear if the bug still > exists in Solaris 8, but the symptoms sure match. See > http://archives.postgresql.org/pgsql-bugs/2002-07/msg00059.php Hmm, good point. That was only a problem when compiled with the 64-bit libraries, IIRC. Wim, what does 'file postmaster' say? A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
Tom Lane wrote: >Andrew Sullivan <andrew@libertyrms.info> writes: > > >>On Tue, Sep 17, 2002 at 04:25:45PM +0200, Wim wrote: >> >> >>>ERROR: AllocSetFree: cannot find block containing chunk 4c5ad0 >>> >>>Postgres is running on solaris 8... >>> >>> > > > >>Someone else suggested that this would not be the error when you have >>written bad data to the disk (I thought you could have this if the >>controller was flakey and wrote bad data in the past. Maybe I'm >>wrong. Probably). >> >> > >Actually, what it looks like to me is a memory clobber; I don't think >bad data on disk would be likely to lead to this particular type of >failure. But writing one byte too many into a string, and thereby >zeroing the high-order byte of an adjacent pointer, could lead to >exactly this message when we later try to pfree() the pointer. > >I am wondering if Wim is running into that same Solaris snprintf() bug >that we discovered awhile back --- it was not clear if the bug still >exists in Solaris 8, but the symptoms sure match. See >http://archives.postgresql.org/pgsql-bugs/2002-07/msg00059.php > >It would be useful to see a stack traceback from the point of the error, >if possible. > > regards, tom lane > I Would like to send a stack traceback, but I need some halp on this (never done this before). some add. info: SELECT relname FROM pg_class WHERE relname like 'pg_%' AND relkind = 'r'; gives: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. SELECT relname FROM pg_class; works well... SELECT relname, relkind from pg_class WHERE relkind='r'; works also... SELECT relname, relkind from pg_class WHERE relname like 'pg_%'; produces the same error as above... > >---------------------------(end of broadcast)--------------------------- >TIP 5: Have you checked our extensive FAQ? > >http://www.postgresql.org/users-lounge/docs/faq.html > > > > Cheers! Wim
Wim <wdh@belbone.be> writes: > Tom Lane wrote: >> It would be useful to see a stack traceback from the point of the error, >> if possible. > I Would like to send a stack traceback, but I need some halp on this > (never done this before). > some add. info: > SELECT relname FROM pg_class WHERE relname like 'pg_%' AND relkind = 'r'; > gives: > server closed the connection unexpectedly This should be producing a core file in your database directory ($PGDATA/base/yourdboid/). With gdb you'd do gdb /path/to/postgres-executable /path/to/corefile gdb> bt gdb> quit I don't remember the equivalent incantations with Solaris' debugger. regards, tom lane
On Tue, Sep 17, 2002 at 05:04:23PM +0200, Wim wrote: > > SELECT relname FROM pg_class WHERE relname like 'pg_%' AND relkind = 'r'; > gives: > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > > SELECT relname, relkind from pg_class WHERE relname like 'pg_%'; > produces the same error as above... That does rather suggest a memory clobber, as Tom suggested. Wim's 'file postmaster' shows it's a 32-bit binary, though, and I verified in my notes that the snprintf bug was only in the 64-bit library. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
On Tue, Sep 17, 2002 at 11:08:46AM -0400, Tom Lane wrote: > This should be producing a core file in your database directory > ($PGDATA/base/yourdboid/). With gdb you'd do > gdb /path/to/postgres-executable /path/to/corefile > gdb> bt > gdb> quit > I don't remember the equivalent incantations with Solaris' debugger. I think it's adb /path/to/postgres-executable /path/to/corefile $c [or] $C A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
gdb gives me this: bash-2.05$ adb /usr/local/pgsql/bin/postgres /data/postgres/base/17903709/core core file = /data/postgres/base/17903709/core -- program ``/usr/local/pgsql/bin/postgres'' on platform SUNW,Ultra-60 SIGBUS: Bus Error $c AllocSetAlloc+0x18c(476120, 3, 226018, 0, 0, 13) MemoryContextAlloc+0x68(476120, 3, 70670000, 7efefeff, 81010100, ff00) MemoryContextStrdup+0x28(476120, 4c8568, ffffffff, fffffff8, 0, ffbfe541) make_greater_string+0x1c(4c8568, 13, 297, 4c8730, 0, ffbfe5d1) prefix_selectivity+0xcc(4c0780, 4c7ab8, 4c8568, ffbfe670, ffbfe68b, ffbfe6a0) patternsel+0x278(ffbfe7a0, 0, 476120, 0, 0, ffbfe728) likesel+0x10(ffbfe7a0, ffbfe7a0, 1d7080, fffffff8, 0, ffbfe809) OidFunctionCall4+0x124(71b, 4c0780, 4b7, 4c7b90, 1, ffbfe7e1) restriction_selectivity+0x64(4c0780, 4b7, 4c7b90, 1, 0, ffbfe8a0) clauselist_selectivity+0x164(4c0780, 4c8478, 1, 0, 0, 4c8295) restrictlist_selectivity+0x2c(4c0780, 4c7c18, 1, 0, 0, ff0000) set_baserel_size_estimates+0x2c(4c0780, 4c7d70, ffffffff, fffffff8, 0, 4c83f9) set_plain_rel_pathlist+0x18(4c0780, 4c7d70, 4c0808, 53, 4c0348, 20) set_base_rel_pathlists+0xf8(4c0780, 4c8460, 1, 0, 0, 4c7c00) make_one_rel+0xc(4c0780, 0, ffbfecb8, ffbfecb0, 0, 0) subplanner+0x148(4c0780, 4c7cb8, 0, 0, 0, 0) query_planner+0x98(4c0780, 4c7a48, 0, 0, 0, 0) grouping_planner+0x7cc(4c0780, bff00000, 0, ff13a000, 0, 0) subquery_planner+0x260(4c0780, bff00000, 0, 7efefeff, 81010100, ff0000) planner+0x54(4c0780, 4c1e00, 4c15e0, fffffff8, 0, ffbfffd5) pg_plan_query+0x54(4c0780, 29b99c, 0, 53, 4c0348, 20) pg_exec_query_string+0x388(4c0348, 2, 476010, 4c0330, 800000, 0) PostgresMain+0x1398(5, ffbff2d0, 40c1d9, 473, 0, ffbff1c8) DoBackend+0x7d8(40c0a8, 1, 22730, 1552dc, 0, 40c2d9) BackendStartup+0xb0(40c0a8, 5, ffbff800, ffbff5d8, ffbff658, 0) ServerLoop+0x370(297024, 49a0, 0, 3f1f88, 297004, 2d560000) PostmasterMain+0xbe4(5, 3f2980, 65720000, 0, 65720000, 65720000) main+0x294(5, ffbffd8c, ffbffda4, 3e64c0, 0, 0) _start+0x5c(0, 0, 0, 0, 0, 0) $C ffbfe340 AllocSetAlloc+0x18c(476120, 3, 226018, 0, 0, 13) ffbfe3e0 MemoryContextAlloc+0x68(476120, 3, 70670000, 7efefeff, 81010100, ff00) ffbfe450 MemoryContextStrdup+0x28(476120, 4c8568, ffffffff, fffffff8, 0, ffbfe541) ffbfe4c8 make_greater_string+0x1c(4c8568, 13, 297, 4c8730, 0, ffbfe5d1) ffbfe548 prefix_selectivity+0xcc(4c0780, 4c7ab8, 4c8568, ffbfe670, ffbfe68b, ffbfe6a0) ffbfe5d8 patternsel+0x278(ffbfe7a0, 0, 476120, 0, 0, ffbfe728) ffbfe6b0 likesel+0x10(ffbfe7a0, ffbfe7a0, 1d7080, fffffff8, 0, ffbfe809) ffbfe728 OidFunctionCall4+0x124(71b, 4c0780, 4b7, 4c7b90, 1, ffbfe7e1) ffbfe828 restriction_selectivity+0x64(4c0780, 4b7, 4c7b90, 1, 0, ffbfe8a0) ffbfe8b0 clauselist_selectivity+0x164(4c0780, 4c8478, 1, 0, 0, 4c8295) ffbfe958 restrictlist_selectivity+0x2c(4c0780, 4c7c18, 1, 0, 0, ff0000) ffbfe9d8 set_baserel_size_estimates+0x2c(4c0780, 4c7d70, ffffffff, fffffff8, 0, 4c83f9) ffbfea48 set_plain_rel_pathlist+0x18(4c0780, 4c7d70, 4c0808, 53, 4c0348, 20) ffbfeab8 set_base_rel_pathlists+0xf8(4c0780, 4c8460, 1, 0, 0, 4c7c00) ffbfeb40 make_one_rel+0xc(4c0780, 0, ffbfecb8, ffbfecb0, 0, 0) ffbfebb8 subplanner+0x148(4c0780, 4c7cb8, 0, 0, 0, 0) ffbfec70 query_planner+0x98(4c0780, 4c7a48, 0, 0, 0, 0) ffbfecf8 grouping_planner+0x7cc(4c0780, bff00000, 0, ff13a000, 0, 0) ffbfedc8 subquery_planner+0x260(4c0780, bff00000, 0, 7efefeff, 81010100, ff0000) ffbfee58 planner+0x54(4c0780, 4c1e00, 4c15e0, fffffff8, 0, ffbfffd5) ffbfeed8 pg_plan_query+0x54(4c0780, 29b99c, 0, 53, 4c0348, 20) ffbfef50 pg_exec_query_string+0x388(4c0348, 2, 476010, 4c0330, 800000, 0) ffbff038 PostgresMain+0x1398(5, ffbff2d0, 40c1d9, 473, 0, ffbff1c8) ffbff0f8 DoBackend+0x7d8(40c0a8, 1, 22730, 1552dc, 0, 40c2d9) ffbff4e8 BackendStartup+0xb0(40c0a8, 5, ffbff800, ffbff5d8, ffbff658, 0) ffbff568 ServerLoop+0x370(297024, 49a0, 0, 3f1f88, 297004, 2d560000) ffbff810 PostmasterMain+0xbe4(5, 3f2980, 65720000, 0, 65720000, 65720000) ffbffca0 main+0x294(5, ffbffd8c, ffbffda4, 3e64c0, 0, 0) ffbffd28 _start+0x5c(0, 0, 0, 0, 0, 0) Andrew Sullivan wrote: >On Tue, Sep 17, 2002 at 11:08:46AM -0400, Tom Lane wrote: > > >>This should be producing a core file in your database directory >>($PGDATA/base/yourdboid/). With gdb you'd do >> gdb /path/to/postgres-executable /path/to/corefile >> gdb> bt >> gdb> quit >>I don't remember the equivalent incantations with Solaris' debugger. >> > >I think it's > >adb /path/to/postgres-executable /path/to/corefile >$c > >[or] > >$C > >A > >
Wim <wdh@belbone.be> writes: > gdb gives me this: > bash-2.05$ adb /usr/local/pgsql/bin/postgres > /data/postgres/base/17903709/core > core file = /data/postgres/base/17903709/core -- program > ``/usr/local/pgsql/bin/postgres'' on platform SUNW,Ultra-60 > SIGBUS: Bus Error > $c > AllocSetAlloc+0x18c(476120, 3, 226018, 0, 0, 13) > MemoryContextAlloc+0x68(476120, 3, 70670000, 7efefeff, 81010100, ff00) > MemoryContextStrdup+0x28(476120, 4c8568, ffffffff, fffffff8, 0, ffbfe541) > make_greater_string+0x1c(4c8568, 13, 297, 4c8730, 0, ffbfe5d1) > prefix_selectivity+0xcc(4c0780, 4c7ab8, 4c8568, ffbfe670, ffbfe68b, > ffbfe6a0) > patternsel+0x278(ffbfe7a0, 0, 476120, 0, 0, ffbfe728) > likesel+0x10(ffbfe7a0, ffbfe7a0, 1d7080, fffffff8, 0, ffbfe809) Hm. Are you running in a multibyte character encoding? I had a note that make_greater_string may have problems in the MULTIBYTE case. regards, tom lane
Tom Lane wrote: >Wim <wdh@belbone.be> writes: > > >>gdb gives me this: >>bash-2.05$ adb /usr/local/pgsql/bin/postgres >>/data/postgres/base/17903709/core >>core file = /data/postgres/base/17903709/core -- program >>``/usr/local/pgsql/bin/postgres'' on platform SUNW,Ultra-60 >>SIGBUS: Bus Error >>$c >>AllocSetAlloc+0x18c(476120, 3, 226018, 0, 0, 13) >>MemoryContextAlloc+0x68(476120, 3, 70670000, 7efefeff, 81010100, ff00) >>MemoryContextStrdup+0x28(476120, 4c8568, ffffffff, fffffff8, 0, ffbfe541) >>make_greater_string+0x1c(4c8568, 13, 297, 4c8730, 0, ffbfe5d1) >>prefix_selectivity+0xcc(4c0780, 4c7ab8, 4c8568, ffbfe670, ffbfe68b, >>ffbfe6a0) >>patternsel+0x278(ffbfe7a0, 0, 476120, 0, 0, ffbfe728) >>likesel+0x10(ffbfe7a0, ffbfe7a0, 1d7080, fffffff8, 0, ffbfe809) >> >> > >Hm. Are you running in a multibyte character encoding? I had a note >that make_greater_string may have problems in the MULTIBYTE case. > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > Yes, I compiled postgres with multibyte and ODBC support . Is there a workaround possibel to fix my problem? Cheers! Wim
Wim <wdh@belbone.be> writes: >> Hm. Are you running in a multibyte character encoding? I had a note >> that make_greater_string may have problems in the MULTIBYTE case. > Yes, I compiled postgres with multibyte and ODBC support . But are you actually *using* the multibyte code? What does "psql -l" show as the encoding for your database? regards, tom lane
Tom Lane wrote: >Wim <wdh@belbone.be> writes: > >>>Hm. Are you running in a multibyte character encoding? I had a note >>>that make_greater_string may have problems in the MULTIBYTE case. >>> > >>Yes, I compiled postgres with multibyte and ODBC support . >> > >But are you actually *using* the multibyte code? What does "psql -l" >show as the encoding for your database? > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 5: Have you checked our extensive FAQ? > >http://www.postgresql.org/users-lounge/docs/faq.html > > > psql -l shows: List of databases Name | Owner | Encoding ---------------+----------+----------- addressIP | postgres | SQL_ASCII belbonedb_v2 | postgres | SQL_ASCII belbonedb_v21 | postgres | SQL_ASCII peering | postgres | SQL_ASCII peering_v2 | postgres | SQL_ASCII postgres | postgres | SQL_ASCII smsbilling | postgres | SQL_ASCII template0 | postgres | SQL_ASCII template1 | postgres | SQL_ASCII (9 rows) Maybe I should recompile Postgres without multibyte support... Cheers! Wim