Thread: (don't know who else to tell) 6.5 gets build on LinuxPPCR5 but fails a lot of regr. tests

It builds.
I can initdb.
I can createdb, but not destroydb.

I get a lot of "typidTypeRelid" errors.

I'm in a hurry right now. Should I tell anyone else? Post a bug report?




>it gets build 
>I can initdb
>I can createdb, but not destroydb
>
>a lot of "typidTypeRelid"errors
>
>I'm in a hurry right now, should I tell anyone else? post bug report?

It's a known problem with LinuxPPC R5 + PostgreSQL. Try recompiling
with the -O2 flag disabled.
--
Tatsuo Ishii


> It's a known problem with LinuxPPC R5 + PostgreSQL. Try re-compile
> along with disabling -O2 flag.

I didn't realize our template only changed -O2 to -O for linux_alpha.
Added for linux_ppc too.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


>I did't realize our template only changed -O2 to -O for linux_alpha. 
>Added for linux_ppc too.

Don't be in a hurry :-) Disabling -O2 is just my guess (I don't have R5
yet). I think his problem is related to the one reported by the
LinuxPPC development team. If this is the case, -O is not enough; -O0
should be used instead. Also note that the problem would not occur on
LinuxPPC R4 (I guess this is due to the difference in compilers).
Anyway, the true fix would be as suggested in the mail below (can't be
fixed till 6.6?).
--
Tatsuo Ishii

--------------------------------------------------------------------
Date: Fri, 14 May 1999 14:50:58 -0400
From: Jack Howarth <howarth@nitro.med.uc.edu>
To: scrappy@hub.org
Subject: postgresql bug report

Marc,
      In porting the RedHat 6.0 srpm set for a linuxppc release we
believe a bug has been identified in the postgresql source for
6.5-0.beta1. Our development tools are as follows...

glibc 2.1.1 pre 2
linux 2.2.6
egcs 1.1.2
the latest binutils snapshot

The bug that we see is that when egcs compiles postgresql at -O1 or
higher (-O0 is fine), postgresql creates incorrectly formed databases
such that when the user does a destroydb the database can not be
destroyed. Franz Sirl has identified the problem as follows...

    it seems that this problem is a type casting/promotion bug in the
    source. The routine _bt_checkkeys() in
    backend/access/nbtree/nbtutils.c calls int2eq() in
    backend/utils/adt/int.c via a function pointer
    *fmgr_faddr(&key[0].sk_func). As the type information for int2eq
    is lost via the function pointer, the compiler passes 2 ints, but
    int2eq expects 2 (preformatted in a 32bit reg) int16's.
    This particular bug goes away, if I for example change int2eq to:

    bool
    int2eq(int32 arg1, int32 arg2)
    {
            return (int16)arg1 == (int16)arg2;
    }

    This moves away the type casting/promotion "work" from caller to
    the callee and is probably the right thing to do for functions
    used via function pointers.

...because of the large number of changes required to do this, Franz
thought we should pass this on to the postgresql maintainers for
correction. Please feel free to contact Franz Sirl
(Franz.Sirl-kernel@lauterbach.com) if you have any questions on this
bug report.

--
------------------------------------------------------------------------------
Jack W. Howarth, Ph.D.                        231 Bethesda Avenue
NMR Facility Director                         Cincinnati, Ohio 45267-0524
Dept. of Molecular Genetics                   phone: (513) 558-4420
Univ. of Cincinnati College of Medicine       fax: (513) 558-8474
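
The fix Franz describes can be shown in isolation. The following is a minimal sketch, not PostgreSQL source: cmp_fn and call_cmp are hypothetical names standing in for whatever sits behind fmgr_faddr(), and int2eq_wide is the modified int2eq from the report with int32/int16 written as the <stdint.h> types. The point it illustrates is that caller and callee agree on full-width arguments, and the callee alone narrows them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical generic signature: every argument travels as a full 32-bit
 * integer, so the caller needs no knowledge of the callee's real types. */
typedef bool (*cmp_fn)(int32_t, int32_t);

/* Variant of int2eq() from the report: accept wide arguments and narrow
 * them inside the callee instead of relying on the caller to do it. */
bool
int2eq_wide(int32_t arg1, int32_t arg2)
{
    return (int16_t) arg1 == (int16_t) arg2;
}

/* Stand-in for a dispatcher that only ever sees the generic pointer type
 * (an assumption for illustration, not the real fmgr code). */
bool
call_cmp(cmp_fn fn, int32_t a, int32_t b)
{
    return fn(a, b);
}
```

A caller would write something like call_cmp(int2eq_wide, 5, 5). Because the pointer type and the function definition match exactly, the call is well defined at any optimization level; the cost is that every function reachable through such a pointer has to be rewritten the same way.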


Someone please let me know if -O0 or -O takes care of the problem.




--
  Bruce Momjian
 


At 11:00 17-6-99 -0400, Bruce Momjian wrote:
>Someone please let me know of -O0 or -O take care of the problem.

-O0 is good

-O is NOT good
( and just to make sure -O1 is NOT good either )




> -O0 is good
>
> -O is NOT good
> ( and just to make sure -O1 is NOT good either )

OK, should I change the template for linux_ppc to -O0?


--
  Bruce Momjian
 


At 19:15 17-6-99 -0400, Bruce Momjian wrote:
>OK, should I change the template for linux_ppc to -O0?

I'm in way over my head here. I don't know anything about C and don't know
the source code of postgres, so don't listen to me.
(I just thought last night I'd see if I could get LinuxPPCR5 to run
on my Motorola Starmax, and when that was done I thought I'd try to build
postgres on it, just for fun.)

How bad is it that -O2 will not work? LinuxPPCR5 probably is not one of the
main platforms postgres is running on.
If not being able to compile with -O2 is really bad for performance, a note
in the INSTALL file would be in order to let people know that running on
LinuxPPCR5 is not going to be a fast ride, and that the postgres dev team is
aware of the problem and that it is being worked on :)

And yes, change the template for linux_ppc to -O0.



> and yes change the template for linux_ppc to -O0

Done.

--
  Bruce Momjian
 


> OK, should I change the template for linux_ppc to -O0?

Not every linux_ppc box actually suffers from the problem, so it might
be overkill. However, it should definitely stop complaints from
LinuxPPC R5 users, and I have to admit it seems the best solution for
the short term.

But for the long term, we have to repair our code. See your posting
below.

P.S.    I don't see your addition to the TODO in the 6.5 source tree.
--
Tatsuo Ishii

To: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat, 15 May 1999 05:10:51 -0400 (EDT)
CC: The Hermit Hacker <scrappy@hub.org>, pgsql-hackers@postgreSQL.org,
    Jack Howarth <howarth@nitro.med.uc.edu>

> The Hermit Hacker <scrappy@hub.org> writes:
> >     it seems that this problem is a type casting/promotion bug in the
> >     source. The routine _bt_checkkeys() in
> >     backend/access/nbtree/nbtutils.c calls int2eq() in
> >     backend/utils/adt/int.c via a function pointer
> >     *fmgr_faddr(&key[0].sk_func). As the type information for int2eq
> >     is lost via the function pointer, the compiler passes 2 ints, but
> >     int2eq expects 2 (preformatted in a 32bit reg) int16's.
> >     This particular bug goes away, if I for example change int2eq to:
> 
> >     bool
> >     int2eq(int32 arg1, int32 arg2)
> >     {
> >             return (int16)arg1 == (int16)arg2;
> >     }
> 
> Yow.  I can't believe that we haven't seen this failure before on a
> variety of platforms.  Calling an ANSI-style function that has char or
> short args is undefined behavior if you call it without benefit of a
> prototype, because the parameter layout is allowed to be different.
> Apparently, fewer compilers exploit that freedom than I would've thought.
> 
> Really, *all* of the builtin-function routines ought to take arguments
> of type Datum and then do the appropriate Get() macro to extract what
> they want from 'em.  That's a depressingly large amount of work, but
> at the very least the functions that take bool and int16 have to be
> changed...

I concur in your Yow.  Lots of changes, and I am surprised we have not
been bitten by this before.  Added to TODO:
Fix function pointer calls to take Datum args for char and int2 args

--
  Bruce Momjian
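
Tom Lane's suggestion above, that builtin functions take arguments of type Datum and extract what they need themselves, might look roughly like this. It is a simplified sketch under stated assumptions: the Datum typedef, the helper names int16_to_datum/datum_to_int16, and the functions int2eq_datum and call_int2eq are all hypothetical and written for this example, not copied from the PostgreSQL headers. The narrowing casts rely on the usual wrap-around conversion that common compilers such as GCC document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for Datum: one machine word carries any argument. */
typedef uintptr_t Datum;

/* Hypothetical pack/unpack helpers in the spirit of the "Get() macro". */
static inline Datum   int16_to_datum(int16_t v) { return (Datum) (uint16_t) v; }
static inline int16_t datum_to_int16(Datum d)   { return (int16_t) (uint16_t) d; }

/* Uniform signature shared by every function reachable via a pointer. */
typedef Datum (*datum_fn)(Datum, Datum);

/* int2eq rewritten so that the callee unpacks its own arguments. */
Datum
int2eq_datum(Datum a, Datum b)
{
    return (Datum) (datum_to_int16(a) == datum_to_int16(b));
}

/* Caller side: pack the values, call through the generic pointer, and
 * interpret the result as a boolean. */
bool
call_int2eq(datum_fn fn, int16_t x, int16_t y)
{
    return fn(int16_to_datum(x), int16_to_datum(y)) != 0;
}
```

With one signature for all such functions, nothing depends on how a particular compiler passes char or short parameters, which is why the change removes the need for -O0 rather than working around it. The price is the "depressingly large amount of work" of converting each builtin.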
 



> P.S.    I don't see your addition to the TODO in the 6.5 source tree.

Added:
 * Fix C optimizer problem where fmgr_ptr calls return different types

I think I removed it because we didn't think it was a problem at one
point.  Now we know it is.  Good target for 6.6.

--
  Bruce Momjian
 


> how bad is it that -O2 will not work? LinuxPPCR5 probably is not one of the
> main platforms postgres is running on.
> If not being able to -O2 the compile is really bad for perfomance a note in
> the INSTALL would be in order to let people know that running on LinuxPPCR5
> is not going to be a fast ride, and that the postgres dev team is aware of
> the problem and that is being worked on :)

My vague recollection is that for other platforms (Alpha, i686) -O2 vs
-O0 is a 30% kind of improvement on typical code (I've not measured
this for Postgres). Of course, some sample code which is dominated by
tight loops with unfortunate style might show much bigger improvement,
but to say the least Postgres probably isn't in that category.

So it really isn't *that* big a deal until you get to large DBs or
large loading.
                            - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California