Обсуждение: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge
Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo builfarm member. Branch ------ master Details ------- http://git.postgresql.org/pg/commitdiff/d5321842528dfb73f8254a48556b4adb1b6d1c5a Modified Files -------------- contrib/cube/cubedata.h | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-)
Bruce Momjian <bruce@momjian.us> writes: > Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo > builfarm member. mongoose is still crashing, so it must have been some other aspect of commit 4bd7333 that caused this. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo > > builfarm member. > > mongoose is still crashing, so it must have been some other aspect of > commit 4bd7333 that caused this. Agreed. Let me look some more. Odd this succeeds: okapi Gentoo 1.12.14 icc 11.1.072 x86_64 but this fails: mongoose Gentoo 1.6.14 icc 9.0.032 i686 The backtrace: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2011-09-01%2013%3A45%3A01 shows it failing on this line: size = offsetof(NDBOX, x[0]) +sizeof(double) * 2; so I wonder if this is some compiler bug. offsetof is: ((long) &((type *)0)->field) and the struct is: typedef struct NDBOX { int32 vl_len_; /* varlena header (do not touch directly!) */ unsigned int dim; double x[1]; } NDBOX; That "x" is quite a common symbol. Is there any way to get access to this machine? Should I just revert it all and see what happens? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, 1 Sep 2011, Bruce Momjian wrote: > Tom Lane wrote: > > Bruce Momjian <bruce@momjian.us> writes: > > > Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo > > > builfarm member. > > > > mongoose is still crashing, so it must have been some other aspect of > > commit 4bd7333 that caused this. > > Agreed. Let me look some more. Odd this succeeds: > > okapi Gentoo 1.12.14 icc 11.1.072 x86_64 > > but this fails: > > mongoose Gentoo 1.6.14 icc 9.0.032 i686 > > The backtrace: > > http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2011-09-01%2013%3A45%3A01 > > shows it failing on this line: > > size = offsetof(NDBOX, x[0]) +sizeof(double) * 2; > > so I wonder if this is some compiler bug. offsetof is: > > ((long) &((type *)0)->field) > > and the struct is: > > typedef struct NDBOX > { > int32 vl_len_; /* varlena header (do not touch directly!) */ > unsigned int dim; > double x[1]; > } NDBOX; > > That "x" is quite a common symbol. Is there any way to get access to > this machine? Should I just revert it all and see what happens? > > I am the owner of both mongoose and okapi. Let me know if there's anything you want me to try.
Jeremy Drake wrote: > On Thu, 1 Sep 2011, Bruce Momjian wrote: > > > Tom Lane wrote: > > > Bruce Momjian <bruce@momjian.us> writes: > > > > Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo > > > > builfarm member. > > > > > > mongoose is still crashing, so it must have been some other aspect of > > > commit 4bd7333 that caused this. > > > > Agreed. Let me look some more. Odd this succeeds: > > > > okapi Gentoo 1.12.14 icc 11.1.072 x86_64 > > > > but this fails: > > > > mongoose Gentoo 1.6.14 icc 9.0.032 i686 > > > > The backtrace: > > > > http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2011-09-01%2013%3A45%3A01 > > > > shows it failing on this line: > > > > size = offsetof(NDBOX, x[0]) +sizeof(double) * 2; > > > > so I wonder if this is some compiler bug. offsetof is: > > > > ((long) &((type *)0)->field) > > > > and the struct is: > > > > typedef struct NDBOX > > { > > int32 vl_len_; /* varlena header (do not touch directly!) */ > > unsigned int dim; > > double x[1]; > > } NDBOX; > > > > That "x" is quite a common symbol. Is there any way to get access to > > this machine? Should I just revert it all and see what happens? > > > > > > I am the owner of both mongoose and okapi. Let me know if there's > anything you want me to try. Thanks. I would either like to email you patches to test or get ssh access so I can compile it myself. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, 1 Sep 2011, Bruce Momjian wrote: > Jeremy Drake wrote: > > > I am the owner of both mongoose and okapi. Let me know if there's > > anything you want me to try. > > Thanks. I would either like to email you patches to test or get ssh > access so I can compile it myself. You can send me patches if you want, but I spent a little time with it tonight and it seems to be the change to src/include/access/xlog.h: http://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/include/access/xlog.h;h=1fd60fb98d7362b677112517a20a41e32227a34f;hp=cdbf63fa76e0e7b154c084191d0df6138e1cbfcc;hb=4bd7333;hpb=d010391ac8f706e17998671534ca1230f68d2f38 Unfortunately, I also had to revert commit 6416a82a62db4e66b2edb0fa8fc83a580c3f1931 to fix compile errors. I expect you would be able to do something a little more surgical than that...
Jeremy Drake wrote: > On Thu, 1 Sep 2011, Bruce Momjian wrote: > > > Jeremy Drake wrote: > > > > > I am the owner of both mongoose and okapi. Let me know if there's > > > anything you want me to try. > > > > Thanks. I would either like to email you patches to test or get ssh > > access so I can compile it myself. > > You can send me patches if you want, but I spent a little time with it > tonight and it seems to be the change to src/include/access/xlog.h: > http://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/include/access/xlog.h;h=1fd60fb98d7362b677112517a20a41e32227a34f;hp=cdbf63fa76e0e7b154c084191d0df6138e1cbfcc;hb=4bd7333;hpb=d010391ac8f706e17998671534ca1230f68d2f38 > > Unfortunately, I also had to revert commit > 6416a82a62db4e66b2edb0fa8fc83a580c3f1931 to fix compile errors. I expect > you would be able to do something a little more surgical than that... Wow, that is interesting. So the problem is the inclusion of replication/walsender.h in xlog.h. Hard to see how that could cause the cube regression tests to fail, but of course, it is. I noticed you are using these compile flags: 'CFLAGS' => '-O3 -xN -parallel -ip', 'CC' => 'icc' Can you test it with lower optimizations? I looked at the contrib/cube compile messages and didn't see anything unusual. The only other idea I have is to try the attached patch which changes the offsetof() call to mention a struct field name, and not the first element of the field. However, I see other uses of accessing the element of a struct field, so I might be wrong here. I will say that our buildfarm is great at giving developers information to diagnose the cause of failures! It just isn't helping me find the cause in this particular case. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c new file mode 100644 index b0564f7..5159be4 *** a/contrib/cube/cube.c --- b/contrib/cube/cube.c *************** cube_a_f8_f8(PG_FUNCTION_ARGS) *** 199,205 **** dur = ARRPTR(ur); dll = ARRPTR(ll); ! size = offsetof(NDBOX, x[0]) +sizeof(double) * 2 * dim; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = dim; --- 199,205 ---- dur = ARRPTR(ur); dll = ARRPTR(ll); ! size = offsetof(NDBOX, x) +sizeof(double) * 2 * dim; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = dim; *************** cube_a_f8(PG_FUNCTION_ARGS) *** 235,241 **** dur = ARRPTR(ur); ! size = offsetof(NDBOX, x[0]) +sizeof(double) * 2 * dim; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = dim; --- 235,241 ---- dur = ARRPTR(ur); ! size = offsetof(NDBOX, x) +sizeof(double) * 2 * dim; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = dim; *************** cube_subset(PG_FUNCTION_ARGS) *** 268,274 **** dx = (int4 *) ARR_DATA_PTR(idx); dim = ARRNELEMS(idx); ! size = offsetof(NDBOX, x[0]) +sizeof(double) * 2 * dim; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = dim; --- 268,274 ---- dx = (int4 *) ARR_DATA_PTR(idx); dim = ARRNELEMS(idx); ! size = offsetof(NDBOX, x) +sizeof(double) * 2 * dim; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = dim; *************** cube_enlarge(PG_FUNCTION_ARGS) *** 1373,1379 **** dim = n; if (a->dim > dim) dim = a->dim; ! size = offsetof(NDBOX, x[0]) +sizeof(double) * dim * 2; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = dim; --- 1373,1379 ---- dim = n; if (a->dim > dim) dim = a->dim; ! size = offsetof(NDBOX, x) +sizeof(double) * dim * 2; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = dim; *************** cube_f8(PG_FUNCTION_ARGS) *** 1414,1420 **** NDBOX *result; int size; ! size = offsetof(NDBOX, x[0]) +sizeof(double) * 2; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = 1; --- 1414,1420 ---- NDBOX *result; int size; ! size = offsetof(NDBOX, x) +sizeof(double) * 2; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = 1; *************** cube_f8_f8(PG_FUNCTION_ARGS) *** 1432,1438 **** NDBOX *result; int size; ! size = offsetof(NDBOX, x[0]) +sizeof(double) * 2; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = 1; --- 1432,1438 ---- NDBOX *result; int size; ! size = offsetof(NDBOX, x) +sizeof(double) * 2; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = 1; *************** cube_c_f8(PG_FUNCTION_ARGS) *** 1453,1459 **** int size; int i; ! size = offsetof(NDBOX, x[0]) +sizeof(double) * (c->dim + 1) *2; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = c->dim + 1; --- 1453,1459 ---- int size; int i; ! size = offsetof(NDBOX, x) +sizeof(double) * (c->dim + 1) *2; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = c->dim + 1; *************** cube_c_f8_f8(PG_FUNCTION_ARGS) *** 1480,1486 **** int size; int i; ! size = offsetof(NDBOX, x[0]) +sizeof(double) * (c->dim + 1) *2; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = c->dim + 1; --- 1480,1486 ---- int size; int i; ! size = offsetof(NDBOX, x) +sizeof(double) * (c->dim + 1) *2; result = (NDBOX *) palloc0(size); SET_VARSIZE(result, size); result->dim = c->dim + 1;
Excerpts from Bruce Momjian's message of vie sep 02 12:20:50 -0300 2011: > The only other idea I have is to try the attached patch which changes > the offsetof() call to mention a struct field name, and not the first > element of the field. However, I see other uses of accessing the > element of a struct field, so I might be wrong here. I wonder if this would be the right time to start using the FLEXIBLE_ARRAY_MEMBER stuff in contrib/cube. Note pg_config.h.in says /* Define to nothing if C supports flexible array members, and to 1 if it does not. That way, with a declaration like `struct s { int n; double d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99 compilers. When computing the size of such an object, don't use 'sizeof (struct s)' as it overestimates the size. Use 'offsetof (struct s, d)' instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with MSVC and with C++ compilers. */ #undef FLEXIBLE_ARRAY_MEMBER -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > I wonder if this would be the right time to start using the > FLEXIBLE_ARRAY_MEMBER stuff in contrib/cube. Note pg_config.h.in says > /* Define to nothing if C supports flexible array members, and to 1 if it does > not. That way, with a declaration like `struct s { int n; double > d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99 > compilers. When computing the size of such an object, don't use 'sizeof > (struct s)' as it overestimates the size. Use 'offsetof (struct s, d)' > instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with > MSVC and with C++ compilers. */ D'oh ... I bet that last sentence is pointing us at the problem. cube is using exactly that construct, and for some reason it's crashing. The most likely explanation for why it's crashing is that the compiler is trying to dereference NULL instead of successfully reducing offsetof() to a compile-time constant. It's still not too clear to me why the inclusion changes cause that, but certainly walsender.h is pulling in a crapload of other stuff that was not previously included there ... which connects back to my previous complaints that I think Bruce was way too aggressive in adding #includes to headers. Jeremy, could you look at the preprocessor output for cube.c (ie, use -E instead of -c in the gcc call) and see how the relevant line of cube_f8_f8 looks in both broken and non-broken cases? What I see on a Fedora box is size = __builtin_offsetof (NDBOX, x[0]) +sizeof(double) * 2; but I'm thinking you might be getting something different. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > I wonder if this would be the right time to start using the > > FLEXIBLE_ARRAY_MEMBER stuff in contrib/cube. Note pg_config.h.in says > > > /* Define to nothing if C supports flexible array members, and to 1 if it does > > not. That way, with a declaration like `struct s { int n; double > > d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99 > > compilers. When computing the size of such an object, don't use 'sizeof > > (struct s)' as it overestimates the size. Use 'offsetof (struct s, d)' > > instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with > > MSVC and with C++ compilers. */ > > D'oh ... I bet that last sentence is pointing us at the problem. cube > is using exactly that construct, and for some reason it's crashing. > The most likely explanation for why it's crashing is that the compiler > is trying to dereference NULL instead of successfully reducing > offsetof() to a compile-time constant. It's still not too clear to me > why the inclusion changes cause that, but certainly walsender.h is > pulling in a crapload of other stuff that was not previously included > there ... which connects back to my previous complaints that I think > Bruce was way too aggressive in adding #includes to headers. > > Jeremy, could you look at the preprocessor output for cube.c (ie, > use -E instead of -c in the gcc call) and see how the relevant line > of cube_f8_f8 looks in both broken and non-broken cases? What I see > on a Fedora box is > > size = __builtin_offsetof (NDBOX, x[0]) +sizeof(double) * 2; > > but I'm thinking you might be getting something different. I see 35 instances of this coding, and only 12 are in contrib/cube; examples attached. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + /usr/var/local/src/gen/pgsql/postgresql/contrib/cube/cubeparse.y:178: int size = offsetof(NDBOX, x[0]) + sizeof(double)* dim * 2; /usr/var/local/src/gen/pgsql/postgresql/contrib/cube/cubeparse.y:210: size = offsetof(NDBOX, x[0]) + sizeof(double) * dim* 2; /usr/var/local/src/gen/pgsql/postgresql/contrib/cube/cube.c:202: size = offsetof(NDBOX, x[0]) +sizeof(double) * 2 * dim; /usr/var/local/src/gen/pgsql/postgresql/contrib/cube/cube.c:238: size = offsetof(NDBOX, x[0]) +sizeof(double) * 2 * dim; /usr/var/local/src/gen/pgsql/postgresql/contrib/cube/cube.c:271: size = offsetof(NDBOX, x[0]) +sizeof(double) * 2 * dim; /usr/var/local/src/gen/pgsql/postgresql/contrib/cube/cube.c:1376: size = offsetof(NDBOX, x[0]) +sizeof(double) * dim *2; /usr/var/local/src/gen/pgsql/postgresql/contrib/cube/cube.c:1417: size = offsetof(NDBOX, x[0]) +sizeof(double) * 2; /usr/var/local/src/gen/pgsql/postgresql/contrib/cube/cube.c:1435: size = offsetof(NDBOX, x[0]) +sizeof(double) * 2; /usr/var/local/src/gen/pgsql/postgresql/contrib/cube/cube.c:1456: size = offsetof(NDBOX, x[0]) +sizeof(double) * (c->dim+ 1) *2; /usr/var/local/src/gen/pgsql/postgresql/contrib/cube/cube.c:1483: size = offsetof(NDBOX, x[0]) +sizeof(double) * (c->dim+ 1) *2; /usr/var/local/src/gen/pgsql/postgresql/contrib/cube/cubeparse.c:1369: int size = offsetof(NDBOX, x[0]) + sizeof(double)* dim * 2; /usr/var/local/src/gen/pgsql/postgresql/contrib/cube/cubeparse.c:1401: size = offsetof(NDBOX, x[0]) + sizeof(double) * dim* 2; /usr/var/local/src/gen/pgsql/postgresql/src/backend/postmaster/pgstat.c:794: len = offsetof(PgStat_MsgTabstat, m_entry[0])+ /usr/var/local/src/gen/pgsql/postgresql/src/backend/postmaster/pgstat.c:840: pgstat_send(&msg, offsetof(PgStat_MsgFuncstat,m_entry[0]) + /usr/var/local/src/gen/pgsql/postgresql/src/backend/postmaster/pgstat.c:850: pgstat_send(&msg, offsetof(PgStat_MsgFuncstat,m_entry[0]) + /usr/var/local/src/gen/pgsql/postgresql/src/backend/postmaster/pgstat.c:951: len = offsetof(PgStat_MsgTabpurge,m_tableid[0]) /usr/var/local/src/gen/pgsql/postgresql/src/backend/postmaster/pgstat.c:967: len = offsetof(PgStat_MsgTabpurge, m_tableid[0]) /usr/var/local/src/gen/pgsql/postgresql/src/backend/postmaster/pgstat.c:1011: len = offsetof(PgStat_MsgFuncpurge,m_functionid[0]) /usr/var/local/src/gen/pgsql/postgresql/src/backend/postmaster/pgstat.c:1025: len = offsetof(PgStat_MsgFuncpurge,m_functionid[0]) /usr/var/local/src/gen/pgsql/postgresql/src/backend/postmaster/pgstat.c:1127: len = offsetof(PgStat_MsgTabpurge, m_tableid[0])+sizeof(Oid); /usr/var/local/src/gen/pgsql/postgresql/src/backend/access/heap/syncscan.c:109:#define SizeOfScanLocations(N) offsetof(ss_scan_locations_t,items[N]) /usr/var/local/src/gen/pgsql/postgresql/src/backend/access/nbtree/nbtree.c:535: offsetof(BTScanPosData,items[1]) + /usr/var/local/src/gen/pgsql/postgresql/src/backend/access/nbtree/nbtsearch.c:1120: offsetof(BTScanPosData,items[1]) + /usr/var/local/src/gen/pgsql/postgresql/src/backend/access/nbtree/nbtutils.c:1450: size = offsetof(BTVacInfo, vacuums[0]); /usr/var/local/src/gen/pgsql/postgresql/src/backend/utils/adt/geo_ops.c:1424: size = offsetof(PATH, p[0]) +sizeof(path->p[0])* npts; /usr/var/local/src/gen/pgsql/postgresql/src/backend/utils/adt/geo_ops.c:1470: if (npts <= 0 || npts >= (int32) ((INT_MAX- offsetof(PATH, p[0])) / sizeof(Point))) /usr/var/local/src/gen/pgsql/postgresql/src/backend/utils/adt/geo_ops.c:1475: size = offsetof(PATH, p[0]) +sizeof(path->p[0])* npts; /usr/var/local/src/gen/pgsql/postgresql/src/backend/utils/adt/geo_ops.c:3476: size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0])* npts; /usr/var/local/src/gen/pgsql/postgresql/src/backend/utils/adt/geo_ops.c:3523: if (npts <= 0 || npts >= (int32) ((INT_MAX- offsetof(POLYGON, p[0])) / sizeof(Point))) /usr/var/local/src/gen/pgsql/postgresql/src/backend/utils/adt/geo_ops.c:3528: size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0])* npts; /usr/var/local/src/gen/pgsql/postgresql/src/backend/utils/adt/geo_ops.c:4244: size = offsetof(PATH, p[0]) +base_size; /usr/var/local/src/gen/pgsql/postgresql/src/backend/utils/adt/geo_ops.c:4382: size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0])* path->npts; /usr/var/local/src/gen/pgsql/postgresql/src/backend/utils/adt/geo_ops.c:4457: size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0])* 4; /usr/var/local/src/gen/pgsql/postgresql/src/backend/utils/adt/geo_ops.c:4487: size = offsetof(PATH, p[0]) +sizeof(path->p[0])* poly->npts; /usr/var/local/src/gen/pgsql/postgresql/src/backend/utils/adt/geo_ops.c:5166: size = offsetof(POLYGON, p[0]) +base_size;
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> Alvaro Herrera <alvherre@commandprompt.com> writes: >>> instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with >>> MSVC and with C++ compilers. */ >> D'oh ... I bet that last sentence is pointing us at the problem. cube >> is using exactly that construct, and for some reason it's crashing. > I see 35 instances of this coding, and only 12 are in contrib/cube; > examples attached. Yeah, so the next question would be why those other ones aren't showing problems. But at least now we have a potential mechanism for getting from "the include list changed" to "cube is crashing on an offsetof", namely that something is affecting the expansion of the offsetof macro. Up to now it's been black magic, and I don't like patching around problems we don't understand any better than that. regards, tom lane
On Fri, 2 Sep 2011, Tom Lane wrote: > Yeah, so the next question would be why those other ones aren't showing > problems. But at least now we have a potential mechanism for getting > from "the include list changed" to "cube is crashing on an offsetof", > namely that something is affecting the expansion of the offsetof macro. > Up to now it's been black magic, and I don't like patching around > problems we don't understand any better than that. I'm going to try answering about 3 emails at once here, so I apologize for the confusion. Checking preprocessor output on cube_f8_f8 between working and broken, the output was identical for that function. Changing offsetof(NDBOX, x[0]) to offsetof(NDBOX, x), no change in behavior. Then I started investigating the disassembly. The PIC disassembly was making my head hurt (why is the compiler calling the next instruction, popping the return address into a register, and using it as a pointer? Oh yeah, PIC, duh!), but I'm pretty sure that what it crashed on was attempting to access the global external variable CurrentMemoryContext. The odd thing is, that the disassembly code between the working and non-working was the same, except for the offsets. Here's the disassembly output from the core dump: Program terminated with signal 11, Segmentation fault. #0 cube_f8_f8 () at cube.c:1435 1435 size = offsetof(NDBOX, x) +sizeof(double) * 2; (gdb) set disassembly-flavor intel (gdb) disass Dump of assembler code for function cube_f8_f8: 0xb776ab50 <+0>: push ebp 0xb776ab51 <+1>: mov ebp,esp 0xb776ab53 <+3>: and esp,0xfffffff8 0xb776ab56 <+6>: push ebx 0xb776ab57 <+7>: sub esp,0x14 0xb776ab5a <+10>: mov ecx,DWORD PTR [ebp+0x8] 0xb776ab5d <+13>: mov DWORD PTR [esp+0x10],ebx 0xb776ab61 <+17>: mov edx,DWORD PTR [ecx+0x14] 0xb776ab64 <+20>: movsd xmm0,QWORD PTR [edx] 0xb776ab68 <+24>: mov edx,DWORD PTR [ecx+0x18] 0xb776ab6b <+27>: movsd xmm1,QWORD PTR [edx] 0xb776ab6f <+31>: movsd QWORD PTR [esp],xmm0 0xb776ab74 <+36>: movsd QWORD PTR [esp+0x8],xmm1 0xb776ab7a <+42>: push 0x18 0xb776ab7c <+44>: call 0xb776ab81 <cube_f8_f8+49> 0xb776ab81 <+49>: pop eax 0xb776ab82 <+50>: add eax,0x9472 0xb776ab87 <+55>: mov edx,DWORD PTR [eax-0x58] => 0xb776ab8d <+61>: push DWORD PTR [edx] 0xb776ab8f <+63>: mov DWORD PTR [esp+0x18],ebx 0xb776ab93 <+67>: mov ebx,eax 0xb776ab95 <+69>: call 0xb776745c <MemoryContextAllocZero@plt> With this knowledge, I thought maybe cube wasn't the actual problem, but just happened to be early in the list of contrib modules being tested. So I arbitrarily picked another contrib module to test, intarray. ============== running regression test queries ============== test _int ... FAILED (test process exited with exit code 2) Program terminated with signal 11, Segmentation fault. #0 0xb785a1fa in g_intbig_union () from /buildfarm/test/pgsql_test/contrib/intarray/tmp_check/install/usr/local/pgsql/lib/_int.so (gdb) set disassembly-flavor intel (gdb) disass Dump of assembler code for function g_intbig_union: <SNIP UNRELATED ASSEMBLY CODE> 0xb785a087 <+31>: call 0xb785a08c <g_intbig_union+36> 0xb785a08c <+36>: pop eax 0xb785a08d <+37>: add eax,0x6f67 0xb785a092 <+42>: mov DWORD PTR [esp+0x104],eax <SNIP MORE UNRELATED ASSEMBLY CODE (this is a much longer function)> 0xb785a1e5 <+381>: mov eax,DWORD PTR [esp+0x104] 0xb785a1ec <+388>: mov esi,DWORD PTR [eax-0x24] 0xb785a1f2 <+394>: mov DWORD PTR [esp+0x108],ebx 0xb785a1f9 <+401>: push ebx => 0xb785a1fa <+402>: push DWORD PTR [esi] 0xb785a1fc <+404>: mov DWORD PTR [esp+0x104],ebx 0xb785a203 <+411>: mov ebx,eax 0xb785a205 <+413>: call 0xb7853ef0 <MemoryContextAlloc@plt> Looks pretty familiar, right? Still, I have no idea why adding an include would cause issues accessing CurrentMemoryContext. But at least we're not blaming offsetof or cube anymore...
Jeremy Drake <pgsql@jdrake.com> writes: > ... I'm pretty sure that what it crashed on was > attempting to access the global external variable CurrentMemoryContext. Ah-hah, good insight! > The odd thing is, that the disassembly code between the working and > non-working was the same, except for the offsets. The code seems to be fetching a pointer to CurrentMemoryContext from a PC-relative location; presumably that's a literal that the dynamic linker is supposed to update at shlib load time. I guess that pointer is not correctly computed in the other case, or else it's fetching the wrong pointer value. > Still, I have no idea why adding an include would cause issues accessing > CurrentMemoryContext. Me either, but at least it's something to work from. You might try diffing the working and non-working -E output from cube.c to see if there are any changes that obviously affect CurrentMemoryContext. regards, tom lane
Did you test with a lower optimization level? Gentoo is notorious for buggy/bleeding edge tools. --------------------------------------------------------------------------- Jeremy Drake wrote: > On Fri, 2 Sep 2011, Tom Lane wrote: > > > Yeah, so the next question would be why those other ones aren't showing > > problems. But at least now we have a potential mechanism for getting > > from "the include list changed" to "cube is crashing on an offsetof", > > namely that something is affecting the expansion of the offsetof macro. > > Up to now it's been black magic, and I don't like patching around > > problems we don't understand any better than that. > > > I'm going to try answering about 3 emails at once here, so I apologize for > the confusion. > > Checking preprocessor output on cube_f8_f8 between working and broken, the > output was identical for that function. > > Changing offsetof(NDBOX, x[0]) to offsetof(NDBOX, x), no change in > behavior. > > Then I started investigating the disassembly. The PIC disassembly was > making my head hurt (why is the compiler calling the next instruction, > popping the return address into a register, and using it as a pointer? > Oh yeah, PIC, duh!), but I'm pretty sure that what it crashed on was > attempting to access the global external variable CurrentMemoryContext. > The odd thing is, that the disassembly code between the working and > non-working was the same, except for the offsets. > > Here's the disassembly output from the core dump: > Program terminated with signal 11, Segmentation fault. > #0 cube_f8_f8 () at cube.c:1435 > 1435 size = offsetof(NDBOX, x) +sizeof(double) * 2; > (gdb) set disassembly-flavor intel > (gdb) disass > Dump of assembler code for function cube_f8_f8: > 0xb776ab50 <+0>: push ebp > 0xb776ab51 <+1>: mov ebp,esp > 0xb776ab53 <+3>: and esp,0xfffffff8 > 0xb776ab56 <+6>: push ebx > 0xb776ab57 <+7>: sub esp,0x14 > 0xb776ab5a <+10>: mov ecx,DWORD PTR [ebp+0x8] > 0xb776ab5d <+13>: mov DWORD PTR [esp+0x10],ebx > 0xb776ab61 <+17>: mov edx,DWORD PTR [ecx+0x14] > 0xb776ab64 <+20>: movsd xmm0,QWORD PTR [edx] > 0xb776ab68 <+24>: mov edx,DWORD PTR [ecx+0x18] > 0xb776ab6b <+27>: movsd xmm1,QWORD PTR [edx] > 0xb776ab6f <+31>: movsd QWORD PTR [esp],xmm0 > 0xb776ab74 <+36>: movsd QWORD PTR [esp+0x8],xmm1 > 0xb776ab7a <+42>: push 0x18 > 0xb776ab7c <+44>: call 0xb776ab81 <cube_f8_f8+49> > 0xb776ab81 <+49>: pop eax > 0xb776ab82 <+50>: add eax,0x9472 > 0xb776ab87 <+55>: mov edx,DWORD PTR [eax-0x58] > => 0xb776ab8d <+61>: push DWORD PTR [edx] > 0xb776ab8f <+63>: mov DWORD PTR [esp+0x18],ebx > 0xb776ab93 <+67>: mov ebx,eax > 0xb776ab95 <+69>: call 0xb776745c <MemoryContextAllocZero@plt> > > > > With this knowledge, I thought maybe cube wasn't the actual problem, but > just happened to be early in the list of contrib modules being tested. So > I arbitrarily picked another contrib module to test, intarray. > > ============== running regression test queries ============== > test _int ... FAILED (test process exited with exit > code 2) > > Program terminated with signal 11, Segmentation fault. > #0 0xb785a1fa in g_intbig_union () > from > /buildfarm/test/pgsql_test/contrib/intarray/tmp_check/install/usr/local/pgsql/lib/_int.so > (gdb) set disassembly-flavor intel > (gdb) disass > Dump of assembler code for function g_intbig_union: > <SNIP UNRELATED ASSEMBLY CODE> > 0xb785a087 <+31>: call 0xb785a08c <g_intbig_union+36> > 0xb785a08c <+36>: pop eax > 0xb785a08d <+37>: add eax,0x6f67 > 0xb785a092 <+42>: mov DWORD PTR [esp+0x104],eax > <SNIP MORE UNRELATED ASSEMBLY CODE (this is a much longer function)> > 0xb785a1e5 <+381>: mov eax,DWORD PTR [esp+0x104] > 0xb785a1ec <+388>: mov esi,DWORD PTR [eax-0x24] > 0xb785a1f2 <+394>: mov DWORD PTR [esp+0x108],ebx > 0xb785a1f9 <+401>: push ebx > => 0xb785a1fa <+402>: push DWORD PTR [esi] > 0xb785a1fc <+404>: mov DWORD PTR [esp+0x104],ebx > 0xb785a203 <+411>: mov ebx,eax > 0xb785a205 <+413>: call 0xb7853ef0 <MemoryContextAlloc@plt> > > > Looks pretty familiar, right? > > Still, I have no idea why adding an include would cause issues accessing > CurrentMemoryContext. But at least we're not blaming offsetof or cube > anymore... > -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Tom Lane wrote: > Jeremy Drake <pgsql@jdrake.com> writes: > > ... I'm pretty sure that what it crashed on was > > attempting to access the global external variable CurrentMemoryContext. > > Ah-hah, good insight! > > > The odd thing is, that the disassembly code between the working and > > non-working was the same, except for the offsets. > > The code seems to be fetching a pointer to CurrentMemoryContext from a > PC-relative location; presumably that's a literal that the dynamic > linker is supposed to update at shlib load time. I guess that pointer > is not correctly computed in the other case, or else it's fetching the > wrong pointer value. This would explain why the regular regression test work but the /contrib modules, which do dynamic loading, do not. Good to know the problem is more the contrib/cube. FYI, I noticed in the contrib/cube failure that palloc0() was the next line after the reported crash line. Are the contrib's crashing on the first access of any backend/DLL function? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +