Обсуждение: Back branches vs. gcc 4.8.0

Поиск
Список
Период
Сортировка

Back branches vs. gcc 4.8.0

От
Tom Lane
Дата:
After quite a bit of hair-pulling trying to install Fedora 19 Alpha,
I've been able to reproduce the initdb-time failure that's currently
being seen on buildfarm member anchovy, and was also complained of
recently by an end user:
http://www.postgresql.org/message-id/CAOD=oQ-kq3Eg5SOvRYOVxDuqibVWC8R0wEivPsMGcyzZY-nfzA@mail.gmail.com

It is exactly what I suspected, namely that gcc 4.8.0 is applying an
optimization that breaks our code; the odd thing though is that it's
not breaking 9.2 or HEAD, just the older branches.

It turns out that what's happening is that with
-faggressive-loop-optimizations turned on (as it is by default),
gcc decides that loops that iterate over the elements of an int2vector
can iterate at most once, because int2vector is declared with a fixed
size values[] array:
   int16        values[1];        /* VARIABLE LENGTH ARRAY */
} int2vector;                      /* VARIABLE LENGTH STRUCT */

Now, gcc does know better than to make such an assumption
unconditionally, but what I discovered is that it *will* assume this if
the int2vector is declared as a non-last element of a larger struct,
so that (in gcc's little mind anyway) it couldn't possibly really be
a variable-length array.

In other words, the reason 9.2 and up don't break is commit
8137f2c32322c624e0431fac1621e8e9315202f9, which arranged to hide
non-fixed-offset catalog columns from the compiler.  Without that,
gcc decides that for instance pg_index.indkey cannot have more than one
member.  That breaks the loop in BuildIndexInfo() that copies the key
column numbers into an IndexInfo, leading to the observed failure.

Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
I think we need to do something to prevent it from breaking 8.4.x and
9.0.x.  It looks like our choices are (1) teach configure to enable
-fno-aggressive-loop-optimizations if the compiler recognizes it,
or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.

I'm a bit leaning towards (1), mainly because I'm not excited about
fighting a compiler arms race in the back branches.

It also strikes me that we ought to take this as a warning sign
that we need to work on getting rid of coding like the above in favor
of genuine "flexible arrays", before the gcc boys think of some other
overly-cute optimization based on the assumption that an array declared
with a fixed size really is fixed.
        regards, tom lane



Re: Back branches vs. gcc 4.8.0

От
Peter Geoghegan
Дата:
On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It also strikes me that we ought to take this as a warning sign
> that we need to work on getting rid of coding like the above in favor
> of genuine "flexible arrays", before the gcc boys think of some other
> overly-cute optimization based on the assumption that an array declared
> with a fixed size really is fixed.

The traditional argument against that has been that that's a C99
feature. However, since it appears that even MSVC supports flexible
arrays (which are described as a "Microsoft extension", so may not
have identical semantics), it might be possible to do this across the
board without contorting the code with preprocessor hacks. That's
something that I'd certainly be in favor of pursuing.


-- 
Peter Geoghegan



Re: Back branches vs. gcc 4.8.0

От
Gavin Flower
Дата:
<div class="moz-cite-prefix">On 06/04/13 11:14, Tom Lane wrote:<br /></div><blockquote
cite="mid:14242.1365200084@sss.pgh.pa.us"type="cite"><pre wrap="">After quite a bit of hair-pulling trying to install
Fedora19 Alpha,
 
I've been able to reproduce the initdb-time failure that's currently
being seen on buildfarm member anchovy, and was also complained of
recently by an end user:
<a class="moz-txt-link-freetext"
href="http://www.postgresql.org/message-id/CAOD=oQ-kq3Eg5SOvRYOVxDuqibVWC8R0wEivPsMGcyzZY-nfzA@mail.gmail.com">http://www.postgresql.org/message-id/CAOD=oQ-kq3Eg5SOvRYOVxDuqibVWC8R0wEivPsMGcyzZY-nfzA@mail.gmail.com</a>

It is exactly what I suspected, namely that gcc 4.8.0 is applying an
optimization that breaks our code; the odd thing though is that it's
not breaking 9.2 or HEAD, just the older branches.

It turns out that what's happening is that with
-faggressive-loop-optimizations turned on (as it is by default),
gcc decides that loops that iterate over the elements of an int2vector
can iterate at most once, because int2vector is declared with a fixed
size values[] array:
   int16        values[1];        /* VARIABLE LENGTH ARRAY */
} int2vector;                      /* VARIABLE LENGTH STRUCT */

Now, gcc does know better than to make such an assumption
unconditionally, but what I discovered is that it *will* assume this if
the int2vector is declared as a non-last element of a larger struct,
so that (in gcc's little mind anyway) it couldn't possibly really be
a variable-length array.

In other words, the reason 9.2 and up don't break is commit
8137f2c32322c624e0431fac1621e8e9315202f9, which arranged to hide
non-fixed-offset catalog columns from the compiler.  Without that,
gcc decides that for instance pg_index.indkey cannot have more than one
member.  That breaks the loop in BuildIndexInfo() that copies the key
column numbers into an IndexInfo, leading to the observed failure.

Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
I think we need to do something to prevent it from breaking 8.4.x and
9.0.x.  It looks like our choices are (1) teach configure to enable
-fno-aggressive-loop-optimizations if the compiler recognizes it,
or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.

I'm a bit leaning towards (1), mainly because I'm not excited about
fighting a compiler arms race in the back branches.

It also strikes me that we ought to take this as a warning sign
that we need to work on getting rid of coding like the above in favor
of genuine "flexible arrays", before the gcc boys think of some other
overly-cute optimization based on the assumption that an array declared
with a fixed size really is fixed.
        regards, tom lane


</pre></blockquote><font size="-1">I am <font size="-1">probably missing something here!<br /><br /> I would have
thoughtit reasonable for a comp<font size="-1">iler assume 'an array declared with a fixed size really is fixed.'!<br
/><br/><font size="-1">Seems dange<font size="-1">rous to play tricks like that (though I admit to doing nasty things
likethat in COBOL many many years ago!).<br /><br /><br /><font size="-1">Cheers,<br /><font size="-1">Gavin</font><br
/></font></font></font><br/></font></font></font> 

Re: Back branches vs. gcc 4.8.0

От
Tom Lane
Дата:
Peter Geoghegan <pg@heroku.com> writes:
> On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It also strikes me that we ought to take this as a warning sign
>> that we need to work on getting rid of coding like the above in favor
>> of genuine "flexible arrays", before the gcc boys think of some other
>> overly-cute optimization based on the assumption that an array declared
>> with a fixed size really is fixed.

> The traditional argument against that has been that that's a C99
> feature.

Well, we already have a solution for that, see FLEXIBLE_ARRAY_MEMBER.
But up to now we've just supposed that that was a code beautification
thing and there was no particular urgency to convert all applicable
places to use that notation.

Since there's a potential to break code with such changes (we'd have to
fix any uses of sizeof on the struct type), it's been very far down the
to-do list.  But now it appears that we're taking risks if we *don't*
change it.
        regards, tom lane



Re: Back branches vs. gcc 4.8.0

От
Andres Freund
Дата:
On 2013-04-05 23:28:03 +0100, Peter Geoghegan wrote:
> On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > It also strikes me that we ought to take this as a warning sign
> > that we need to work on getting rid of coding like the above in favor
> > of genuine "flexible arrays", before the gcc boys think of some other
> > overly-cute optimization based on the assumption that an array declared
> > with a fixed size really is fixed.
> 
> The traditional argument against that has been that that's a C99
> feature. However, since it appears that even MSVC supports flexible
> arrays (which are described as a "Microsoft extension", so may not
> have identical semantics), it might be possible to do this across the
> board without contorting the code with preprocessor hacks. That's
> something that I'd certainly be in favor of pursuing.

The respective macro magic is already in place, its just not used in all
places. The problem is more that we can't easily use it in all places
because e.g. in the one case mentioned here the array isn't in the last
place *in the back branches*.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Back branches vs. gcc 4.8.0

От
Peter Geoghegan
Дата:
On Fri, Apr 5, 2013 at 11:39 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> The respective macro magic is already in place, its just not used in all
> places. The problem is more that we can't easily use it in all places
> because e.g. in the one case mentioned here the array isn't in the last
> place *in the back branches*.

Are you proposing that we use the FLEXIBLE_ARRAY_MEMBER macro in every
single place where we currently use the one element array pattern? I
count one place where we currently use FLEXIBLE_ARRAY_MEMBER. It'd be
pretty ugly to have that everywhere, in my opinion.


-- 
Peter Geoghegan



Re: Back branches vs. gcc 4.8.0

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> The respective macro magic is already in place, its just not used in all
> places. The problem is more that we can't easily use it in all places
> because e.g. in the one case mentioned here the array isn't in the last
> place *in the back branches*.

I don't think we should try to back-patch such changes; there seems too
much risk of breaking third-party code because of the sizeof() issue.
But it'd be a good idea to have it in place before we find ourselves
having to do -fno-aggressive-loop-optimizations or some such even in
up-to-date branches.

(I'm actually even more worried about gcc bugs that make this type of
assumption than about intentional changes on their part.)
        regards, tom lane



Re: Back branches vs. gcc 4.8.0

От
Tom Lane
Дата:
Peter Geoghegan <pg@heroku.com> writes:
> Are you proposing that we use the FLEXIBLE_ARRAY_MEMBER macro in every
> single place where we currently use the one element array pattern?

Yup, exactly.

> I count one place where we currently use FLEXIBLE_ARRAY_MEMBER. It'd be
> pretty ugly to have that everywhere, in my opinion.

Hm, I see 4 places in HEAD.  But in any case, is
   int16        values[1];        /* VARIABLE LENGTH ARRAY */
} int2vector;                      /* VARIABLE LENGTH STRUCT */

really better than
   int16        values[FLEXIBLE_ARRAY_MEMBER];
} int2vector;

?  I don't think so.  Relying on comments to tell about critical
semantics of a data structure isn't really nice if you can do it
in a way that is standards-blessed and (some) compilers understand.
        regards, tom lane



Re: Back branches vs. gcc 4.8.0

От
Peter Geoghegan
Дата:
On Fri, Apr 5, 2013 at 11:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hm, I see 4 places in HEAD.  But in any case, is

My mistake. I had REL9_2_STABLE checked out.

>     int16        values[1];        /* VARIABLE LENGTH ARRAY */
> } int2vector;                      /* VARIABLE LENGTH STRUCT */
>
> really better than
>
>     int16        values[FLEXIBLE_ARRAY_MEMBER];
> } int2vector;
>
> ?  I don't think so.

I can see your point. Now that I look at the comments beside
FLEXIBLE_ARRAY_MEMBER, I see that indeed, as I suspected, the
Microsoft flexible array members are not completely compatible with
C99 style flexible arrays, so this may be the least-worst option.

-- 
Peter Geoghegan



Re: Back branches vs. gcc 4.8.0

От
Peter Eisentraut
Дата:
On Fri, 2013-04-05 at 18:14 -0400, Tom Lane wrote:
> Since gcc 4.8 is going to be on a lot of people's machines pretty
> soon,
> I think we need to do something to prevent it from breaking 8.4.x and
> 9.0.x.  It looks like our choices are (1) teach configure to enable
> -fno-aggressive-loop-optimizations if the compiler recognizes it,
> or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9. 

Using a fixed-size struct member as a flexible one has always been a
violation of the C standard, although a widely tolerated one.  Doing
that in the middle of a struct, however, is totally wrong, and the
compiler is perfectly in its right to make a mess of it.  Even flexible
array members are not allowed in the middle of a struct.

So I think this is not a compiler bug or an arms race.  We just need to
fix the code.  So I'm in favor of backporting.





Re: Back branches vs. gcc 4.8.0

От
Peter Eisentraut
Дата:
On Fri, 2013-04-05 at 23:44 +0100, Peter Geoghegan wrote:
> On Fri, Apr 5, 2013 at 11:39 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > The respective macro magic is already in place, its just not used in all
> > places. The problem is more that we can't easily use it in all places
> > because e.g. in the one case mentioned here the array isn't in the last
> > place *in the back branches*.
> 
> Are you proposing that we use the FLEXIBLE_ARRAY_MEMBER macro in every
> single place where we currently use the one element array pattern? I
> count one place where we currently use FLEXIBLE_ARRAY_MEMBER. It'd be
> pretty ugly to have that everywhere, in my opinion.

Background: The reason I put in that one use of FLEXIBLE_ARRAY_MEMBER is
that at one point clang threw a warning about the old coding.  There
were no warnings about the other sites that use array[1].

The reason that the whole code wasn't converted right away was (besides
a lot of legwork with sizeof and offsetoff) that flexible array members
aren't allowed in the middle of structs.  Which eventually led to the
mentioned commit 8137f2c32322c624e0431fac1621e8e9315202f9.

If someone wants to go through and change the rest of the code to use
FLEXIBLE_ARRAY_MEMBER, I won't mind.  But I think it actually has
nothing to do with the current bug or future-proofing anything.  All
compilers tolerate the current coding.





Re: Back branches vs. gcc 4.8.0

От
Tom Lane
Дата:
Peter Eisentraut <peter_e@gmx.net> writes:
> The reason that the whole code wasn't converted right away was (besides
> a lot of legwork with sizeof and offsetoff) that flexible array members
> aren't allowed in the middle of structs.  Which eventually led to the
> mentioned commit 8137f2c32322c624e0431fac1621e8e9315202f9.

> If someone wants to go through and change the rest of the code to use
> FLEXIBLE_ARRAY_MEMBER, I won't mind.  But I think it actually has
> nothing to do with the current bug or future-proofing anything.  All
> compilers tolerate the current coding.

The reason I'm thinking it's a good idea is that it would expose any
remaining places where we have nominally var-length arrays embedded in
larger structs.  Now that I've seen the failures with gcc 4.8.0, I'm
quite worried that there might be some more declarations like that
which we've not identified yet, but that by chance aren't causing
obvious failures today.  (This is also why I'm not that excited about
trying to fix things "properly" in the back branches compared to
selecting -fno-aggressive-loop-optimizations: I'm afraid there might
be more to it than just the one commit.)
        regards, tom lane



Re: Back branches vs. gcc 4.8.0

От
Robert Haas
Дата:
On Fri, Apr 5, 2013 at 9:45 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> So I think this is not a compiler bug or an arms race.  We just need to
> fix the code.  So I'm in favor of backporting.

I can certainly see this argument.  I understand Tom's point about an
arms race, but back-porting this doesn't feel terribly risky to me.
The thing is, if the arms race is escalating faster than we're
comfortable with, we can always opt opt at a later time; it's not as
if back-porting this fix now commits us irrevocably.

Then, too, I tend to think this is more our fault than gcc's - for a change.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Back branches vs. gcc 4.8.0

От
Peter Eisentraut
Дата:
On 4/5/13 6:14 PM, Tom Lane wrote:
> Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
> I think we need to do something to prevent it from breaking 8.4.x and
> 9.0.x.  It looks like our choices are (1) teach configure to enable
> -fno-aggressive-loop-optimizations if the compiler recognizes it,
> or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.
> 
> I'm a bit leaning towards (1), mainly because I'm not excited about
> fighting a compiler arms race in the back branches.

At the moment, I wouldn't do anything.  At least until we have converted
master to use flexible array members completely, and we have learned the
extent of the issue.

The problem manifests itself easily through the regression tests, so
there is no guessing about whether a particular combination of versions
will work.  Someone who uses a cutting edge compiler with a somewhat old
PG release is doing something special anyway, so they should have the
required skills to put in the workaround.

I would rather avoid patching in specific compiler options for specific
versions.  These things come and go, but releases live a long time.  How
do we know -fno-aggressive-loop-optimizations is the only option we need
in the long run?  I'd rather see a direct crash or a known code fix.

As an aside, we already have -fno-strict-aliasing and -fwrapv.  Add more
and it will begin to read like

-fmy-code -fis-broken -fhelp-me

;-)




Re: Back branches vs. gcc 4.8.0

От
Gavin Flower
Дата:
<div class="moz-cite-prefix">On 09/04/13 08:41, Peter Eisentraut wrote:<br /></div><blockquote
cite="mid:51632B8B.4020508@gmx.net"type="cite"><pre wrap="">On 4/5/13 6:14 PM, Tom Lane wrote:
 
</pre><blockquote type="cite"><pre wrap="">Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
I think we need to do something to prevent it from breaking 8.4.x and
9.0.x.  It looks like our choices are (1) teach configure to enable
-fno-aggressive-loop-optimizations if the compiler recognizes it,
or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.

I'm a bit leaning towards (1), mainly because I'm not excited about
fighting a compiler arms race in the back branches.
</pre></blockquote><pre wrap="">
At the moment, I wouldn't do anything.  At least until we have converted
master to use flexible array members completely, and we have learned the
extent of the issue.

The problem manifests itself easily through the regression tests, so
there is no guessing about whether a particular combination of versions
will work.  Someone who uses a cutting edge compiler with a somewhat old
PG release is doing something special anyway, so they should have the
required skills to put in the workaround.

I would rather avoid patching in specific compiler options for specific
versions.  These things come and go, but releases live a long time.  How
do we know -fno-aggressive-loop-optimizations is the only option we need
in the long run?  I'd rather see a direct crash or a known code fix.

As an aside, we already have -fno-strict-aliasing and -fwrapv.  Add more
and it will begin to read like

-fmy-code -fis-broken -fhelp-me

;-)



</pre></blockquote><font size="-1">-fno<font size="-1">-</font>break-my-code<br /><br /></font>

Re: Back branches vs. gcc 4.8.0

От
Peter Eisentraut
Дата:
On Sat, 2013-04-06 at 12:59 -0400, Tom Lane wrote:
> The reason I'm thinking it's a good idea is that it would expose any
> remaining places where we have nominally var-length arrays embedded in
> larger structs.  Now that I've seen the failures with gcc 4.8.0, I'm
> quite worried that there might be some more declarations like that
> which we've not identified yet, but that by chance aren't causing
> obvious failures today.

Here is a rough patch that replaces almost all occurrences of
something[1] in a struct with FLEXIBLE_ARRAY_MEMBER.  It crashes left
and right (because of sizeof issues, probably), but at least so far the
compiler hasn't complained about any flexible-array members not at the
end of the struct, which is what it did last time.  So the answer to
your concern so far is negative.

Completing this patch will be quite a bit more debugging work.  Some
kind of electric fence for palloc would be helpful.


Вложения

Re: Back branches vs. gcc 4.8.0

От
Greg Stark
Дата:
On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
> I think we need to do something to prevent it from breaking 8.4.x and
> 9.0.x.  It looks like our choices are (1) teach configure to enable
> -fno-aggressive-loop-optimizations if the compiler recognizes it,
> or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.
>
> I'm a bit leaning towards (1), mainly because I'm not excited about

I'm confused. I would have described (1) as entering an arms race.
Each new optimization related to arrays and structs would need a new
flag. Whereas (2) makes the code pretty common traditional code that
gcc is going to need to tolerate for the foreseeable future

-- 
greg



Re: Back branches vs. gcc 4.8.0

От
Andres Freund
Дата:
On 2013-04-29 23:37:43 -0400, Peter Eisentraut wrote:
> On Sat, 2013-04-06 at 12:59 -0400, Tom Lane wrote:
> > The reason I'm thinking it's a good idea is that it would expose any
> > remaining places where we have nominally var-length arrays embedded in
> > larger structs.  Now that I've seen the failures with gcc 4.8.0, I'm
> > quite worried that there might be some more declarations like that
> > which we've not identified yet, but that by chance aren't causing
> > obvious failures today. 
> 
> Here is a rough patch that replaces almost all occurrences of
> something[1] in a struct with FLEXIBLE_ARRAY_MEMBER.  It crashes left
> and right (because of sizeof issues, probably), but at least so far the
> compiler hasn't complained about any flexible-array members not at the
> end of the struct, which is what it did last time.  So the answer to
> your concern so far is negative.

I think this point in the cycle would be a good one to apply something
like this.

> Completing this patch will be quite a bit more debugging work.  Some
> kind of electric fence for palloc would be helpful.

Noah's recently added valgrind mode should provide this.

Do you have an updated version of this patch already? I'd be willing to
make a pass over it to check whether I find any missed updates...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services