Обсуждение: Back branches vs. gcc 4.8.0
After quite a bit of hair-pulling trying to install Fedora 19 Alpha, I've been able to reproduce the initdb-time failure that's currently being seen on buildfarm member anchovy, and was also complained of recently by an end user: http://www.postgresql.org/message-id/CAOD=oQ-kq3Eg5SOvRYOVxDuqibVWC8R0wEivPsMGcyzZY-nfzA@mail.gmail.com It is exactly what I suspected, namely that gcc 4.8.0 is applying an optimization that breaks our code; the odd thing though is that it's not breaking 9.2 or HEAD, just the older branches. It turns out that what's happening is that with -faggressive-loop-optimizations turned on (as it is by default), gcc decides that loops that iterate over the elements of an int2vector can iterate at most once, because int2vector is declared with a fixed size values[] array: int16 values[1]; /* VARIABLE LENGTH ARRAY */ } int2vector; /* VARIABLE LENGTH STRUCT */ Now, gcc does know better than to make such an assumption unconditionally, but what I discovered is that it *will* assume this if the int2vector is declared as a non-last element of a larger struct, so that (in gcc's little mind anyway) it couldn't possibly really be a variable-length array. In other words, the reason 9.2 and up don't break is commit 8137f2c32322c624e0431fac1621e8e9315202f9, which arranged to hide non-fixed-offset catalog columns from the compiler. Without that, gcc decides that for instance pg_index.indkey cannot have more than one member. That breaks the loop in BuildIndexInfo() that copies the key column numbers into an IndexInfo, leading to the observed failure. Since gcc 4.8 is going to be on a lot of people's machines pretty soon, I think we need to do something to prevent it from breaking 8.4.x and 9.0.x. It looks like our choices are (1) teach configure to enable -fno-aggressive-loop-optimizations if the compiler recognizes it, or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9. I'm a bit leaning towards (1), mainly because I'm not excited about fighting a compiler arms race in the back branches. It also strikes me that we ought to take this as a warning sign that we need to work on getting rid of coding like the above in favor of genuine "flexible arrays", before the gcc boys think of some other overly-cute optimization based on the assumption that an array declared with a fixed size really is fixed. regards, tom lane
On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It also strikes me that we ought to take this as a warning sign > that we need to work on getting rid of coding like the above in favor > of genuine "flexible arrays", before the gcc boys think of some other > overly-cute optimization based on the assumption that an array declared > with a fixed size really is fixed. The traditional argument against that has been that that's a C99 feature. However, since it appears that even MSVC supports flexible arrays (which are described as a "Microsoft extension", so may not have identical semantics), it might be possible to do this across the board without contorting the code with preprocessor hacks. That's something that I'd certainly be in favor of pursuing. -- Peter Geoghegan
<div class="moz-cite-prefix">On 06/04/13 11:14, Tom Lane wrote:<br /></div><blockquote cite="mid:14242.1365200084@sss.pgh.pa.us"type="cite"><pre wrap="">After quite a bit of hair-pulling trying to install Fedora19 Alpha, I've been able to reproduce the initdb-time failure that's currently being seen on buildfarm member anchovy, and was also complained of recently by an end user: <a class="moz-txt-link-freetext" href="http://www.postgresql.org/message-id/CAOD=oQ-kq3Eg5SOvRYOVxDuqibVWC8R0wEivPsMGcyzZY-nfzA@mail.gmail.com">http://www.postgresql.org/message-id/CAOD=oQ-kq3Eg5SOvRYOVxDuqibVWC8R0wEivPsMGcyzZY-nfzA@mail.gmail.com</a> It is exactly what I suspected, namely that gcc 4.8.0 is applying an optimization that breaks our code; the odd thing though is that it's not breaking 9.2 or HEAD, just the older branches. It turns out that what's happening is that with -faggressive-loop-optimizations turned on (as it is by default), gcc decides that loops that iterate over the elements of an int2vector can iterate at most once, because int2vector is declared with a fixed size values[] array: int16 values[1]; /* VARIABLE LENGTH ARRAY */ } int2vector; /* VARIABLE LENGTH STRUCT */ Now, gcc does know better than to make such an assumption unconditionally, but what I discovered is that it *will* assume this if the int2vector is declared as a non-last element of a larger struct, so that (in gcc's little mind anyway) it couldn't possibly really be a variable-length array. In other words, the reason 9.2 and up don't break is commit 8137f2c32322c624e0431fac1621e8e9315202f9, which arranged to hide non-fixed-offset catalog columns from the compiler. Without that, gcc decides that for instance pg_index.indkey cannot have more than one member. That breaks the loop in BuildIndexInfo() that copies the key column numbers into an IndexInfo, leading to the observed failure. Since gcc 4.8 is going to be on a lot of people's machines pretty soon, I think we need to do something to prevent it from breaking 8.4.x and 9.0.x. It looks like our choices are (1) teach configure to enable -fno-aggressive-loop-optimizations if the compiler recognizes it, or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9. I'm a bit leaning towards (1), mainly because I'm not excited about fighting a compiler arms race in the back branches. It also strikes me that we ought to take this as a warning sign that we need to work on getting rid of coding like the above in favor of genuine "flexible arrays", before the gcc boys think of some other overly-cute optimization based on the assumption that an array declared with a fixed size really is fixed. regards, tom lane </pre></blockquote><font size="-1">I am <font size="-1">probably missing something here!<br /><br /> I would have thoughtit reasonable for a comp<font size="-1">iler assume 'an array declared with a fixed size really is fixed.'!<br /><br/><font size="-1">Seems dange<font size="-1">rous to play tricks like that (though I admit to doing nasty things likethat in COBOL many many years ago!).<br /><br /><br /><font size="-1">Cheers,<br /><font size="-1">Gavin</font><br /></font></font></font><br/></font></font></font>
Peter Geoghegan <pg@heroku.com> writes: > On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It also strikes me that we ought to take this as a warning sign >> that we need to work on getting rid of coding like the above in favor >> of genuine "flexible arrays", before the gcc boys think of some other >> overly-cute optimization based on the assumption that an array declared >> with a fixed size really is fixed. > The traditional argument against that has been that that's a C99 > feature. Well, we already have a solution for that, see FLEXIBLE_ARRAY_MEMBER. But up to now we've just supposed that that was a code beautification thing and there was no particular urgency to convert all applicable places to use that notation. Since there's a potential to break code with such changes (we'd have to fix any uses of sizeof on the struct type), it's been very far down the to-do list. But now it appears that we're taking risks if we *don't* change it. regards, tom lane
On 2013-04-05 23:28:03 +0100, Peter Geoghegan wrote: > On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > It also strikes me that we ought to take this as a warning sign > > that we need to work on getting rid of coding like the above in favor > > of genuine "flexible arrays", before the gcc boys think of some other > > overly-cute optimization based on the assumption that an array declared > > with a fixed size really is fixed. > > The traditional argument against that has been that that's a C99 > feature. However, since it appears that even MSVC supports flexible > arrays (which are described as a "Microsoft extension", so may not > have identical semantics), it might be possible to do this across the > board without contorting the code with preprocessor hacks. That's > something that I'd certainly be in favor of pursuing. The respective macro magic is already in place, its just not used in all places. The problem is more that we can't easily use it in all places because e.g. in the one case mentioned here the array isn't in the last place *in the back branches*. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Apr 5, 2013 at 11:39 PM, Andres Freund <andres@2ndquadrant.com> wrote: > The respective macro magic is already in place, its just not used in all > places. The problem is more that we can't easily use it in all places > because e.g. in the one case mentioned here the array isn't in the last > place *in the back branches*. Are you proposing that we use the FLEXIBLE_ARRAY_MEMBER macro in every single place where we currently use the one element array pattern? I count one place where we currently use FLEXIBLE_ARRAY_MEMBER. It'd be pretty ugly to have that everywhere, in my opinion. -- Peter Geoghegan
Andres Freund <andres@2ndquadrant.com> writes: > The respective macro magic is already in place, its just not used in all > places. The problem is more that we can't easily use it in all places > because e.g. in the one case mentioned here the array isn't in the last > place *in the back branches*. I don't think we should try to back-patch such changes; there seems too much risk of breaking third-party code because of the sizeof() issue. But it'd be a good idea to have it in place before we find ourselves having to do -fno-aggressive-loop-optimizations or some such even in up-to-date branches. (I'm actually even more worried about gcc bugs that make this type of assumption than about intentional changes on their part.) regards, tom lane
Peter Geoghegan <pg@heroku.com> writes: > Are you proposing that we use the FLEXIBLE_ARRAY_MEMBER macro in every > single place where we currently use the one element array pattern? Yup, exactly. > I count one place where we currently use FLEXIBLE_ARRAY_MEMBER. It'd be > pretty ugly to have that everywhere, in my opinion. Hm, I see 4 places in HEAD. But in any case, is int16 values[1]; /* VARIABLE LENGTH ARRAY */ } int2vector; /* VARIABLE LENGTH STRUCT */ really better than int16 values[FLEXIBLE_ARRAY_MEMBER]; } int2vector; ? I don't think so. Relying on comments to tell about critical semantics of a data structure isn't really nice if you can do it in a way that is standards-blessed and (some) compilers understand. regards, tom lane
On Fri, Apr 5, 2013 at 11:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hm, I see 4 places in HEAD. But in any case, is My mistake. I had REL9_2_STABLE checked out. > int16 values[1]; /* VARIABLE LENGTH ARRAY */ > } int2vector; /* VARIABLE LENGTH STRUCT */ > > really better than > > int16 values[FLEXIBLE_ARRAY_MEMBER]; > } int2vector; > > ? I don't think so. I can see your point. Now that I look at the comments beside FLEXIBLE_ARRAY_MEMBER, I see that indeed, as I suspected, the Microsoft flexible array members are not completely compatible with C99 style flexible arrays, so this may be the least-worst option. -- Peter Geoghegan
On Fri, 2013-04-05 at 18:14 -0400, Tom Lane wrote: > Since gcc 4.8 is going to be on a lot of people's machines pretty > soon, > I think we need to do something to prevent it from breaking 8.4.x and > 9.0.x. It looks like our choices are (1) teach configure to enable > -fno-aggressive-loop-optimizations if the compiler recognizes it, > or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9. Using a fixed-size struct member as a flexible one has always been a violation of the C standard, although a widely tolerated one. Doing that in the middle of a struct, however, is totally wrong, and the compiler is perfectly in its right to make a mess of it. Even flexible array members are not allowed in the middle of a struct. So I think this is not a compiler bug or an arms race. We just need to fix the code. So I'm in favor of backporting.
On Fri, 2013-04-05 at 23:44 +0100, Peter Geoghegan wrote: > On Fri, Apr 5, 2013 at 11:39 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > The respective macro magic is already in place, its just not used in all > > places. The problem is more that we can't easily use it in all places > > because e.g. in the one case mentioned here the array isn't in the last > > place *in the back branches*. > > Are you proposing that we use the FLEXIBLE_ARRAY_MEMBER macro in every > single place where we currently use the one element array pattern? I > count one place where we currently use FLEXIBLE_ARRAY_MEMBER. It'd be > pretty ugly to have that everywhere, in my opinion. Background: The reason I put in that one use of FLEXIBLE_ARRAY_MEMBER is that at one point clang threw a warning about the old coding. There were no warnings about the other sites that use array[1]. The reason that the whole code wasn't converted right away was (besides a lot of legwork with sizeof and offsetoff) that flexible array members aren't allowed in the middle of structs. Which eventually led to the mentioned commit 8137f2c32322c624e0431fac1621e8e9315202f9. If someone wants to go through and change the rest of the code to use FLEXIBLE_ARRAY_MEMBER, I won't mind. But I think it actually has nothing to do with the current bug or future-proofing anything. All compilers tolerate the current coding.
Peter Eisentraut <peter_e@gmx.net> writes: > The reason that the whole code wasn't converted right away was (besides > a lot of legwork with sizeof and offsetoff) that flexible array members > aren't allowed in the middle of structs. Which eventually led to the > mentioned commit 8137f2c32322c624e0431fac1621e8e9315202f9. > If someone wants to go through and change the rest of the code to use > FLEXIBLE_ARRAY_MEMBER, I won't mind. But I think it actually has > nothing to do with the current bug or future-proofing anything. All > compilers tolerate the current coding. The reason I'm thinking it's a good idea is that it would expose any remaining places where we have nominally var-length arrays embedded in larger structs. Now that I've seen the failures with gcc 4.8.0, I'm quite worried that there might be some more declarations like that which we've not identified yet, but that by chance aren't causing obvious failures today. (This is also why I'm not that excited about trying to fix things "properly" in the back branches compared to selecting -fno-aggressive-loop-optimizations: I'm afraid there might be more to it than just the one commit.) regards, tom lane
On Fri, Apr 5, 2013 at 9:45 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > So I think this is not a compiler bug or an arms race. We just need to > fix the code. So I'm in favor of backporting. I can certainly see this argument. I understand Tom's point about an arms race, but back-porting this doesn't feel terribly risky to me. The thing is, if the arms race is escalating faster than we're comfortable with, we can always opt opt at a later time; it's not as if back-porting this fix now commits us irrevocably. Then, too, I tend to think this is more our fault than gcc's - for a change. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 4/5/13 6:14 PM, Tom Lane wrote: > Since gcc 4.8 is going to be on a lot of people's machines pretty soon, > I think we need to do something to prevent it from breaking 8.4.x and > 9.0.x. It looks like our choices are (1) teach configure to enable > -fno-aggressive-loop-optimizations if the compiler recognizes it, > or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9. > > I'm a bit leaning towards (1), mainly because I'm not excited about > fighting a compiler arms race in the back branches. At the moment, I wouldn't do anything. At least until we have converted master to use flexible array members completely, and we have learned the extent of the issue. The problem manifests itself easily through the regression tests, so there is no guessing about whether a particular combination of versions will work. Someone who uses a cutting edge compiler with a somewhat old PG release is doing something special anyway, so they should have the required skills to put in the workaround. I would rather avoid patching in specific compiler options for specific versions. These things come and go, but releases live a long time. How do we know -fno-aggressive-loop-optimizations is the only option we need in the long run? I'd rather see a direct crash or a known code fix. As an aside, we already have -fno-strict-aliasing and -fwrapv. Add more and it will begin to read like -fmy-code -fis-broken -fhelp-me ;-)
<div class="moz-cite-prefix">On 09/04/13 08:41, Peter Eisentraut wrote:<br /></div><blockquote cite="mid:51632B8B.4020508@gmx.net"type="cite"><pre wrap="">On 4/5/13 6:14 PM, Tom Lane wrote: </pre><blockquote type="cite"><pre wrap="">Since gcc 4.8 is going to be on a lot of people's machines pretty soon, I think we need to do something to prevent it from breaking 8.4.x and 9.0.x. It looks like our choices are (1) teach configure to enable -fno-aggressive-loop-optimizations if the compiler recognizes it, or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9. I'm a bit leaning towards (1), mainly because I'm not excited about fighting a compiler arms race in the back branches. </pre></blockquote><pre wrap=""> At the moment, I wouldn't do anything. At least until we have converted master to use flexible array members completely, and we have learned the extent of the issue. The problem manifests itself easily through the regression tests, so there is no guessing about whether a particular combination of versions will work. Someone who uses a cutting edge compiler with a somewhat old PG release is doing something special anyway, so they should have the required skills to put in the workaround. I would rather avoid patching in specific compiler options for specific versions. These things come and go, but releases live a long time. How do we know -fno-aggressive-loop-optimizations is the only option we need in the long run? I'd rather see a direct crash or a known code fix. As an aside, we already have -fno-strict-aliasing and -fwrapv. Add more and it will begin to read like -fmy-code -fis-broken -fhelp-me ;-) </pre></blockquote><font size="-1">-fno<font size="-1">-</font>break-my-code<br /><br /></font>
On Sat, 2013-04-06 at 12:59 -0400, Tom Lane wrote: > The reason I'm thinking it's a good idea is that it would expose any > remaining places where we have nominally var-length arrays embedded in > larger structs. Now that I've seen the failures with gcc 4.8.0, I'm > quite worried that there might be some more declarations like that > which we've not identified yet, but that by chance aren't causing > obvious failures today. Here is a rough patch that replaces almost all occurrences of something[1] in a struct with FLEXIBLE_ARRAY_MEMBER. It crashes left and right (because of sizeof issues, probably), but at least so far the compiler hasn't complained about any flexible-array members not at the end of the struct, which is what it did last time. So the answer to your concern so far is negative. Completing this patch will be quite a bit more debugging work. Some kind of electric fence for palloc would be helpful.
Вложения
On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Since gcc 4.8 is going to be on a lot of people's machines pretty soon, > I think we need to do something to prevent it from breaking 8.4.x and > 9.0.x. It looks like our choices are (1) teach configure to enable > -fno-aggressive-loop-optimizations if the compiler recognizes it, > or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9. > > I'm a bit leaning towards (1), mainly because I'm not excited about I'm confused. I would have described (1) as entering an arms race. Each new optimization related to arrays and structs would need a new flag. Whereas (2) makes the code pretty common traditional code that gcc is going to need to tolerate for the foreseeable future -- greg
On 2013-04-29 23:37:43 -0400, Peter Eisentraut wrote: > On Sat, 2013-04-06 at 12:59 -0400, Tom Lane wrote: > > The reason I'm thinking it's a good idea is that it would expose any > > remaining places where we have nominally var-length arrays embedded in > > larger structs. Now that I've seen the failures with gcc 4.8.0, I'm > > quite worried that there might be some more declarations like that > > which we've not identified yet, but that by chance aren't causing > > obvious failures today. > > Here is a rough patch that replaces almost all occurrences of > something[1] in a struct with FLEXIBLE_ARRAY_MEMBER. It crashes left > and right (because of sizeof issues, probably), but at least so far the > compiler hasn't complained about any flexible-array members not at the > end of the struct, which is what it did last time. So the answer to > your concern so far is negative. I think this point in the cycle would be a good one to apply something like this. > Completing this patch will be quite a bit more debugging work. Some > kind of electric fence for palloc would be helpful. Noah's recently added valgrind mode should provide this. Do you have an updated version of this patch already? I'd be willing to make a pass over it to check whether I find any missed updates... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services