Обсуждение: Keep elog(ERROR) and ereport(ERROR) calls in the cold path
Back in [1] I experimented with a patch to coax compilers to build all elog/ereport calls that were >= ERROR into a cold path away from the function rasing the error. At the time, I really just wanted to test how much of a speedup we could get by doing this and ended up just writing up a patch that basically changed all elog(ERROR) calls from: if (<error situation check>) { <do stuff>; elog(ERROR, "..."); } to add an unlikely() and become; if (unlikely(<error situation check>) { <do stuff>; elog(ERROR, "..."); } Per the report in [1] I did see some pretty good gains in performance from doing this. The problem was, that at the time I couldn't figure out a way to do this without an invasive patch that changed the code in the thousands of elog/ereport calls. In the attached, I've come up with a way that works. Basically I've just added a function named errstart_cold() that is attributed with __attribute__((cold)), which will hint to the compiler to keep branches which call this function in a cold path. To make the compiler properly mark just >= ERROR calls as cold, and only when the elevel is a constant, I modified the ereport_domain macro to do: if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \ errstart_cold(elevel, domain) : \ errstart(elevel, domain)) \ I see no reason why the compiler shouldn't always fold that constant expression at compile-time and correctly select the correct version of the function for the job. (I also tried using __builtin_choose_expr() but GCC complained when the elevel was not constant, even with the __builtin_constant_p() test in the condition) I sampled a few .s files to inspect what code had changed. Looking at mcxt.s, fmgr.s and xlog.s, the first two of these because I found in [1] that elogs were being done from those files quite often and xlog.s because it's pretty big. As far as I could see, GCC correctly moved all the error reporting stuff where the elevel was a constant and >= ERROR into the cold path and left the lower-level or non-consts elevel calls alone. For clang, I didn't see any changes in the .s files. I suspect that they might have a few smarts in there and see the __builtin_unreachable() call and assume the path is cold already based on that. That was with clang 10. Perhaps older versions are not as smart. Benchmarking: For benchmarking, I've not done a huge amount to test the impacts of this change. However, I can say that I am seeing some fairly good improvements. There seems to be some small improvements to execution speed using TPCH-Q1 and also some good results from a pgbench -S test. For TPCH-Q1: Master: $ pgbench -n -f pg-tpch/queries/q01.sql -T 120 tpch latency average = 5272.630 ms latency average = 5258.610 ms latency average = 5250.871 ms Master + elog_ereport_attribute_cold.patch $ pgbench -n -f pg-tpch/queries/q01.sql -T 120 tpch latency average = 5182.761 ms latency average = 5194.851 ms latency average = 5183.128 ms Which is about a 1.42% increase in performance. That's not exactly groundbreaking, but pretty useful to have if that happens to apply across the board for execution performance. For pgbench -S: My results were a bit noisier than the TPCH test, but the results I obtained did show about a 10% increase in performance: Master: drowley@amd3990x:~$ pgbench -S -T 120 postgres tps = 25245.903255 (excluding connections establishing) tps = 26144.454208 (excluding connections establishing) tps = 25931.850518 (excluding connections establishing) Master + elog_ereport_attribute_cold.patch drowley@amd3990x:~$ pgbench -S -T 120 postgres tps = 28351.480631 (excluding connections establishing) tps = 27763.656557 (excluding connections establishing) tps = 28896.427929 (excluding connections establishing) It would be useful if someone with some server-grade Intel hardware could run a few tests on this. The above results are all from AMD hardware. I've attached the patch for this. I'll add it to the July 'fest. David [1] https://www.postgresql.org/message-id/CAKJS1f8yqRW3qx2CO9r4bqqvA2Vx68=3awbh8CJWTP9zXeoHMw@mail.gmail.com
Вложения
On Wed, Jun 24, 2020 at 9:51 PM David Rowley <dgrowleyml@gmail.com> wrote: > $ pgbench -n -f pg-tpch/queries/q01.sql -T 120 tpch > > Which is about a 1.42% increase in performance. That's not exactly > groundbreaking, but pretty useful to have if that happens to apply > across the board for execution performance. > > For pgbench -S: > > My results were a bit noisier than the TPCH test, but the results I > obtained did show about a 10% increase in performance: This is pretty cool, particularly because it affects single-client performance. It seems like a lot of ideas people have had about speeding up pgbench performance - including me - have improved performance under concurrency at the cost of very slightly degrading single-client performance. It would be nice to claw some of that back. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, Thanks for picking this up again! On 2020-06-25 13:50:37 +1200, David Rowley wrote: > In the attached, I've come up with a way that works. Basically I've > just added a function named errstart_cold() that is attributed with > __attribute__((cold)), which will hint to the compiler to keep > branches which call this function in a cold path. I recall you trying this before? Has that gotten easier because we evolved ereport()/elog(), or has gcc become smarter, or ...? > To make the compiler properly mark just >= ERROR calls as cold, and > only when the elevel is a constant, I modified the ereport_domain > macro to do: > > if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \ > errstart_cold(elevel, domain) : \ > errstart(elevel, domain)) \ I think it'd be good to not just do this for ERROR, but also for <= DEBUG1. I recall seing quite a few debug elogs that made the code worse just by "being there". I suspect that doing this for DEBUG* could also improve the code for clang, because we obviously don't have __builtin_unreachable after those. > I see no reason why the compiler shouldn't always fold that constant > expression at compile-time and correctly select the correct version of > the function for the job. (I also tried using __builtin_choose_expr() > but GCC complained when the elevel was not constant, even with the > __builtin_constant_p() test in the condition) I think it has to be constant in all paths for that... > Master: > drowley@amd3990x:~$ pgbench -S -T 120 postgres > tps = 25245.903255 (excluding connections establishing) > tps = 26144.454208 (excluding connections establishing) > tps = 25931.850518 (excluding connections establishing) > > Master + elog_ereport_attribute_cold.patch > drowley@amd3990x:~$ pgbench -S -T 120 postgres > tps = 28351.480631 (excluding connections establishing) > tps = 27763.656557 (excluding connections establishing) > tps = 28896.427929 (excluding connections establishing) That is pretty damn cool. > diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c > index e976201030..8076e8af24 100644 > --- a/src/backend/utils/error/elog.c > +++ b/src/backend/utils/error/elog.c > @@ -219,6 +219,19 @@ err_gettext(const char *str) > #endif > } > > +#if defined(HAVE_PG_ATTRIBUTE_HOT_AND_COLD) && defined(HAVE__BUILTIN_CONSTANT_P) > +/* > + * errstart_cold > + * A simple wrapper around errstart, but hinted to be cold so that the > + * compiler is more likely to put error code in a cold area away from the > + * main function body. > + */ > +bool > +pg_attribute_cold errstart_cold(int elevel, const char *domain) > +{ > + return errstart(elevel, domain); > +} > +#endif Hm. Would it make sense to have this be a static inline? > /* > * errstart --- begin an error-reporting cycle > diff --git a/src/include/c.h b/src/include/c.h > index d72b23afe4..087b8af6cb 100644 > --- a/src/include/c.h > +++ b/src/include/c.h > @@ -178,6 +178,21 @@ > #define pg_noinline > #endif > > +/* > + * Marking certain functions as "hot" or "cold" can be useful to assist the > + * compiler in arranging the assembly code in a more efficient way. > + * These are supported from GCC >= 4.3 and clang >= 3.2 > + */ > +#if (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))) || \ > + (defined(__clang__) && (__clang_major__ > 3 || (__clang_major__ == 3 && __clang_minor__ >= 2))) > +#define HAVE_PG_ATTRIBUTE_HOT_AND_COLD 1 > +#define pg_attribute_hot __attribute__((hot)) > +#define pg_attribute_cold __attribute__((cold)) > +#else > +#define pg_attribute_hot > +#define pg_attribute_cold > +#endif Wonder if we should start using __has_attribute() for things like this. https://gcc.gnu.org/onlinedocs/cpp/_005f_005fhas_005fattribute.html I.e. we could do something like #ifndef __has_attribute #define __has_attribute(attribute) 0 #endif #if __has_attribute(hot) #define pg_attribute_hot __attribute__((hot)) #else #define pg_attribute_hot #endif clang added __has_attribute in 2.9 (2010), gcc added it in 5 (2014), so I don't think we'd loose too much. > #ifdef HAVE__BUILTIN_CONSTANT_P > +#ifdef HAVE_PG_ATTRIBUTE_HOT_AND_COLD > +#define ereport_domain(elevel, domain, ...) \ > + do { \ > + pg_prevent_errno_in_scope(); \ > + if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \ > + errstart_cold(elevel, domain) : \ > + errstart(elevel, domain)) \ > + __VA_ARGS__, errfinish(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \ > + if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \ > + pg_unreachable(); \ > + } while(0) > +#else /* !HAVE_PG_ATTRIBUTE_HOT_AND_COLD */ > #define ereport_domain(elevel, domain, ...) \ > do { \ > pg_prevent_errno_in_scope(); \ > @@ -129,6 +141,7 @@ > if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \ > pg_unreachable(); \ > } while(0) > +#endif /* HAVE_PG_ATTRIBUTE_HOT_AND_COLD */ > #else /* !HAVE__BUILTIN_CONSTANT_P */ > #define ereport_domain(elevel, domain, ...) \ > do { \ > @@ -146,6 +159,9 @@ Could we do this without another copy? Feels like we should be able to just do that in the existing #ifdef HAVE__BUILTIN_CONSTANT_P if we just add errstart_cold() independent HAVE_PG_ATTRIBUTE_HOT_AND_COLD. Greetings, Andres Freund
On Fri, 26 Jun 2020 at 04:35, Andres Freund <andres@anarazel.de> wrote: > On 2020-06-25 13:50:37 +1200, David Rowley wrote: > > In the attached, I've come up with a way that works. Basically I've > > just added a function named errstart_cold() that is attributed with > > __attribute__((cold)), which will hint to the compiler to keep > > branches which call this function in a cold path. > > I recall you trying this before? Has that gotten easier because we > evolved ereport()/elog(), or has gcc become smarter, or ...? Yeah, I appear to have tried it and found it not to work in [1]. I can only assume GCC got smarter in regards to how it marks a path as cold. Previously it seemed not do due to the do/while(0). I'm pretty sure back when I tested last that ditching the do while made it work, just we can't really get rid of it. > > To make the compiler properly mark just >= ERROR calls as cold, and > > only when the elevel is a constant, I modified the ereport_domain > > macro to do: > > > > if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \ > > errstart_cold(elevel, domain) : \ > > errstart(elevel, domain)) \ > > I think it'd be good to not just do this for ERROR, but also for <= > DEBUG1. I recall seing quite a few debug elogs that made the code worse > just by "being there". I think that case is different. We don't want to move the entire elog path into the cold path for that. We'd only want to hint that errstart is unlikely to return true if elevel <= DEBUG1 > > diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c > > index e976201030..8076e8af24 100644 > > --- a/src/backend/utils/error/elog.c > > +++ b/src/backend/utils/error/elog.c > > @@ -219,6 +219,19 @@ err_gettext(const char *str) > > #endif > > } > > > > +#if defined(HAVE_PG_ATTRIBUTE_HOT_AND_COLD) && defined(HAVE__BUILTIN_CONSTANT_P) > > +/* > > + * errstart_cold > > + * A simple wrapper around errstart, but hinted to be cold so that the > > + * compiler is more likely to put error code in a cold area away from the > > + * main function body. > > + */ > > +bool > > +pg_attribute_cold errstart_cold(int elevel, const char *domain) > > +{ > > + return errstart(elevel, domain); > > +} > > +#endif > > Hm. Would it make sense to have this be a static inline? I thought about that but didn't try it to ensure it still worked ok. I didn't think it was that important to make sure we don't get the extra function hop for ERRORs. It seemed like a case we'd not want to really optimise for. > > /* > > * errstart --- begin an error-reporting cycle > > diff --git a/src/include/c.h b/src/include/c.h > > index d72b23afe4..087b8af6cb 100644 > > --- a/src/include/c.h > > +++ b/src/include/c.h > > @@ -178,6 +178,21 @@ > > #define pg_noinline > > #endif > > > > +/* > > + * Marking certain functions as "hot" or "cold" can be useful to assist the > > + * compiler in arranging the assembly code in a more efficient way. > > + * These are supported from GCC >= 4.3 and clang >= 3.2 > > + */ > > +#if (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))) || \ > > + (defined(__clang__) && (__clang_major__ > 3 || (__clang_major__ == 3 && __clang_minor__ >= 2))) > > +#define HAVE_PG_ATTRIBUTE_HOT_AND_COLD 1 > > +#define pg_attribute_hot __attribute__((hot)) > > +#define pg_attribute_cold __attribute__((cold)) > > +#else > > +#define pg_attribute_hot > > +#define pg_attribute_cold > > +#endif > > Wonder if we should start using __has_attribute() for things like this. > > https://gcc.gnu.org/onlinedocs/cpp/_005f_005fhas_005fattribute.html > > I.e. we could do something like > #ifndef __has_attribute > #define __has_attribute(attribute) 0 > #endif > > #if __has_attribute(hot) > #define pg_attribute_hot __attribute__((hot)) > #else > #define pg_attribute_hot > #endif > > clang added __has_attribute in 2.9 (2010), gcc added it in 5 (2014), so > I don't think we'd loose too much. Thanks for pointing that out. Seems like a good idea to me. I don't think we'll upset too many people running GCC 4.4 to 5.0. I can't imagine many people serious about performance will be using a PostgreSQL version that'll be released in 2021 with a pre 2014 compiler. > > #ifdef HAVE__BUILTIN_CONSTANT_P > > +#ifdef HAVE_PG_ATTRIBUTE_HOT_AND_COLD > > +#define ereport_domain(elevel, domain, ...) \ > > + do { \ > > + pg_prevent_errno_in_scope(); \ > > + if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \ > > + errstart_cold(elevel, domain) : \ > > + errstart(elevel, domain)) \ > > + __VA_ARGS__, errfinish(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \ > > + if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \ > > + pg_unreachable(); \ > > + } while(0) > > +#else /* !HAVE_PG_ATTRIBUTE_HOT_AND_COLD */ > > #define ereport_domain(elevel, domain, ...) \ > > do { \ > > pg_prevent_errno_in_scope(); \ > > @@ -129,6 +141,7 @@ > > if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \ > > pg_unreachable(); \ > > } while(0) > > +#endif /* HAVE_PG_ATTRIBUTE_HOT_AND_COLD */ > > #else /* !HAVE__BUILTIN_CONSTANT_P */ > > #define ereport_domain(elevel, domain, ...) \ > > do { \ > > @@ -146,6 +159,9 @@ > > Could we do this without another copy? Feels like we should be able to > just do that in the existing #ifdef HAVE__BUILTIN_CONSTANT_P if we just > add errstart_cold() independent HAVE_PG_ATTRIBUTE_HOT_AND_COLD. Yeah. I just did it that way so we didn't get the extra function hop in compilers that don't support __attribute((cold)). If I can inline errstart_cold() and have the compiler still properly determine that it's a cold function, then it seems wise to do it that way. If not, then I'll need to keep a separate macro. David [1] https://www.postgresql.org/message-id/20171030094449.ffqhvt5n623zvyja%40alap3.anarazel.de
On Fri, 26 Jun 2020 at 13:21, David Rowley <dgrowleyml@gmail.com> wrote: > > On Fri, 26 Jun 2020 at 04:35, Andres Freund <andres@anarazel.de> wrote: > > On 2020-06-25 13:50:37 +1200, David Rowley wrote: > > > In the attached, I've come up with a way that works. Basically I've > > > just added a function named errstart_cold() that is attributed with > > > __attribute__((cold)), which will hint to the compiler to keep > > > branches which call this function in a cold path. > > > > I recall you trying this before? Has that gotten easier because we > > evolved ereport()/elog(), or has gcc become smarter, or ...? > > Yeah, I appear to have tried it and found it not to work in [1]. I can > only assume GCC got smarter in regards to how it marks a path as cold. > Previously it seemed not do due to the do/while(0). I'm pretty sure > back when I tested last that ditching the do while made it work, just > we can't really get rid of it. > > > > To make the compiler properly mark just >= ERROR calls as cold, and > > > only when the elevel is a constant, I modified the ereport_domain > > > macro to do: > > > > > > if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \ > > > errstart_cold(elevel, domain) : \ > > > errstart(elevel, domain)) \ > > > > I think it'd be good to not just do this for ERROR, but also for <= > > DEBUG1. I recall seing quite a few debug elogs that made the code worse > > just by "being there". > > I think that case is different. We don't want to move the entire elog > path into the cold path for that. We'd only want to hint that errstart > is unlikely to return true if elevel <= DEBUG1 I played around with this trying to find if there was a way to make this work. v2 patch includes the change you mentioned about using __has_attribute (cold) and removes the additional ereport_domain macro v3 is v2 plus an additional change to mark the branch within ereport_domain as unlikely when elevel <= DEBUG1 v4 is v2 plus it marks the errstart call as unlikely regardless of elevel. I tried v4 as I was having trouble as v3 was showing worse performance than v2. v4 appears better on the AMD system, but that system is producing noisy results (very obvious if looking at attached amd3990x_elog_cold.png) I ran pgbench -S T 600 -P 10 with each patch and for the AMD machine I got: master = 27817.32167 tps v2 = 28991.65667 tps (104.22% of master) v3 = 28622.775 tps (102.90% of master) v4 = 29648.91 tps (106.58% of master) (I attribute the speedup here not being the same as my last report due to noise. A recent bios update partially fixed the problem, but not completely) For the intel laptop I got: master = 25452.38167 tps v2 = 25473.695 tps (100.08% of master) v3 = 25434.89333 tps (99.93% of master) v4 = 25389.02833 tps (99.75% of master) Looking at the assembly for the v3 patch, it does appear that the elevel <= DEBUG1 calls were correctly moved to the cold area and that the ones > DEBUG1 and < ERROR were left alone. However, I did only look at xlog.s. The intel results don't look very promising, but perhaps this is not the ideal test to show improvements with instruction cache efficiency. > > > +bool > > > +pg_attribute_cold errstart_cold(int elevel, const char *domain) > > > +{ > > > + return errstart(elevel, domain); > > > +} > > > +#endif > > > > Hm. Would it make sense to have this be a static inline? I didn't find a way to make this work (using gcc-10). Inlining, of course, makes the assembly just call errstart(). errstart_cold() is nowhere to be seen. The __attribute(cold) does not seem to be applied to the errstart() call where the errstart_cold() call was inlined. I've attached a graph showing the results for the AMD and Intel runs and also csv files of the pgbench tps output. I've also attached each version of the patch I tested. It would be good to see some testing done on other hardware. David
Вложения
On Mon, Jun 29, 2020 at 09:36:56PM +1200, David Rowley wrote: > I've attached a graph showing the results for the AMD and Intel runs > and also csv files of the pgbench tps output. I've also attached each > version of the patch I tested. > > It would be good to see some testing done on other hardware. Worth noting that the patch set fails to apply. I have moved this entry to the next CF, waiting on author. -- Michael
Вложения
On Mon, 3 Aug 2020 at 19:54, Michael Paquier <michael@paquier.xyz> wrote: > Worth noting that the patch set fails to apply. I have moved this > entry to the next CF, waiting on author. Thanks. NB: It's not a patch set. It's 3 different versions of the patch. They're not all meant to apply at once. Probably the CF bot wasn't aware of that though :-( David
On Mon, 29 Jun 2020 at 21:36, David Rowley <dgrowleyml@gmail.com> wrote: > (I attribute the speedup here not being the same as my last report due > to noise. A recent bios update partially fixed the problem, but not > completely) I managed to fix the unstable performance on this AMD machine by tweaking some bios power management settings. I did some further testing with the v4 patch using both each of: 1. pgbench -S 2. pgbench -S -M prepared 3. pgbench -S -M prepared -c 64 -j 64. 4. TPC-H @ 5GB scale (per recommendation from Andres offlist) The results of 1-3 above don't really show much of a win, which really does contradict what I saw about 5 years ago when testing unlikely() around elog calls in [1]. The experiment I did back then did pre-date the use of unlikely() in the source code, so I thought perhaps that since we now have a sprinkling of unlikely() in various of the hottest code paths that the use of those already gained most of what we were going to gain from today's patch. To see if this was the case, I decided to hack up a test patch which removes all those unlikely() calls that exist in an if test above an elog/ereport ERROR and I confirm that I *do* see a small regression in performance from doing that. This patch only serves to confirm if the existing unlikely() macros are already giving us most of what we'd get from today's v4 patch, and the results do seem to confirm that. The 5GB scaled TPC-H test does show some performance gains from the v4 patch and shows an obvious regression from removing the unlikely() calls too. Based, mostly on the TPC-H results where performance did improve close to 2%, I'm starting to think it would be a good idea just to go for the v4 patch. It means that future hot elog/ereport calls should make it into the cold path. Currently, I'm just unsure how other CPUs will benefit from this. The 3990x I've been testing with is pretty new and has some pretty large caches. I suspect older CPUs may see larger gains. David [1] https://www.postgresql.org/message-id/CAKJS1f8yqRW3qx2CO9r4bqqvA2Vx68%3D3awbh8CJWTP9zXeoHMw%40mail.gmail.com
Вложения
On 2020-08-05 05:00, David Rowley wrote: > The 5GB scaled TPC-H test does show some performance gains from the v4 > patch and shows an obvious regression from removing the unlikely() > calls too. > > Based, mostly on the TPC-H results where performance did improve close > to 2%, I'm starting to think it would be a good idea just to go for > the v4 patch. It means that future hot elog/ereport calls should make > it into the cold path. Something based on the v4 patch makes sense. I would add DEBUG1 back into the conditional, like if (__builtin_constant_p(elevel) && ((elevel) >= ERROR || (elevel) <= DEBUG1) ? \ Also, for the __has_attribute handling, I'd prefer the style that Andres illustrated earlier, using: #ifndef __has_attribute #define __has_attribute(attribute) 0 #endif -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 5 Sep 2020 at 08:36, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > Something based on the v4 patch makes sense. Thanks for having a look at this. > I would add DEBUG1 back into the conditional, like > > if (__builtin_constant_p(elevel) && ((elevel) >= ERROR || (elevel) <= > DEBUG1) ? \ hmm, but surely we don't want to move all code that's in the same branch as an elog(DEBUG1) call into a cold area. With elog(ERROR) we generally have the form: if (<some condition we hope never to see>) elog(ERROR, "something bad happened"); In this case, we'd quite like for the compiler to put code relating to the elog in some cold area of the binary. With DEBUG we often have the form: <do normal stuff> elog(DEBUG1, "some interesting information"); <do normal stuff> I don't think we'd want to move the <do normal stuff> into a cold area. The v3 patch just put an unlikely() around the errstart() call if the level was <= DEBUG1. That just to move the code that's inside the if (errstart(...)) in the macro into a cold area. > Also, for the __has_attribute handling, I'd prefer the style that Andres > illustrated earlier, using: > > #ifndef __has_attribute > #define __has_attribute(attribute) 0 > #endif Yip, for sure. That way is much nicer. David
On 2020-09-06 02:24, David Rowley wrote: >> I would add DEBUG1 back into the conditional, like >> >> if (__builtin_constant_p(elevel) && ((elevel) >= ERROR || (elevel) <= >> DEBUG1) ? \ > > hmm, but surely we don't want to move all code that's in the same > branch as an elog(DEBUG1) call into a cold area. Yeah, nevermind that. > The v3 patch just put an unlikely() around the errstart() call if the > level was <= DEBUG1. That just to move the code that's inside the if > (errstart(...)) in the macro into a cold area. That could be useful. Depends on how much effect it has. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 11 Sep 2020 at 02:01, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-09-06 02:24, David Rowley wrote: > >> I would add DEBUG1 back into the conditional, like > >> > >> if (__builtin_constant_p(elevel) && ((elevel) >= ERROR || (elevel) <= > >> DEBUG1) ? \ > > > > hmm, but surely we don't want to move all code that's in the same > > branch as an elog(DEBUG1) call into a cold area. > > Yeah, nevermind that. I've reattached the v4 patch since it just does the >= ERROR case. > > The v3 patch just put an unlikely() around the errstart() call if the > > level was <= DEBUG1. That just to move the code that's inside the if > > (errstart(...)) in the macro into a cold area. > > That could be useful. Depends on how much effect it has. I wonder if it is. I'm having trouble even seeing gains from the ERROR case and I'm considering dropping this patch due to that. I ran another scale=5 TPCH benchmark on v4 against f859c2ffa using gcc 9.3. I'm unable to see any gains with this, however, the results were pretty noisy. I only ran pgbench for 60 seconds per query. I'll likely need to run that a bit longer. I'll do that tonight. It would be good if someone else could run some tests on their own hardware to see if they can see any gains. David
Вложения
On Tue, 22 Sep 2020 at 19:08, David Rowley <dgrowleyml@gmail.com> wrote: > I ran another scale=5 TPCH benchmark on v4 against f859c2ffa using gcc > 9.3. I'm unable to see any gains with this, however, the results were > pretty noisy. I only ran pgbench for 60 seconds per query. I'll likely > need to run that a bit longer. I'll do that tonight. I've attached the results of a TPCH scale=5 run master (f859c2ffa) vs master + elog_ereport_attribute_cold_v4.patch It does not look great. The patched version seems to have done about 1.17% less work than master did. David
Вложения
On 2020-09-22 22:42, David Rowley wrote: > On Tue, 22 Sep 2020 at 19:08, David Rowley <dgrowleyml@gmail.com> wrote: >> I ran another scale=5 TPCH benchmark on v4 against f859c2ffa using gcc >> 9.3. I'm unable to see any gains with this, however, the results were >> pretty noisy. I only ran pgbench for 60 seconds per query. I'll likely >> need to run that a bit longer. I'll do that tonight. > > I've attached the results of a TPCH scale=5 run master (f859c2ffa) vs > master + elog_ereport_attribute_cold_v4.patch > > It does not look great. The patched version seems to have done about > 1.17% less work than master did. I wonder how much benefit you'd get from a) compiling with -O3 instead of -O2, or b) compiling with profile-driven optimization I think that would indicate a target and/or a ceiling of what we should be expecting from hot/cold/likely/unlikely optimization techniques like this. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 23 Sep 2020 at 08:42, David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 22 Sep 2020 at 19:08, David Rowley <dgrowleyml@gmail.com> wrote: > > I ran another scale=5 TPCH benchmark on v4 against f859c2ffa using gcc > > 9.3. I'm unable to see any gains with this, however, the results were > > pretty noisy. I only ran pgbench for 60 seconds per query. I'll likely > > need to run that a bit longer. I'll do that tonight. > > I've attached the results of a TPCH scale=5 run master (f859c2ffa) vs > master + elog_ereport_attribute_cold_v4.patch > > It does not look great. The patched version seems to have done about > 1.17% less work than master did. I've marked this patch back as waiting for review. It would be good if someone could run some tests on some intel hardware and see if they can see any speedup. David
On 2020-09-29 11:26, David Rowley wrote: > On Wed, 23 Sep 2020 at 08:42, David Rowley <dgrowleyml@gmail.com> wrote: >> >> On Tue, 22 Sep 2020 at 19:08, David Rowley <dgrowleyml@gmail.com> wrote: >>> I ran another scale=5 TPCH benchmark on v4 against f859c2ffa using gcc >>> 9.3. I'm unable to see any gains with this, however, the results were >>> pretty noisy. I only ran pgbench for 60 seconds per query. I'll likely >>> need to run that a bit longer. I'll do that tonight. >> >> I've attached the results of a TPCH scale=5 run master (f859c2ffa) vs >> master + elog_ereport_attribute_cold_v4.patch >> >> It does not look great. The patched version seems to have done about >> 1.17% less work than master did. > > I've marked this patch back as waiting for review. It would be good if > someone could run some tests on some intel hardware and see if they > can see any speedup. What is the way forward here? What exactly would you like to have tested? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 3 Nov 2020 at 20:08, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-09-29 11:26, David Rowley wrote: > > I've marked this patch back as waiting for review. It would be good if > > someone could run some tests on some intel hardware and see if they > > can see any speedup. > > What is the way forward here? What exactly would you like to have tested? It would be good to see some small scale bench -S tests with and without -M prepared. Also, small scale TPC-H tests would be good. I really only did testing on new AMD hardware, so some testing on intel hardware would be good. David
On 2020-11-03 21:53, David Rowley wrote: > On Tue, 3 Nov 2020 at 20:08, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> >> On 2020-09-29 11:26, David Rowley wrote: >>> I've marked this patch back as waiting for review. It would be good if >>> someone could run some tests on some intel hardware and see if they >>> can see any speedup. >> >> What is the way forward here? What exactly would you like to have tested? > > It would be good to see some small scale bench -S tests with and > without -M prepared. > > Also, small scale TPC-H tests would be good. I really only did > testing on new AMD hardware, so some testing on intel hardware would > be good. I did tests of elog_ereport_attribute_cold_v4.patch on an oldish Mac Intel laptop with pgbench scale 1 (default), and then: pgbench -S -T 60 master: tps = 8251.883229 (excluding connections establishing) patched: tps = 9556.836232 (excluding connections establishing) pgbench -S -T 60 -M prepared master: tps = 14713.821837 (excluding connections establishing) patched: tps = 16200.066185 (excluding connections establishing) So from that this seems like an easy win. I also tested on a newish Mac ARM laptop, and there the patch did not do anything, but that was because clang does not support the cold attribute, so that part works as well. ;-) -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/
On Sat, 21 Nov 2020 at 03:26, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > I did tests of elog_ereport_attribute_cold_v4.patch on an oldish Mac > Intel laptop with pgbench scale 1 (default), and then: > > pgbench -S -T 60 > > master: tps = 8251.883229 (excluding connections establishing) > patched: tps = 9556.836232 (excluding connections establishing) > > pgbench -S -T 60 -M prepared > > master: tps = 14713.821837 (excluding connections establishing) > patched: tps = 16200.066185 (excluding connections establishing) > > So from that this seems like an easy win. Well, that makes it look pretty good. If we can get 10-15% on some machines without making things slower on any other machines, then that seems like a good win to me. Thanks for testing that. David
On Tue, 24 Nov 2020 at 09:36, David Rowley <dgrowleyml@gmail.com> wrote: > Well, that makes it look pretty good. If we can get 10-15% on some > machines without making things slower on any other machines, then that > seems like a good win to me. Pushed. Thank you both for reviewing this. David
On Tue, Nov 24, 2020 at 10:06 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 24 Nov 2020 at 09:36, David Rowley <dgrowleyml@gmail.com> wrote: > > Well, that makes it look pretty good. If we can get 10-15% on some > > machines without making things slower on any other machines, then that > > seems like a good win to me. > > Pushed. > > Thank you both for reviewing this. > > David > > Hmmm, unfortunately this seems to break my build ... make[1]: Entering directory `/space2/pg13/postgres/src' make -C common all make[2]: Entering directory `/space2/pg13/postgres/src/common' gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O0 -DFRONTEND -I. -I../../src/common -I../../src/include -D_GNU_SOURCE -DVAL_CC="\"gcc -std=gnu99\"" -DVAL_CPPFLAGS="\"-D_GNU_SOURCE\"" -DVAL_CFLAGS="\"-Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O0\"" -DVAL_CFLAGS_SL="\"-fPIC\"" -DVAL_LDFLAGS="\"-Wl,--as-needed -Wl,-rpath,'/usr/local/pg14/lib',--enable-new-dtags\"" -DVAL_LDFLAGS_EX="\"\"" -DVAL_LDFLAGS_SL="\"\"" -DVAL_LIBS="\"-lpgcommon -lpgport -lpthread -lz -lreadline -lrt -ldl -lm \"" -c -o archive.o archive.c In file included from ../../src/include/postgres_fe.h:25:0, from archive.c:19: ../../src/include/c.h:198:49: error: missing binary operator before token "(" #if defined(__has_attribute) && __has_attribute (cold) ^ ../../src/include/c.h:204:49: error: missing binary operator before token "(" #if defined(__has_attribute) && __has_attribute (hot) ^ make[2]: *** [archive.o] Error 1 make[2]: Leaving directory `/space2/pg13/postgres/src/common' make[1]: *** [all-common-recurse] Error 2 make[1]: Leaving directory `/space2/pg13/postgres/src' make: *** [world-src-recurse] Error 2 [gregn@localhost postgres]$ gcc --version gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44) Copyright (C) 2015 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. I think your commit needs to be fixed based on the following documentation: https://gcc.gnu.org/onlinedocs/cpp/_005f_005fhas_005fattribute.html#g_t_005f_005fhas_005fattribute "The first ‘#if’ test succeeds only when the operator is supported by the version of GCC (or another compiler) being used. Only when that test succeeds is it valid to use __has_attribute as a preprocessor operator. As a result, combining the two tests into a single expression as shown below would only be valid with a compiler that supports the operator but not with others that don’t. " (Thanks to my colleague Peter Smith for finding the doc explanation) Regards, Greg Nancarrow Fujitsu Australia
On Tue, 24 Nov 2020 at 12:50, Greg Nancarrow <gregn4422@gmail.com> wrote: > Hmmm, unfortunately this seems to break my build ... > I think your commit needs to be fixed based on the following documentation: > > https://gcc.gnu.org/onlinedocs/cpp/_005f_005fhas_005fattribute.html#g_t_005f_005fhas_005fattribute Agreed. I tested on https://godbolt.org/ with a GCC version < 5 and updating to what's mentioned in the GCC manual works fine. What I had did not. Thanks for the report. I pushed a fix. David
Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path
От
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Дата:
David Rowley <dgrowleyml@gmail.com> writes: > On Tue, 24 Nov 2020 at 12:50, Greg Nancarrow <gregn4422@gmail.com> wrote: >> Hmmm, unfortunately this seems to break my build ... > >> I think your commit needs to be fixed based on the following documentation: >> >> https://gcc.gnu.org/onlinedocs/cpp/_005f_005fhas_005fattribute.html#g_t_005f_005fhas_005fattribute > > Agreed. I tested on https://godbolt.org/ with a GCC version < 5 and > updating to what's mentioned in the GCC manual works fine. What I had > did not. > > Thanks for the report. > > I pushed a fix. The Clang documentation¹ suggest an even neater solution, which would eliminate the repetitive empty pg_attribute_foo #defines in the trailing #else/#endif block in commit 1fa22a43a56e1fe44c7bb3a3d5ef31be5bcac41d: #ifndef __has_attribute #define __has_attribute(x) 0 #endif [1] http://clang.llvm.org/docs/LanguageExtensions.html#has-attribute - ilmari -- "The surreality of the universe tends towards a maximum" -- Skud's Law "Never formulate a law or axiom that you're not prepared to live with the consequences of." -- Skud's Meta-Law
On 2020-11-24 01:52, Dagfinn Ilmari Mannsåker wrote: > The Clang documentation¹ suggest an even neater solution, which would > eliminate the repetitive empty pg_attribute_foo #defines in the trailing > #else/#endif block in commit 1fa22a43a56e1fe44c7bb3a3d5ef31be5bcac41d: > > #ifndef __has_attribute > #define __has_attribute(x) 0 > #endif Yes, this was also mentioned and agreed earlier in the thread, but then we apparently forgot to update the patch. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/
David Rowley <dgrowleyml@gmail.com> writes: > Pushed. walleye's been failing since this patchset went in: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2020-11-24%2000%3A25%3A31 ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard-Wno-format-truncation -Wno-stringop-truncation -g -O2 -I../../../src/include -I./src/include/port/win32-I/c/msys/local/include -I/c/Python35/include -I/c/OpenSSL-Win64/include -I/c/msys/local/include"-I../../../src/include/port/win32" -DWIN32_STACK_RLIMIT=4194304 -DBUILDING_DLL -c -o autovacuum.oautovacuum.c C:\\Users\\BUILDE~1.SER\\AppData\\Local\\Temp\\cc4HR3xZ.s: Assembler messages: C:\\Users\\BUILDE~1.SER\\AppData\\Local\\Temp\\cc4HR3xZ.s:5900: Error: .seh_savexmm offset is negative make[3]: *** [autovacuum.o] Error 1 I have no idea what to make of that, but it looks more like a compiler bug than anything else. regards, tom lane
On Wed, 25 Nov 2020 at 04:48, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-11-24 01:52, Dagfinn Ilmari Mannsåker wrote: > > The Clang documentation¹ suggest an even neater solution, which would > > eliminate the repetitive empty pg_attribute_foo #defines in the trailing > > #else/#endif block in commit 1fa22a43a56e1fe44c7bb3a3d5ef31be5bcac41d: > > > > #ifndef __has_attribute > > #define __has_attribute(x) 0 > > #endif > > Yes, this was also mentioned and agreed earlier in the thread, but then > we apparently forgot to update the patch. I wanted to let the buildfarm settle a bit before changing this again. I plan on making the change today. (I know walleye is still not happy) David
On Wed, 25 Nov 2020 at 04:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > walleye's been failing since this patchset went in: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2020-11-24%2000%3A25%3A31 > > ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard-Wno-format-truncation -Wno-stringop-truncation -g -O2 -I../../../src/include -I./src/include/port/win32-I/c/msys/local/include -I/c/Python35/include -I/c/OpenSSL-Win64/include -I/c/msys/local/include"-I../../../src/include/port/win32" -DWIN32_STACK_RLIMIT=4194304 -DBUILDING_DLL -c -o autovacuum.oautovacuum.c > C:\\Users\\BUILDE~1.SER\\AppData\\Local\\Temp\\cc4HR3xZ.s: Assembler messages: > C:\\Users\\BUILDE~1.SER\\AppData\\Local\\Temp\\cc4HR3xZ.s:5900: Error: .seh_savexmm offset is negative > make[3]: *** [autovacuum.o] Error 1 > > I have no idea what to make of that, but it looks more like a compiler bug > than anything else. That's about the best I could come up with too when looking at that yesterday. The message gives me the impression that it might be related to code arrangement. It does seem to be the assembler that's complaining. I wondered if #if !defined(__MINGW32__) && !defined(__MINGW64__) would be the correct fix for it... aka, just define the new pg_attribute_(hot|cold) macros to empty on MinGW. David
David Rowley <dgrowleyml@gmail.com> writes: > On Wed, 25 Nov 2020 at 04:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> walleye's been failing since this patchset went in: >> I have no idea what to make of that, but it looks more like a compiler bug >> than anything else. > I wondered if #if !defined(__MINGW32__) && !defined(__MINGW64__) would > be the correct fix for it... aka, just define the new > pg_attribute_(hot|cold) macros to empty on MinGW. I'd make any such fix as narrow as possible (ie MINGW64 only, based on present evidence). It'd be nice to have a compiler version upper bound too, in the hopes that they'd fix it in future. Maybe something like "#if defined(__MINGW64__) && defined(__GNUC__) && __GNUC__ <= 8" ? regards, tom lane
On 2020-Nov-24, Tom Lane wrote: > David Rowley <dgrowleyml@gmail.com> writes: > > On Wed, 25 Nov 2020 at 04:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> walleye's been failing since this patchset went in: > >> I have no idea what to make of that, but it looks more like a compiler bug > >> than anything else. > > > I wondered if #if !defined(__MINGW32__) && !defined(__MINGW64__) would > > be the correct fix for it... aka, just define the new > > pg_attribute_(hot|cold) macros to empty on MinGW. > > I'd make any such fix as narrow as possible (ie MINGW64 only, based on > present evidence). It'd be nice to have a compiler version upper bound > too, in the hopes that they'd fix it in future. Maybe something like > "#if defined(__MINGW64__) && defined(__GNUC__) && __GNUC__ <= 8" ? Apparently the bug was fixed days after it was reported, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86048 but they haven't made a release containing the fix yet.
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2020-Nov-24, Tom Lane wrote: >> I'd make any such fix as narrow as possible (ie MINGW64 only, based on >> present evidence). It'd be nice to have a compiler version upper bound >> too, in the hopes that they'd fix it in future. Maybe something like >> "#if defined(__MINGW64__) && defined(__GNUC__) && __GNUC__ <= 8" ? > Apparently the bug was fixed days after it was reported, > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86048 > but they haven't made a release containing the fix yet. Ah, great sleuthing! So that says it occurs in 8.1 only, meaning the version test could be like #if defined(__MINGW64__) && __GNUC__ == 8 && __GNUC_MINOR__ == 1 // lobotomized code here #else ... It's not entirely clear from that bug report whether it can manifest on gcc 8.1 on other platforms; maybe we should test for x86 in general not __MINGW64__. regards, tom lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >> On Wed, 25 Nov 2020 at 04:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> walleye's been failing since this patchset went in: >>> I have no idea what to make of that, but it looks more like a compiler bug >>> than anything else. > Apparently the bug was fixed days after it was reported, > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86048 > but they haven't made a release containing the fix yet. Wait ... the second part of that doesn't seem to be true. According to http://mingw-w64.org/doku.php/versions mingw-w64 has made at least three releases since this bug was fixed. Surely they're shipping something newer than 8.1.0 by now. So maybe, rather than hacking up the attribute stuff for a bug that might bite us again anyway in future, we ought to press walleye's owner to install a more recent compiler. regards, tom lane
On Wed, 25 Nov 2020 at 14:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: > So maybe, rather than hacking up the attribute stuff for > a bug that might bite us again anyway in future, we ought > to press walleye's owner to install a more recent compiler. I think that seems like a better idea. I had thoughts about installing a quick for now to give the owner of walleye a bit of time for the upgrade. From what I can tell, the latest version of minGW comes with GCC 9.2 [1] David [1] https://osdn.net/projects/mingw/releases/
On Wed, 25 Nov 2020 at 14:35, David Rowley <dgrowleyml@gmail.com> wrote: > > On Wed, 25 Nov 2020 at 14:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > So maybe, rather than hacking up the attribute stuff for > > a bug that might bite us again anyway in future, we ought > > to press walleye's owner to install a more recent compiler. > > I think that seems like a better idea. I had thoughts about > installing a quick for now to give the owner of walleye a bit of time > for the upgrade. From what I can tell, the latest version of minGW > comes with GCC 9.2 [1] So, how about the attached today and I'll email Joseph about walleye and see if he can upgrade to a newer minGW version. David
Вложения
David Rowley <dgrowleyml@gmail.com> writes: > On Wed, 25 Nov 2020 at 14:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So maybe, rather than hacking up the attribute stuff for >> a bug that might bite us again anyway in future, we ought >> to press walleye's owner to install a more recent compiler. > I think that seems like a better idea. I had thoughts about > installing a quick for now to give the owner of walleye a bit of time > for the upgrade. From what I can tell, the latest version of minGW > comes with GCC 9.2 [1] mingw and mingw-w64 seem to be distinct projects with separate release schedules. The latter's webpage isn't too clear about which gcc version is in each of their releases. But they seem to be organized enough to put out releases roughly annually, so I'm supposing they aren't falling too far behind gcc upstream. regards, tom lane
David Rowley <dgrowleyml@gmail.com> writes: > So, how about the attached today and I'll email Joseph about walleye > and see if he can upgrade to a newer minGW version. WFM. (Note I already cc'd Joseph on this thread.) regards, tom lane