Обсуждение: Curious run-away index build on upgrade to 8.1.3
Hello list; We stumbled on a maintenance_work_mem related problem with index builds. Recently imported a good sized DB into a new 8.1.3 install on platform SunOS 5.9. The DB consists of some 400 tables and about twice as many indexes. The Upgrade went very smoothly until we hit an index build over a table with approx 1.5 million records. The index field was type bigint. We have a great many other tables of similar composition which had no problem on index creation. What happened was that for a couple minutes the CPU load would steadily increase and disk activity decrease at the same time. Before long, one CPU is 100% busy and we let this continue for 2 hours, a 100x longer than this index usually takes to build. Disk IO dropped to nothing and remained so. Worse is that the backend that was spinning would not respond to a cancel nor SIGTERM. Stopping this activity required a -m immediate shutdown of Postgres. maintenance_work_mem parameter was traced down to be the problem. It's initial setting was the same that we've been using on 8.0 with apparent success. And it would seem as if the maintenance_work_mem value could be set a lot higher. Nonetheless, I found the tipping point to be somewhere between 32 and 45k of the parameter setting. A bit of info on the table; 1.5 million records, 1650 distinct values in index column, Here's a mod 50 walk through the distribution of how many records match each of the index field values; select b from foo where a % 50 = 0 order by b; b ------ 1 2 2 4 6 10 15 21 30 37 44 56 68 84 99 117 149 190 228 261 298 340 423 486 593 717 853 1077 1302 1596 2470 3385 5739 (33 rows) Any comments on this and/or request for additional diagnostics welcome. If it would be of interest to someone that I truss one of the spinning processes, I can redo this in an R&D setting and submit the results. Thank you! -- ------------------------------------------------------------------------------- Jerry Sievers 305 854-3001 (home) WWW ECommerce Consultant 305 321-1144 (mobile http://www.JerrySievers.com/
Jerry Sievers <jerry@jerrysievers.com> writes: > What happened was that for a couple minutes the CPU load would > steadily increase and disk activity decrease at the same time. Before > long, one CPU is 100% busy and we let this continue for 2 hours, a > 100x longer than this index usually takes to build. Disk IO dropped > to nothing and remained so. Hm, we were just doing some work in this area a week or two ago, and 8.2 should be materially faster than current releases ... but offhand I don't know why 8.1 would be any worse than earlier versions. Looking in the CVS history shows that the sort logic didn't change at all between 8.0 and 8.1. Are you sure index build on this index really performs differently than it did in 8.0? What platform is this on? > Worse is that the backend that was spinning would not respond to a > cancel nor SIGTERM. Stopping this activity required a -m immediate > shutdown of Postgres. Yeah, we also noticed last week that some of the major loops in btree index build were free of any CHECK_FOR_INTERRUPTS calls :-(. This is fixed for 8.1.4. > If it would be of interest to someone that I truss one of the spinning > processes, I can redo this in an R&D setting and submit the results. truss probably wouldn't show anything interesting; a gprof or oprofile profile could be useful though. regards, tom lane
Hi Tom, thanks for responding on this. More below; Tom Lane <tgl@sss.pgh.pa.us> writes: > Jerry Sievers <jerry@jerrysievers.com> writes: > > What happened was that for a couple minutes the CPU load would > > steadily increase and disk activity decrease at the same time. Before > > long, one CPU is 100% busy and we let this continue for 2 hours, a > > 100x longer than this index usually takes to build. Disk IO dropped > > to nothing and remained so. > > Hm, we were just doing some work in this area a week or two ago, and > 8.2 should be materially faster than current releases ... but offhand > I don't know why 8.1 would be any worse than earlier versions. Looking > in the CVS history shows that the sort logic didn't change at all > between 8.0 and 8.1. Are you sure index build on this index really > performs differently than it did in 8.0? What platform is this on? Ok, I just did re-check this on an 8.0.3 install and got proper results. That is; the index builds in about 1 minute. Platform is Solaris 2.9 FWIW, there is at least one difference in compile time options and that is; on our 8.1.3 installs, we are now using 'enable-thread-safety' as Slony asks for this. Not sure if this may be related to the problem though. These are on machines with a Gig of RAM. On 8.1.3 the index will not build with maintenance_work_mem set to our v8.0 setting of 64k. It will however build with it at 32k and even builds fine with the default value for maintenance_work_mem whcih which is 16k for this machine. Were the index to now build in say 10 minutes of actual work, we'd be looking at a performance problem. On the contrary, the failure mode is a complete run-away process doing nothing productive. > > > Worse is that the backend that was spinning would not respond to a > > cancel nor SIGTERM. Stopping this activity required a -m immediate > > shutdown of Postgres. > > Yeah, we also noticed last week that some of the major loops in btree > index build were free of any CHECK_FOR_INTERRUPTS calls :-(. This is > fixed for 8.1.4. Great! The problem we saw suggests the backend being in an endless loop of some kind. > > If it would be of interest to someone that I truss one of the spinning > > processes, I can redo this in an R&D setting and submit the results. > > truss probably wouldn't show anything interesting; a gprof or oprofile > profile could be useful though. Ok I did a bit of reading on gprof and note where it says the program has to call exit in order for the profiling data file to be written. My limited experience with the problem here is that we had to stop it with "Sig Niner". Tried standard cancel and also OS level SIGTERM, to no avail. Any idea if there was a signal that this might have responded to shy of -9? Great big thanks. > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > -- ------------------------------------------------------------------------------- Jerry Sievers 305 854-3001 (home) WWW ECommerce Consultant 305 321-1144 (mobile http://www.JerrySievers.com/
Jerry Sievers <jerry@jerrysievers.com> writes: > Any idea if there was a signal that this might have responded to shy > of -9? SIGQUIT might work. Also, I'd be interested to take a look if you can send me a dump of just the index key column data. (COPY with a column list can do that.) regards, tom lane
Jerry Sievers <jerry@jerrysievers.com> writes: > Platform is Solaris 2.9 I think this might be the key... I can reproduce an unreasonably long runtime if I use src/port/qsort.c (as we do on Solaris), but not with glibc's version of qsort. I think you are seeing the poor-choice-of-qsort-pivots problem recently discussed on -hackers: http://archives.postgresql.org/pgsql-hackers/2006-02/msg00590.php The fact that the data contains a very large fraction of nulls probably contributes to the problem (thanks for sending it BTW). The reason you only see it with maintenance_work_mem >= 64M is that below that, the problem doesn't fit into work_mem and so we don't use qsort at all. What I still don't understand though is why you see it in 8.1 and not 8.0 ... the src/port/qsort.c code didn't change at all between those versions. Maybe 8.0 isn't taking the qsort code path, perhaps because it uses a shade more memory or some such. Could you try increasing maintenance_work_mem even more, like to 100M, and see if 8.0 gets slow? regards, tom lane
Hi Tom; Tom Lane <tgl@sss.pgh.pa.us> writes: > Jerry Sievers <jerry@jerrysievers.com> writes: > > Platform is Solaris 2.9 > > I think this might be the key... > > I can reproduce an unreasonably long runtime if I use src/port/qsort.c > (as we do on Solaris), but not with glibc's version of qsort. I think > you are seeing the poor-choice-of-qsort-pivots problem recently > discussed on -hackers: > http://archives.postgresql.org/pgsql-hackers/2006-02/msg00590.php > The fact that the data contains a very large fraction of nulls > probably contributes to the problem (thanks for sending it BTW). > The reason you only see it with maintenance_work_mem >= 64M is that > below that, the problem doesn't fit into work_mem and so we don't use > qsort at all. What I still don't understand though is why you see it > in 8.1 and not 8.0 ... the src/port/qsort.c code didn't change at all > between those versions. Maybe 8.0 isn't taking the qsort code path, > perhaps because it uses a shade more memory or some such. Could you > try increasing maintenance_work_mem even more, like to 100M, > and see if 8.0 gets slow? I did as you suggest here; taking the MWM setting incrementally up to nearly a Gig in 100M increments, also raised and lowered the work_mem setting too. Tried a dozen or more combos and in every case, the index built quickly on 8.0.3. Very interesting! Thanks for all your help. -- ------------------------------------------------------------------------------- Jerry Sievers 305 854-3001 (home) WWW ECommerce Consultant 305 321-1144 (mobile http://www.JerrySievers.com/
Jerry Sievers <jerry@jerrysievers.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> What I still don't understand though is why you see it >> in 8.1 and not 8.0 ... the src/port/qsort.c code didn't change at all >> between those versions. Maybe 8.0 isn't taking the qsort code path, >> perhaps because it uses a shade more memory or some such. Could you >> try increasing maintenance_work_mem even more, like to 100M, >> and see if 8.0 gets slow? > I did as you suggest here; taking the MWM setting incrementally up to > nearly a Gig in 100M increments, also raised and lowered the work_mem > setting too. > Tried a dozen or more combos and in every case, the index built > quickly on 8.0.3. On further analysis, it seems the problem is dependent on the exact ordering of the inputs to the qsort function. So not only do you need maintenance_work_mem to be large enough that the code will try to use qsort, but the physical order of the rows in the table matters. I suspect that you are testing on an 8.0 table with a different physical row order --- if you drop the table and reload it from the same dump you loaded into 8.1, does it get slow? regards, tom lane
Well, I'll be dipped! More below; Tom Lane <tgl@sss.pgh.pa.us> writes: > On further analysis, it seems the problem is dependent on the exact > ordering of the inputs to the qsort function. So not only do you need > maintenance_work_mem to be large enough that the code will try to use > qsort, but the physical order of the rows in the table matters. > I suspect that you are testing on an 8.0 table with a different physical > row order --- if you drop the table and reload it from the same dump you > loaded into 8.1, does it get slow? Tom, your theory is correct. If we had to build that index from scratch on 8.0, with the data ordered as it was when I made the original dump, it would have stalled all the same. I just reproduced this on an 8.0.3 box. FYI -- ------------------------------------------------------------------------------- Jerry Sievers 305 854-3001 (home) WWW ECommerce Consultant 305 321-1144 (mobile http://www.JerrySievers.com/