Poorly thought out code in vacuum
От | Tom Lane |
---|---|
Тема | Poorly thought out code in vacuum |
Дата | |
Msg-id | 25422.1325810273@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
Re: Poorly thought out code in vacuum
Re: Poorly thought out code in vacuum |
Список | pgsql-hackers |
Lately I have noticed buildfarm member jaguar occasionally failing regression tests with ERROR: invalid memory alloc request size 1073741824 during a VACUUM, as for example at http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2012-01-04%2023%3A05%3A02 Naturally I supposed it to be a consequence of the CLOBBER_CACHE_ALWAYS option, ie, somebody accessing an already-freed cache entry. I managed to duplicate and debug it here after an afternoon of trying, and it's not actually related to that at all, except perhaps to the extent that CLOBBER_CACHE_ALWAYS slows down some things enormously more than others. The failure comes when a ResourceOwner tries to enlarge its pinned-buffers array to more than 1GB, and that's not a typo: there are umpteen entries for the same buffer, whose PrivateRefCount is 134217727 and trying to go higher. A stack trace soon revealed the culprit, which is this change in lazy_vacuum_heap(): @@ -932,7 +965,8 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats) tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]); buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk,RBM_NORMAL, vac_strategy); - LockBufferForCleanup(buf); + if (!ConditionalLockBufferForCleanup(buf)) + continue; tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats); /* Now that we'vecompacted the page, record its available space */ introduced in http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=bbb6e559c4ea0fb4c346beda76736451dc24eb4e The "continue" just results in another iteration of the containing tight loop, and of course that immediately re-pins the same buffer without having un-pinned it. So if someone else sits on their pin of the buffer for long enough, kaboom. We could fix the direct symptom by inserting UnlockReleaseBuffer() in front of the continue, but AFAICS that just makes this into a busy-waiting equivalent of waiting unconditionally, so I don't really see why we should not revert the above fragment altogether. However, I suppose Robert had something more intelligent in mind than a tight loop when the buffer can't be exclusively locked, so maybe there is some other change that should be made here instead. regards, tom lane
В списке pgsql-hackers по дате отправления: