Re: [BUGS] Connections hang indefinitely while taking a LWLockTranchebuffer_content lock.
От | Chris Hanks |
---|---|
Тема | Re: [BUGS] Connections hang indefinitely while taking a LWLockTranchebuffer_content lock. |
Дата | |
Msg-id | CAK7KUdD8ibDSnZY-65TEZMgvU6A79Znkq990hAUbREJgheqReg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [BUGS] Connections hang indefinitely while taking a LWLockTranchebuffer_content lock. (Chris Hanks <christopher.m.hanks@gmail.com>) |
Список | pgsql-bugs |
Hey, I realize it's a bit late, but did anybody ever find a root cause for this? I removed the relevant index and haven't had an issue since. Thanks, Chris On Thu, Oct 26, 2017 at 7:42 PM, Chris Hanks <christopher.m.hanks@gmail.com> wrote: > I wound up working around the issue by forking the database and > removing that GIN index, and things have been fine with the new > instance for the past two days. > > I previously had two Postgres instances with hung processes, one 9.6.1 > and one 9.6.5. For work reasons I destroyed the 9.6.5 (Heroku support > was having issues with its container anyway) but I've kept the 9.6.1. > I'm happy to give permission for you to access it, just let me know > who to talk to, or have them reach out to me. I have an open issue in > their tracker under my work email, chris@pathgather.com. > > Thanks again! > > On Thu, Oct 26, 2017 at 7:14 PM, Peter Geoghegan <pg@bowt.ie> wrote: >> I managed to get a couple of hours to look at this this afternoon. >> >> On Mon, Oct 23, 2017 at 10:39 AM, Chris Hanks >> <christopher.m.hanks@gmail.com> wrote: >>> I'm not sure if you guys had noticed this already, but just in case, >>> those two hung connections weren't making the inserts at exactly the >>> same time - if you look at pg_stat_activity they executed about a day >>> apart (since Heroku cycles the clients every 24 hours or so). And >>> before I restarted the first DB that experienced the problem, there >>> were ~30 backends built up over the course of a month. It seems like >>> when one INSERT sticks, every following INSERT just stacks up on top >>> of it, trying to take out the same lock. >> >> Right. In both backtraces, we see that we're an inserter stuck on >> getting an exclusive buffer lock on the buffer containing block 0, the >> meta page block (GIN_METAPAGE_BLKNO). There is probably some >> session/backend that has acquired two buffer locks in an order that is >> inconsistent with these inserters, meaning that you get an >> undetectable deadlock. (The only alternative to that theory is that >> some backend sits on a meta page buffer lock for some other reason, >> but that seems much less likely.) >> >> The interesting question to my mind is: What backend is the other >> backend that acquires buffer locks in an incompatible order, resulting >> in this undetectable deadlock? What's it doing? (I worked for Heroku >> until quite recently; I may be able to open a back channel, with >> Chris' permission.) >> >> I remember expressing befuddlement about commit e95680832854c over a >> year ago, and never satisfying myself that it was correct [1]. I'm no >> expert on GIN, so I dropped it. It feels like that might be relevant >> here, since you seem to be using this GIN index with a queue table. >> That usage pattern is one where entire leaf pages in indexes tend to >> be routinely deleted and later recycled by VACUUM, at least with >> B-Trees [2]. Whereas, in general I think B-Tree (and presumably GIN) >> page deletion is fairly rare, since the entire page must be empty for >> it to happen. >> >> The follow up bugfix commit, e2c79e14, added a ConditionalLockPage() >> to the insert ginInsertCleanup() path, while also adding a LockPage() >> to the VACUUM path. In case you missed it, those are *heavyweight* >> page lock acquisitions, not buffer lock acquisitions, which is pretty >> unconventional (I though only hash still did that). Frankly, the >> e2c79e14 fix seems kind of bolted on (though I don't want to focus on >> that aspect right now). >> >> [1] https://postgr.es/m/CAM3SWZSDxqDBvUGOoNm0veVOwgJV3GDvoncYr6f5L16qo8MYRg@mail.gmail.com >> [2] https://brandur.org/postgres-queues >> -- >> Peter Geoghegan
В списке pgsql-bugs по дате отправления: