Re: [BUGS] Connections hang indefinitely while taking a LWLockTranche buffer_content lock.

From: Chris Hanks
Subject: Re: [BUGS] Connections hang indefinitely while taking a LWLockTranche buffer_content lock.
Date:
Msg-id: CAK7KUdD8ibDSnZY-65TEZMgvU6A79Znkq990hAUbREJgheqReg@mail.gmail.com
In reply to: Re: [BUGS] Connections hang indefinitely while taking a LWLockTranche buffer_content lock.  (Chris Hanks <christopher.m.hanks@gmail.com>)
List: pgsql-bugs
Hey, I realize it's a bit late, but did anybody ever find a root cause
for this? I removed the relevant index and haven't had an issue since.

Thanks,
Chris

On Thu, Oct 26, 2017 at 7:42 PM, Chris Hanks
<christopher.m.hanks@gmail.com> wrote:
> I wound up working around the issue by forking the database and
> removing that GIN index, and things have been fine with the new
> instance for the past two days.
>
> I previously had two Postgres instances with hung processes, one 9.6.1
> and one 9.6.5. For work reasons I destroyed the 9.6.5 (Heroku support
> was having issues with its container anyway) but I've kept the 9.6.1.
> I'm happy to give permission for you to access it, just let me know
> who to talk to, or have them reach out to me. I have an open issue in
> their tracker under my work email, chris@pathgather.com.
>
> Thanks again!
>
> On Thu, Oct 26, 2017 at 7:14 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> I managed to get a couple of hours to look at this this afternoon.
>>
>> On Mon, Oct 23, 2017 at 10:39 AM, Chris Hanks
>> <christopher.m.hanks@gmail.com> wrote:
>>> I'm not sure if you guys had noticed this already, but just in case,
>>> those two hung connections weren't making the inserts at exactly the
>>> same time - if you look at pg_stat_activity they executed about a day
>>> apart (since Heroku cycles the clients every 24 hours or so). And
>>> before I restarted the first DB that experienced the problem, there
>>> were ~30 backends built up over the course of a month. It seems like
>>> when one INSERT sticks, every following INSERT just stacks up on top
>>> of it, trying to take out the same lock.
>>
>> Right. In both backtraces, we see an inserter stuck trying to get an
>> exclusive buffer lock on the buffer containing block 0, the metapage
>> block (GIN_METAPAGE_BLKNO). There is probably some
>> session/backend that has acquired two buffer locks in an order that is
>> inconsistent with these inserters, meaning that you get an
>> undetectable deadlock. (The only alternative to that theory is that
>> some backend sits on a meta page buffer lock for some other reason,
>> but that seems much less likely.)
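
A minimal sketch of the kind of lock-order inversion described above (illustrative only; not code from the thread or from PostgreSQL itself). Two hypothetical code paths take exclusive content locks on the same two GIN buffers, the metapage and a leaf page, in opposite order. Buffer content locks are LWLocks, which the deadlock detector never examines, so if these run concurrently each backend can end up waiting on the other forever:

    #include "postgres.h"
    #include "access/gin_private.h"   /* GIN_METAPAGE_BLKNO */
    #include "storage/bufmgr.h"       /* ReadBuffer, LockBuffer, ... */
    #include "utils/rel.h"

    /* Hypothetical path 1: lock the metapage first, then a leaf page. */
    static void
    lock_meta_then_leaf(Relation index, BlockNumber leafblk)
    {
        Buffer meta = ReadBuffer(index, GIN_METAPAGE_BLKNO);
        Buffer leaf = ReadBuffer(index, leafblk);

        LockBuffer(meta, BUFFER_LOCK_EXCLUSIVE);
        LockBuffer(leaf, BUFFER_LOCK_EXCLUSIVE);   /* blocks if path 2 holds it */

        /* ... modify both pages ... */

        UnlockReleaseBuffer(leaf);
        UnlockReleaseBuffer(meta);
    }

    /* Hypothetical path 2: same pages, opposite order. */
    static void
    lock_leaf_then_meta(Relation index, BlockNumber leafblk)
    {
        Buffer leaf = ReadBuffer(index, leafblk);
        Buffer meta = ReadBuffer(index, GIN_METAPAGE_BLKNO);

        LockBuffer(leaf, BUFFER_LOCK_EXCLUSIVE);
        LockBuffer(meta, BUFFER_LOCK_EXCLUSIVE);   /* blocks if path 1 holds it */

        /* ... modify both pages ... */

        UnlockReleaseBuffer(meta);
        UnlockReleaseBuffer(leaf);
    }
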
>>
>> The interesting question to my mind is: What backend is the other
>> backend that acquires buffer locks in an incompatible order, resulting
>> in this undetectable deadlock? What's it doing? (I worked for Heroku
>> until quite recently; I may be able to open a back channel, with
>> Chris' permission.)
>>
>> I remember expressing befuddlement about commit e95680832854c over a
>> year ago, and never satisfying myself that it was correct [1]. I'm no
>> expert on GIN, so I dropped it. It feels like that might be relevant
>> here, since you seem to be using this GIN index with a queue table.
>> That usage pattern is one where entire leaf pages in indexes tend to
>> be routinely deleted and later recycled by VACUUM, at least with
>> B-Trees [2]. Whereas, in general I think B-Tree (and presumably GIN)
>> page deletion is fairly rare, since the entire page must be empty for
>> it to happen.
>>
>> The follow-up bugfix commit, e2c79e14, added a ConditionalLockPage()
>> to the ginInsertCleanup() insert path, while also adding a LockPage()
>> to the VACUUM path. In case you missed it, those are *heavyweight*
>> page lock acquisitions, not buffer lock acquisitions, which is pretty
>> unconventional (I thought only hash still did that). Frankly, the
>> e2c79e14 fix seems kind of bolted on (though I don't want to focus on
>> that aspect right now).
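
For reference, a rough sketch of the heavyweight page-lock pattern being referred to, using the lmgr.h page-lock API; this is a simplification with an assumed lock mode, not the actual ginInsertCleanup()/VACUUM code from e2c79e14. Unlike buffer content locks, LockPage()/ConditionalLockPage() go through the regular lock manager, so they are visible to the deadlock detector:

    #include "postgres.h"
    #include "access/gin_private.h"   /* GIN_METAPAGE_BLKNO */
    #include "storage/lmgr.h"         /* LockPage, ConditionalLockPage, UnlockPage */
    #include "utils/rel.h"

    /*
     * Insert path: if some other backend already holds the cleanup lock,
     * skip pending-list cleanup instead of blocking the INSERT.
     */
    static void
    insert_cleanup_sketch(Relation index)
    {
        if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
            return;

        /* ... flush the pending list into the main index structure ... */

        UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
    }

    /* VACUUM path: wait for the cleanup lock unconditionally. */
    static void
    vacuum_cleanup_sketch(Relation index)
    {
        LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);

        /* ... flush the pending list ... */

        UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
    }
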
>>
>> [1] https://postgr.es/m/CAM3SWZSDxqDBvUGOoNm0veVOwgJV3GDvoncYr6f5L16qo8MYRg@mail.gmail.com
>> [2] https://brandur.org/postgres-queues
>> --
>> Peter Geoghegan

