Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed
From | Andres Freund |
---|---|
Subject | Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed |
Date | |
Msg-id | 20190406171025.x7mbhp6kct75oqny@alap3.anarazel.de |
In response to | Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed (Andres Freund <andres@anarazel.de>) |
Responses | Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed; Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed |
List | pgsql-bugs |
Hi,

On 2019-04-06 09:28:46 -0700, Andres Freund wrote:
> On 2019-04-06 12:23:06 -0400, Tom Lane wrote:
> > It seems that there may be some connection between this problem and
> > EPQ. I was working on committing Amit's fix for bug #15677, which
> > demonstrated that EPQ doesn't work for partitioned-table target rels.
> > It seemed like there really needed to be regression test coverage for
> > that, so I tried to convert his crasher example into an isolation test.
> > It does indeed crash without Amit's fix ... but with it, lookee what
> > I get:
> >
> > +error in steps c1 complexpartupdate: ERROR: unexpected table_lock_tuple status: 1
> >
> > That seems fully reproducible in this test. I haven't looked into
> > exactly what's causing that, but now that we have a reproducible
> > example, somebody should.
> >
> > I'm not quite sure if I should commit this as-is or wait till the
> > other problem is fixed. A crash is probably worse than a bogus
> > error, but I don't like committing obviously-wrong "expected" output.
> > Thoughts?
>
> Let me have a look at the testcase - I'd been running Roman's testcase
> for quite a few hours without being able to reproduce. But your testcase
> seems to trigger this reliably, so I hope I can make some quick
> progress.

Hm. I see what's wrong here - the new code assumed that we couldn't get
a SelfModified because the first version of the to-be-(deleted|updated)
tuple was visible. To properly discern that from the TM_Deleted case, I
had to change/fix heapam_lock_tuple's follow-the-update-chain code to
return SelfModified, rather than Invisible, in this case (I don't think
we want to allow Invisible - we'd have to have waited for the earlier
tuple version) - which is a more accurate return code anyway.

I'm still not understanding how that'd be possible in Roman's
case. Given the workload, there never should be any self-updating going
on?

Heavily-WIP patch attached.

I noticed that we say

+ ereport(ERROR,
+ (errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
+ errmsg("tuple to be updated was already modified by an operation triggered by the current command"),

in the ExecDelete() case (that's not new). Which seems odd.

I think my fix would need a non-partition reproducer. I'll work on that
and on polishing it after having a coffee.

Greetings,

Andres Freund