Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
От | Maxim Orlov |
---|---|
Тема | Re: Parallel scan with SubTransGetTopmostTransaction assert coredump |
Дата | |
Msg-id | 150825360832da68872abb4d891dfc3a@postgrespro.ru обсуждение исходный текст |
Ответ на | Re: Parallel scan with SubTransGetTopmostTransaction assert coredump (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Ответы |
Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
|
Список | pgsql-hackers |
On 2021-07-09 20:36, Tomas Vondra wrote: > Hi, > > I took a quick look on this - I'm no expert in the details of > snapshots, > so take my comments with a grain of salt. > > AFAICS both Greg Nancarrow and Pavel Borisov are kinda right. I think > Greg is right the v3 patch does not seem like the right (or correct) > solution, for a couple reasons: > > > 1) It fixes the symptoms, not the cause. If we set TransactionXmin to a > bogus value, this only fixes it locally in > SubTransGetTopmostTransaction > but I'd be surprised if other places did not have the same issue. > > > 2) The xid/TransactionXmin comparison in the v2 fix: > > xid = xid > TransactionXmin ? xid : TransactionXmin; > > seems broken, because it compares the XIDs directly, but that's not > going to work after generating enough transactions. > > > 3) But even if this uses TransactionIdFollowsOrEquals, it seems very > strange because the "xid" comes from > > XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot)) > > i.e. it's the raw xmin from the tuple, so why should we be setting it > to > TransactionXmin? That seems pretty strange, TBH. > > > So yeah, I think this is due to confusion with two snapshots and > failing > to consider both of them when calculating TransactionXmin. > > But I think removing one of the snapshots (as the v2 does it) is rather > strange too. I very much doubt having both the transaction and active > snapshots in the parallel worker is not intentional, and Pavel may very > well be right this breaks isolation levels that use the xact snapshot > (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though. > > So I think we need to keep both snapshots, but fix TransactionXmin to > consider both of them - I suppose ParallelWorkerMain could simply look > at the two snapshots, and use the minimum. Which is what [1] (already > linked by Pavel) talks about, although that's related to concerns about > one of the processes dying, which is not what's happening here. > > > I'm wondering what consequences this may have on production systems, > though. We've only seen this failing because of the assert, so what > happens when the build has asserts disabled? > > Looking at SubTransGetTopmostTransaction, it seems the while loop ends > immediately (because it's pretty much the opposite of the assert), so > we > just return the "xid" as topmost XID. But will that produce incorrect > query results, or what? > > > > regards > > > [1] > https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de PFA v4 patch based on the ideas above. In principle I see little difference with v3. But I agree it is more correct. I did test this patch on Linux and MacOS using testing algo above and got no error. On master branch before the patch I still see this error. --- Best regards, Maxim Orlov.
Вложения
В списке pgsql-hackers по дате отправления: