Race condition in HEAD, possibly due to PGPROC splitup
From | Tom Lane |
---|---|
Subject | Race condition in HEAD, possibly due to PGPROC splitup |
Date | |
Msg-id | 27187.1323836130@sss.pgh.pa.us |
Replies | Re: Race condition in HEAD, possibly due to PGPROC splitup |
List | pgsql-hackers |
If you add this Assert to lock.c:

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 3ba4671..d9c15e0 100644
*** a/src/backend/storage/lmgr/lock.c
--- b/src/backend/storage/lmgr/lock.c
*************** GetRunningTransactionLocks(int *nlocks)
*** 3195,3200 ****
--- 3195,3202 ----
              accessExclusiveLocks[index].dbOid = lock->tag.locktag_field1;
              accessExclusiveLocks[index].relOid= lock->tag.locktag_field2;

+             Assert(TransactionIdIsNormal(accessExclusiveLocks[index].xid));
+
              index++;
          }
      }

then set wal_level = hot_standby, and run the regression tests repeatedly, the Assert will trigger eventually --- for me, it happens within a dozen or so parallel iterations, or rather longer if I run the tests serial style. Stack trace is unsurprising, since AFAIK this is only called in the checkpointer:

#2  0x000000000073461d in ExceptionalCondition (
    conditionName=<value optimized out>, errorType=<value optimized out>,
    fileName=<value optimized out>, lineNumber=<value optimized out>)
    at assert.c:57
#3  0x000000000065eca1 in GetRunningTransactionLocks (nlocks=0x7fffa997de8c)
    at lock.c:3198
#4  0x00000000006582b8 in LogStandbySnapshot (nextXid=0x7fffa997dee0)
    at standby.c:835
#5  0x00000000004b0b97 in CreateCheckPoint (flags=32) at xlog.c:7761
#6  0x000000000062bf92 in CheckpointerMain () at checkpointer.c:488
#7  0x00000000004cf465 in AuxiliaryProcessMain (argc=2, argv=0x7fffa997e110)
    at bootstrap.c:424
#8  0x00000000006261f5 in StartChildProcess (type=CheckpointerProcess)
    at postmaster.c:4487

The actual value of the bogus xid (which was pulled from allPgXact[proc->pgprocno]->xid just above here) is zero. What I believe is happening is that somebody is clearing his pgxact->xid entry asynchronously to GetRunningTransactionLocks, and since that clearly oughta be impossible, something is broken.
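The suspected interleaving can be illustrated with a small toy model. This is only a sketch under stated assumptions, not PostgreSQL code: ToyPgXact, ToyLock, and toy_observe are hypothetical names, and the race is replayed deterministically in a single thread rather than with real concurrent backends. The transaction-id constants mirror PostgreSQL's convention that xid 0 is invalid and normal xids start at 3, which is why a zero xid fails TransactionIdIsNormal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins modeled on PostgreSQL's transaction-id conventions. */
typedef uint32_t TransactionId;
#define InvalidTransactionId       ((TransactionId) 0)
#define FirstNormalTransactionId   ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Hypothetical, simplified per-backend slot holding the current xid. */
typedef struct ToyPgXact
{
    TransactionId xid;
} ToyPgXact;

/* Hypothetical lock-table entry that remembers which backend holds it. */
typedef struct ToyLock
{
    int  pgprocno;   /* index of the holder's slot in allPgXact */
    bool held;       /* lock is still visible in the lock table */
} ToyLock;

/*
 * Replay one interleaving and return the xid the reader records.
 * If clear_before_read is true, the lock holder zeroes its xid while its
 * lock is still visible, i.e. before the reader copies the xid.
 */
TransactionId
toy_observe(bool clear_before_read)
{
    ToyPgXact     allPgXact[1] = { { 1234 } };  /* holder runs xid 1234 */
    ToyLock       lock = { 0, true };
    TransactionId seen = InvalidTransactionId;

    /* Holder side: xid is cleared, but the lock has not been released. */
    if (clear_before_read)
        allPgXact[lock.pgprocno].xid = InvalidTransactionId;

    /* Reader side: the lock is visible, so its holder's xid is copied. */
    if (lock.held)
        seen = allPgXact[lock.pgprocno].xid;

    return seen;
}
```

In this model, toy_observe(false) yields a normal xid, while toy_observe(true) yields zero, the value on which the added Assert fires.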
Without the added assert, you'd only notice this if you were running a standby slave --- the zero xid results in an assert failure in WAL replay on the slave end, which is how I found out about this to start with. But since we've not heard reports of such before, I suspect that this is a recently introduced bug; and personally I'd bet money that it was the PGXACT patch that broke it.

I have other things to do than look into this right now myself.

regards, tom lane