Hi,
While investigating other bugs I noticed that
ResolveRecoveryConflictWithLock() wasn't really working. Turns out
GetLockConflicts() violates it's API contract which says:
* The result array is palloc'd and is terminated with an invalid VXID.
Problem 1:
We don't actually put the terminator there. It happens to more or less
accidentally work on a master because the array is palloc0()ed there and
while a 0 is valid backend id it happens to not be a valid local
transaction id. In HS we don't actually allocate the array every time,
but it's instead statically allocated. Without zeroing.
I have no idea why this doesn't crash
ResolveRecoveryConflictWithInterrupt() which does:while (!lock_acquired){ if (++num_attempts < 3) backends =
GetLockConflicts(locktag,AccessExclusiveLock);
... ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_LOCK);
and ResolveRecoveryConflictWithVirtualXIDs does:
...while (VirtualTransactionIdIsValid(*waitlist)){ /* kill kill kill */
/* The virtual transaction is gone now, wait for the next one */ waitlist++;}
I guess we just accidentally happen to come across appropriately
set memory at some point.
I'm really baffled that this hasn't caused significant problems so far.
Problem 2:
Since bcd8528f001 and 29eedd312274 the "the result array is palloc'd" is
wrong because we're now doing:
static VirtualTransactionId *vxids;/* * Allocate memory to store results, and fill with InvalidVXID. We only * need
enoughspace for MaxBackends + a terminator, since prepared xacts * don't count. InHotStandby allocate once in
TopMemoryContext.*/if (InHotStandby){ if (vxids == NULL) vxids = (VirtualTransactionId *)
MemoryContextAlloc(TopMemoryContext, sizeof(VirtualTransactionId) * (MaxBackends + 1));}else
vxids= (VirtualTransactionId *) palloc0(sizeof(VirtualTransactionId) * (MaxBackends + 1));
Obviously that violates the API contract. I'm inclined to rip the HS
special case out and add a pfree() to the single HS caller. The commit
message introducing it says:
Use malloc() in GetLockConflicts() when called InHotStandby to avoid repeated palloc calls. Current code assumed
thiswas already true, so this is a bug fix.
But I can't really believe this is relevant.
Greetings,
Andres Freund
--Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services