Re: sinval synchronization considered harmful
From | Tom Lane |
---|---|
Subject | Re: sinval synchronization considered harmful |
Date | |
Msg-id | 16876.1311714315@sss.pgh.pa.us |
In response to | Re: sinval synchronization considered harmful (Noah Misch <noah@2ndQuadrant.com>) |
Responses | Re: sinval synchronization considered harmful |
List | pgsql-hackers |
Noah Misch <noah@2ndQuadrant.com> writes:
> On Tue, Jul 26, 2011 at 03:40:32PM -0400, Tom Lane wrote:
>> After some further reflection I believe this patch actually is pretty
>> safe, although Noah's explanation of why seems a bit confused.

> Here's the way it can fail:

> 1. Backend enters SIGetDataEntries() with main memory bearing stateP->resetState
> = false, stateP->nextMsgNum = 500, segP->maxMsgNum = 505.  The CPU has those
> latest stateP values in cache, but segP->maxMsgNum is *not* in cache.
> 2. Backend stalls for <long time>.  Meanwhile, other backends issue
> MSGNUMWRAPAROUND - 5 invalidation messages.  Main memory bears
> stateP->resetState = true, stateP->nextMsgNum = 500 - MSGNUMWRAPAROUND,
> segP->maxMsgNum = 500.
> 3. Backend wakes up, uses its cached stateP values and reads segP->maxMsgNum =
> 500 from main memory.  The patch's test finds no need to reset or process
> invalidation messages.

[ squint... ]  Hmm, you're right.  The case where this would break
things is if (some of) the five unprocessed messages relate to some
object we've just locked.  But the initial state you describe would be
valid right after obtaining such a lock.

> That's the theoretical risk I wished to illustrate.  Though this appears
> possible on an abstract x86_64 system, I think it's unrealistic to suppose that
> a dirty cache line could persist *throughout* the issuance of more than 10^9
> invalidation messages on a concrete implementation.

Dirty cache line, maybe not, but what if the assembly code commands the
CPU to load those variables into CPU registers before doing the
comparison?  If they're loaded with maxMsgNum coming in last (or at
least after resetState), I think you can have the problem without any
assumptions about cache line behavior at all.  You just need the process
to lose the CPU at the right time.

If we marked the pointers volatile, we could probably ensure that the
assembly code tests resetState last, but that's not sufficient to avoid
the stale-cache-line risk.
regards, tom lane
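
The three-step failure quoted above can be replayed deterministically in a few lines of C. What follows is only a sketch under stated assumptions, not the sinvaladt.c code: the structs are pared-down hypothetical stand-ins, the fast-path test is assumed to have the shape "not reset, and nextMsgNum equals maxMsgNum", the value of MSGNUMWRAPAROUND is assumed for illustration (only its scale, above 10^9, matters), and the wraparound adjustment is simplified. The local copies `stale_next` and `stale_reset` play the role of the register or cached values that the stalled backend is still holding.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration; the message only says "more than 10^9". */
#define MSGNUMWRAPAROUND (4096 * 262144)

/* Hypothetical, pared-down stand-ins for the shared-memory structures. */
typedef struct { int nextMsgNum; bool resetState; } ProcStateSketch;
typedef struct { int maxMsgNum; } SISegSketch;

typedef struct
{
    bool stale_says_idle;   /* verdict using values captured before the stall */
    bool fresh_says_idle;   /* verdict using what main memory holds afterwards */
    int  max_msg_num;       /* maxMsgNum in main memory after the stall */
} Outcome;

/* Assumed shape of the unlocked test: true means "nothing to process". */
static bool
fast_path_says_idle(int nextMsgNum, bool resetState, int maxMsgNum)
{
    return !resetState && nextMsgNum == maxMsgNum;
}

static Outcome
simulate_stalled_backend(void)
{
    /* Step 1: state in main memory as the backend enters. */
    ProcStateSketch state = { .nextMsgNum = 500, .resetState = false };
    SISegSketch     seg = { .maxMsgNum = 505 };

    /* The backend captures its stateP fields, then loses the CPU. */
    int  stale_next = state.nextMsgNum;
    bool stale_reset = state.resetState;

    /* Step 2: other backends issue MSGNUMWRAPAROUND - 5 messages. */
    seg.maxMsgNum += MSGNUMWRAPAROUND - 5;
    state.resetState = true;            /* this backend fell too far behind */

    /* Simplified wraparound: shift every counter down by the same amount. */
    if (seg.maxMsgNum >= MSGNUMWRAPAROUND)
    {
        seg.maxMsgNum -= MSGNUMWRAPAROUND;
        state.nextMsgNum -= MSGNUMWRAPAROUND;
    }
    assert(seg.maxMsgNum == 500);       /* matches step 2 of the scenario */

    /* Step 3: stale stateP values are compared with a fresh maxMsgNum. */
    Outcome out;
    out.stale_says_idle = fast_path_says_idle(stale_next, stale_reset,
                                              seg.maxMsgNum);
    out.fresh_says_idle = fast_path_says_idle(state.nextMsgNum,
                                              state.resetState,
                                              seg.maxMsgNum);
    out.max_msg_num = seg.maxMsgNum;
    return out;
}
```

Calling `simulate_stalled_backend()` gives `stale_says_idle` true and `fresh_says_idle` false: the stale comparison concludes there is nothing to do even though a reset is pending. Nothing in the sketch depends on cache behavior; capturing the stateP fields before the stall and reading maxMsgNum after it is enough, which is the load-ordering point made in the reply.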