Re: [BUGS] BUG #5412: test case produced, possible race condition.
From | Tom Lane |
---|---|
Subject | Re: [BUGS] BUG #5412: test case produced, possible race condition. |
Date | |
Msg-id | 8731.1271269904@sss.pgh.pa.us |
Responses |
Re: [BUGS] BUG #5412: test case produced, possible race condition.
Re: [BUGS] BUG #5412: test case produced, possible race condition. |
List | pgsql-hackers |
I wrote:
> [ theory about cause of Rusty's crash ]

I started to doubt this theory after wondering why the problem hadn't been exposed by CLOBBER_CACHE_ALWAYS testing, which is done routinely by the buildfarm. That setting would surely cause the cache flush to happen at the troublesome time.

After a good deal more investigation, I found out why it doesn't crash with that. The problematic case is for a relation that has rd_newRelfilenodeSubid nonzero but rd_createSubid zero (ie, it's been truncated in the current xact). Given that, RelationFlushRelation will attempt a rebuild but RelationCacheInvalidate won't exempt the relation from destruction. However, if you do a TRUNCATE under CLOBBER_CACHE_ALWAYS, the relcache entry gets blown away immediately at the conclusion of that command, because we'll do a RelationCacheInvalidate as a consequence of CLOBBER_CACHE_ALWAYS. When the relcache entry is rebuilt for later use, it won't have rd_newRelfilenodeSubid set, so it's not a hazard anymore. In order to expose this bug, the relcache entry has to survive past the TRUNCATE and then a cache flush has to occur while we are in process of rebuilding it, not before.

What this suggests is that CLOBBER_CACHE_ALWAYS is actually too strong to provide a thorough test of cache flush hazards. Maybe we need an alternate setting along the lines of CLOBBER_CACHE_SOMETIMES that would randomly choose whether or not to flush at any given opportunity. But if such a setup did produce a crash, it'd be awfully hard to reproduce for investigation. Ideas?

There is another slightly odd thing here, which is that the stack trace Rusty provided clearly shows the crash occurring during processing of a local relcache invalidation message for the truncated relation. This would be expected during execution of the TRUNCATE itself, but at that point the rel has positive refcnt so there's no problem. According to the stack trace the active SQL command is an INSERT ... SELECT, and I wouldn't expect that to queue any relcache invals. Are there any triggers or other unusual things in the real application (not the watered-down test case) that would be triggered in INSERT/SELECT?

regards, tom lane
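One conceivable way to make a CLOBBER_CACHE_SOMETIMES setting reproducible would be to drive the flush decision from a seeded pseudo-random generator, so that a crashing run could be replayed with the same seed. The following is only a minimal standalone sketch of that idea, not PostgreSQL code: the names `ClobberState`, `clobber_init`, and `clobber_should_flush`, the per-mille probability parameter, and the choice of a xorshift generator are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for a "sometimes" clobber mode; not part of PostgreSQL. */
typedef struct ClobberState
{
    uint64_t prng_state;     /* xorshift64 state, derived from the seed */
    uint32_t flush_permille; /* chance of flushing, in thousandths (0..1000) */
    uint64_t opportunities;  /* number of flush opportunities seen so far */
    uint64_t flushes;        /* number of opportunities where we chose to flush */
} ClobberState;

/* Initialize from a seed; the same seed yields the same decision sequence. */
static void
clobber_init(ClobberState *cs, uint64_t seed, uint32_t flush_permille)
{
    assert(flush_permille <= 1000);
    /* xorshift state must be nonzero, so substitute a constant for seed 0 */
    cs->prng_state = (seed != 0) ? seed : UINT64_C(0x9E3779B97F4A7C15);
    cs->flush_permille = flush_permille;
    cs->opportunities = 0;
    cs->flushes = 0;
}

/* One step of Marsaglia's xorshift64 generator (deterministic). */
static uint64_t
clobber_next(ClobberState *cs)
{
    uint64_t x = cs->prng_state;

    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    cs->prng_state = x;
    return x;
}

/*
 * Decide whether to flush at this opportunity.  The counters would let a
 * crash report say "flush number N at opportunity M with seed S".
 */
static bool
clobber_should_flush(ClobberState *cs)
{
    cs->opportunities++;
    if (clobber_next(cs) % 1000 < cs->flush_permille)
    {
        cs->flushes++;
        return true;
    }
    return false;
}
```

Under these assumptions, the caller would consult `clobber_should_flush` at each point where CLOBBER_CACHE_ALWAYS flushes unconditionally, and log the seed at startup. Replaying with the same seed would only reproduce a crash if the sequence of flush opportunities is itself deterministic for the test case, which may not hold when concurrent sessions are involved.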