Another nasty cache problem
From | Tom Lane |
---|---|
Subject | Another nasty cache problem |
Date | |
Msg-id | 22885.949246873@sss.pgh.pa.us |
Responses | Re: [HACKERS] Another nasty cache problem (three replies) |
List | pgsql-hackers |
I'm down to the point where the parallel tests mostly work with a small SI buffer --- but they do still sometimes fail. I've realized that there is a whole class of bugs along the following lines:

There are plenty of routines that do two or more SearchSysCacheTuple calls to get the information they need. As the code stands, it is unsafe to continue accessing the tuple returned by SearchSysCacheTuple after making a second such call, because the second call could possibly cause an SI cache reset message to be processed, thereby flushing the contents of the caches.

heap_open and CommandCounterIncrement are other routines that could cause cache entries to be dropped.

This is a very insidious kind of bug because the probability of occurrence is very low (at normal SI buffer size a reset is unlikely, and even if it happens, you won't observe a failure unless the pfree'd tuple is actually overwritten before you're done with it). So we cannot hope to catch these things by testing.

I am not sure what to do about it. One solution path is to make all the potential trouble spots do SearchSysCacheTupleCopy and then pfree the copied tuple when done. However, that adds a nontrivial amount of overhead, and it'd be awfully easy to miss some trouble spots or to introduce new ones in the future.

Another possibility is to introduce some sort of notion of a reference count, and to make the standard usage pattern be

	tuple = SearchSysCacheTuple(...);
	... use tuple ...
	ReleaseSysCacheTuple(tuple);

The idea here is that a tuple with positive refcount would not be deleted during a cache reset, but would simply be removed from its cache, and then finally deleted when released (or during elog recovery). This might allow us to get rid of SearchSysCacheTupleCopy, too, since the refcount should be just as good as palloc'ing one's own copy for most purposes. I haven't looked at the callers of SearchSysCacheTuple to see whether this would be a practical change to make.
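To make the refcount idea concrete, here is a minimal sketch in C of a toy cache that behaves the way described above. It is not the actual syscache code: ToyEntry, toy_search, toy_release, toy_reset and toy_live_entries are all hypothetical names, the "catalog lookup" is faked, malloc/free stand in for palloc/pfree, and the elog-recovery cleanup path is not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy model; these are NOT PostgreSQL's catcache structures. */
typedef struct ToyEntry
{
    int         key;
    int         value;
    int         refcount;   /* outstanding references held by callers */
    bool        in_cache;   /* false once a reset has unlinked the entry */
    struct ToyEntry *next;
} ToyEntry;

static ToyEntry *toy_cache = NULL;      /* singly linked list of cached entries */
static int  toy_live_entries = 0;       /* allocated, not yet freed (for checking) */

/* Look up key, loading it on a miss. The returned entry is pinned. */
static ToyEntry *
toy_search(int key)
{
    ToyEntry   *e;

    for (e = toy_cache; e != NULL; e = e->next)
        if (e->key == key)
            break;

    if (e == NULL)
    {
        e = malloc(sizeof(ToyEntry));
        if (e == NULL)
            abort();
        e->key = key;
        e->value = key * 10;    /* stand-in for reading the real catalog */
        e->refcount = 0;
        e->in_cache = true;
        e->next = toy_cache;
        toy_cache = e;
        toy_live_entries++;
    }
    e->refcount++;
    return e;
}

/* Drop one reference; free the entry if a reset already unlinked it. */
static void
toy_release(ToyEntry *e)
{
    assert(e->refcount > 0);
    e->refcount--;
    if (e->refcount == 0 && !e->in_cache)
    {
        free(e);
        toy_live_entries--;
    }
}

/* Cache reset: unlink everything, but free only the unpinned entries. */
static void
toy_reset(void)
{
    ToyEntry   *e = toy_cache;

    toy_cache = NULL;
    while (e != NULL)
    {
        ToyEntry   *next = e->next;

        e->in_cache = false;
        e->next = NULL;
        if (e->refcount == 0)
        {
            free(e);
            toy_live_entries--;
        }
        e = next;
    }
}
```

In this sketch a caller that holds an entry across toy_reset() can keep reading it safely. A later toy_search() for the same key builds a fresh entry, and the stale one is freed at the matching toy_release(). The obvious cost is the one already noted: every caller has to remember the release.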
I was wondering if anyone had any comments or better ideas...

			regards, tom lane