Re: [HACKERS] New regression driver
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] New regression driver |
Date | |
Msg-id | 7165.943143009@sss.pgh.pa.us |
In reply to | Re: [HACKERS] New regression driver (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: [HACKERS] New regression driver |
List | pgsql-hackers |
Tom Lane <tgl@sss.pgh.pa.us> writes:
> wieck@debis.com (Jan Wieck) writes:
>> It is in utils/cache/catcache.c line 996. The comments say
>> that the code should prevent the backend from entering
>> infinite recursion while loading new cache entries.
> I will look at this. I don't think that the catcaches live in
> shared memory, so the problem is probably not what you suggest.
> The fact that the behavior is different under load may point to a
> real problem, not just an insufficiently clever debugging check.

Indeed, this is a real bug, and commenting out the code that caught it is not the right fix!

What is happening is that utils/inval.c is trying to initialize some variables that contain OIDs of system relations. This means calling the catcache routines in order to look up relation names in pg_class. However, if a shared cache inval message arrives from another backend while that's happening, we recursively invoke inval.c to deal with the message. And inval.c sees that its OID variables aren't initialized yet, so it recursively calls the catcache routines to try to get them initialized. Or, if just the first one's been initialized so far, ValidateHacks() assumes they're all valid, and you can end up at the elog(FATAL) panic at the bottom of CacheIdInvalidate().

I've got a core dump which contains a ten-deep recursion between inval.c and syscache.c, culminating in elog(FATAL) because the eleventh incoming sinval message was just slow enough to let inval.c's first OID variable get filled in before it arrived.

In short: we don't deal very robustly with cache invals happening during backend startup. Send invals at a new backend with just the right timing, and it'll choke.

I am not sure if this bug is of long standing or if we introduced it since 6.5. It's possible I created it while messing with the relcache stuff a month or two ago. But I can easily believe that it's been there a long time and we never had a way of reproducing the problem with any reliability before.

I think the fix is to rip out inval.c's attempt to look up system relation names, and just give it hardwired knowledge of their OIDs. Even though it sort-of works to do the lookups, it's bad practice for routines that are potentially called during catcache initialization to depend on the catcache to be already working. And there are other places that already have hardwired knowledge of the system relation OIDs, so...

regards, tom lane