RE: [HACKERS] Sure enough, SI buffer overrun is broken
От | Hiroshi Inoue |
---|---|
Тема | RE: [HACKERS] Sure enough, SI buffer overrun is broken |
Дата | |
Msg-id | NDBBIJLOILGIKBGDINDFKEIECCAA.Inoue@tpf.co.jp обсуждение исходный текст |
Ответ на | Re: [HACKERS] Sure enough, SI buffer overrun is broken (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > >> I built the current sources with MAXNUMMESSAGES set to 32 in > >> src/include/storage/sinvaladt.h. The regular regress tests > >> run OK, with just a few NOTICEs about 'cache state reset' > >> and 'SI buffer overflow' inserted in the normal outputs > >> (as you'd expect, if SI overrun occurs). > >> > >> However, the parallel tests crash spectacularly, with weird errors > >> and Assert() coredumps. > > With these two changes in place, the parallel regress tests seem much > more stable. There is still a big problem though, which is that the > "portals" regress test sometimes fails. I traced this far enough to > discover that the code is trying to use a TupleDesc that it's stored in > the ScanTupleSlot of a plan, and this TupleDesc is no longer valid --- > presumably the relcache entry it was gotten from has been flushed and > rebuilt, leaving the plan with a dangling TupleDesc pointer. Ugh. > Oh great,I've also doubted relcache entry rebuild but wasn't able to trace so far. > I do not think it is very practical to try to change all of the places > that assume that they can save pointers to the TupleDesc of a relcache > entry. Instead I am thinking about solving the problem inside the > relcache, as follows: > > During a relcache entry rebuild, do not simply free and reconstruct > the TupleDesc. Instead, read the catalogs to build a new TupleDesc, > and compare this one to the old one. If they are the same, free the > new one instead (leaving the old one in place, and hence stored pointers > to it are still valid). If they are not the same, then elog(ERROR). > > (elog may sound overly paranoid, but this condition indicates that the > table's definition has actually changed since we grabbed the refcount on > it --- remember we wouldn't be doing this at all if the relcache entry had > zero refcount --- and therefore all kinds of derived information such as > plans may be wrong. Pressing ahead will probably lead to crash.) > Sounds reasonable. > We may need to do the same for any other substructures of the relcache > entry that are visible from outside relcache.c. > > I know this sounds pretty grotty, but we are already doing it for the > relcache entry itself --- rebuild re-uses the same physical entry > rather than deleting and reallocating it, to ensure that pointers > to an open Relation stay valid over an SI flush. We need to extend > the same guarantee to the substructure of the relcache entry. > > Unless someone has a better idea, I'll work on that. > Agreed. Regards. Hiroshi Inoue Inoue@tpf.co.jp
В списке pgsql-hackers по дате отправления: