RE: dsa_allocate() faliure
From | Arne Roland |
---|---|
Subject | RE: dsa_allocate() faliure |
Date | |
Msg-id | d9c6cc80e21241349db53b2f64075029@index.de |
In reply to | Re: dsa_allocate() faliure (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses | Re: dsa_allocate() faliure |
List | pgsql-performance |
It's definitely quite a complex pattern. The query I sent you last time was minimal with respect to predicates (removing any single one of the predicates turned it into a working query).

> Huh. Ok well that's a lot more frequent that I thought. Is it always the same query? Any chance you can get the plan? Are there more things going on on the server, like perhaps concurrent parallel queries?

I had this bug occur while I was the only one working on the server. I checked: there was just one transaction with a snapshot at all, and it was an autovacuum busy with a totally unrelated relation my colleague was working on.

The bug is indeed behaving like a ghost. One child relation needed a few new rows to test a particular application a colleague of mine was working on. The insert triggered an autoanalyze and the explain changed slightly. Besides row and cost estimates, the change is that the line

  Recheck Cond: (((COALESCE((fid)::bigint, fallback) ) >= 1) AND ((COALESCE((fid)::bigint, fallback) ) <= 1) AND (gid && '{853078,853080,853082}'::integer[]))

is now

  Recheck Cond: ((gid && '{853078,853080,853082}'::integer[]) AND ((COALESCE((fid)::bigint, fallback) ) >= 1) AND ((COALESCE((fid)::bigint, fallback) ) <= 1))

and the error vanished.

I could try to hunt down another query by assembling seemingly random queries. I don't see a very clear pattern in the queries aborting with this error on our production servers. I'm not surprised that the bug is hard to chase on production servers. They usually are quite lively.

> If you're able to run a throwaway copy of your production database on another system that you don't have to worry about crashing, you could just replace ERROR with PANIC and run a high-speed loop of the query that crashed in product, or something. This might at least tell us whether it's reach that condition via something dereferencing a dsa_pointer or something manipulating the segment lists while allocating/freeing.
I could take a backup and restore the relevant tables on a throwaway system. You are just suggesting to replace line 728

  elog(FATAL, "dsa_allocate could not find %zu free pages", npages);

by

  elog(PANIC, "dsa_allocate could not find %zu free pages", npages);

correct? Just for my understanding: why would the shutdown of the whole instance create more helpful logging?

All the best
Arne
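On the last question, some general background may help: in PostgreSQL, elog(FATAL) ends the backend through the normal exit path, whereas elog(PANIC) calls abort(). A process killed by SIGABRT can leave a core file (if core dumps are enabled on the system), and a backtrace can be read from that. The sketch below is not PostgreSQL code. It only illustrates the difference using two stand-in child processes, and it assumes a POSIX system; the names run_child and describe are made up for this example.

```python
import signal
import subprocess
import sys


def run_child(snippet: str) -> int:
    """Run a Python snippet in a child process and return its exit status."""
    return subprocess.run([sys.executable, "-c", snippet]).returncode


def describe(returncode: int) -> str:
    """On POSIX, subprocess reports death by signal N as returncode -N."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# Stand-in for elog(FATAL): the process leaves through the normal exit path,
# so there is no core file to inspect afterwards.
fatal_like = run_child("import sys; sys.exit(1)")

# Stand-in for elog(PANIC): the process calls abort() and dies from SIGABRT,
# which is the kind of death that can produce a core dump.
panic_like = run_child("import os; os.abort()")

print("FATAL-like child:", describe(fatal_like))
print("PANIC-like child:", describe(panic_like))
```

Whether a core file is actually written depends on the system's core dump settings (for example the core size limit), which this sketch does not check.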