Re: FSM Corruption (was: Could not read block at end of the relation)
From | Noah Misch |
---|---|
Subject | Re: FSM Corruption (was: Could not read block at end of the relation) |
Date | |
Msg-id | 20240303234715.4d@rfd.leadboat.com |
In reply to | FSM Corruption (was: Could not read block at end of the relation) (Ronan Dunklau <ronan.dunklau@aiven.io>) |
Responses | Re: FSM Corruption (was: Could not read block at end of the relation) |
List | pgsql-bugs |
On Tue, Feb 27, 2024 at 11:34:14AM +0100, Ronan Dunklau wrote:
> - happens during heavy system load
> - lots of concurrent writes happening on a table
> - often (but haven't been able to confirm it is necessary), a vacuum is running
>   on the table at the same time the error is triggered
>
> Then, several backends get the same error at once "ERROR: could not read
> block XXXX in file "base/XXXX/XXXX": read only 0 of 8192 bytes", with different

What are some of the specific block numbers reported?

> has anybody witnessed something similar ?

https://postgr.es/m/flat/CA%2BhUKGK%2B5DOmLaBp3Z7C4S-Yv6yoROvr1UncjH2S1ZbPT8D%2BZg%40mail.gmail.com
reminded me of this. Did you upgrade your OS recently?

On Fri, Mar 01, 2024 at 09:56:51AM +0100, Ronan Dunklau wrote:
> I think I may have missed something on my first look. On other affected
> clusters, the FSM is definitely corrupted. So it looks like we have an FSM
> corruption bug on our hands.

What corruption signs did you observe in the FSM? Since FSM is intentionally
not WAL-logged, corruption is normal, but corruption causing errors is not
normal. That said, if any crash leaves a state that the freespace/README
"self-correcting measures" don't detect, errors may happen. Did the clusters
crash recently?

> The occurrence of this bug happening makes it hard to reproduce, but it's
> definitely frequent enough we witnessed it on a dozen PostgreSQL clusters.

You could do "ALTER TABLE x SET (vacuum_truncate = off);" and see if the
problem stops happening. That would corroborate the VACUUM theory.

Can you use backtrace_functions to get a stack trace?

> In our case, we need to repair the FSM. The instructions on the wiki do work,
> but maybe we should add something like the attached patch (modeled after the
> same feature in pg_visibility) to make it possible to repair the FSM
> corruption online. What do you think about it ?

That's reasonable in concept.
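
For readers who want to try the diagnostic steps suggested in this message, a minimal sketch follows. It is not part of the original email. The table name `x` is the placeholder used in the message, and the function name `mdread` is an assumption: it is where older server versions raise the "could not read block" error, but the name may differ in your version, so check it against the server source before relying on it. `backtrace_functions` is a developer option available from PostgreSQL 13 and can only be set by a superuser.

```sql
-- Sketch only: "x" is the placeholder table name from the message.

-- 1. Current length of the main fork, in blocks. A block number from the
--    error message that is >= this value lies past the end of the relation.
SELECT pg_relation_size('x') / current_setting('block_size')::bigint AS nblocks;

-- 2. Stop VACUUM from truncating empty pages at the end of this table,
--    as suggested in the message, to test the VACUUM theory.
ALTER TABLE x SET (vacuum_truncate = off);

-- 3. Log a backtrace whenever an error is raised inside the named C function.
--    'mdread' is an assumption (see the note above); the backtrace is written
--    to the server log, not returned to the client.
ALTER SYSTEM SET backtrace_functions = 'mdread';
SELECT pg_reload_conf();

-- 4. Undo both settings once enough information has been collected.
ALTER SYSTEM RESET backtrace_functions;
SELECT pg_reload_conf();
ALTER TABLE x RESET (vacuum_truncate);
```

Comparing the block numbers in the error messages with `nblocks` from step 1 would help answer the question about which blocks are being reported, since "read only 0 of 8192 bytes" means the read started at or beyond the end of the file.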